[COMMITTED] rs6000: Add new builtin __builtin_ppc_atomic_cas_local

Message ID 20260407025629.270663-1-avinashd@linux.ibm.com
State Committed
Commit d5fb79718b22207253d62c92dcf8b1c16e1ea370
Headers
Series [COMMITTED] rs6000: Add new builtin __builtin_ppc_atomic_cas_local |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Avinash Jayakar April 7, 2026, 2:56 a.m. UTC
  This patch adds a new powerpc specific atomic builtin which is similar
to the generic __atomic_compare_exchange builtin.

bool __builtin_ppc_atomic_cas_local (type *ptr, type *expected,
				     type *desired, bool weak,
				     int success_memorder,
				     int failure_memorder)

It behaves like __atomic_compare_exchange(), but it uses an EH value of
1 in the larx (load-and-reserve) instruction. The new builtin helps
optimize lock contention on PowerPC by keeping the lock cacheline in
the local processor longer, reducing performance penalties from
cache coherence protocol traffic.

2026-04-07  Avinash Jayakar  <avinashd@linux.ibm.com>
	    Surya Kumari Jangala  <jskumari@linux.ibm.com>

gcc/ChangeLog:

	* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Add
	logic to handle __builtin_ppc_atomic_cas_local.
	* config/rs6000/rs6000-builtins.def: New builtins for
	__builtin_ppc_atomic_cas_local with types.
	* config/rs6000/rs6000-c.cc (altivec_build_resolved_builtin):
	Handle builtins with up to 6 arguments.
	* config/rs6000/rs6000-overload.def: Overload builtin for
	signed/unsiged char, short, int, long, __int128.
	* config/rs6000/rs6000-protos.h (rs6000_expand_atomic_compare_and_swap):
	Add additional parameter 'local' to the prototype.
	* config/rs6000/rs6000.cc (emit_load_locked): Add new parameter. Pass
	new parameter to generate load-locked instruction.
	(rs6000_expand_atomic_compare_and_swap): Add new parameter. Call
	emit_load_locked() with additional parameter value of EH bit.
	(rs6000_expand_atomic_exchange): Pass EH value 0 to emit_load_locked().
	(rs6000_expand_atomic_op): Likewise.
	* config/rs6000/sync.md (load_locked<mode>): Add new operand in RTL
	template. Specify EH bit in the larx instruction.
	(load_locked<QHI:mode>_si): Likewise.
	(load_lockedpti): Likewise.
	(load_lockedti): Add new operand in RTL template. Pass EH bit to
	gen_load_lockedpti().
	(atomic_compare_and_swap<mode>): Pass new parameter 'false' to
	rs6000_expand_atomic_compare_and_swap.
	(atomic_compare_and_swap_local<mode>): New define_expand.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/acmp-tst.c: New test.
---
 gcc/config/rs6000/rs6000-builtin.cc         | 102 ++++++++++++++++++++
 gcc/config/rs6000/rs6000-builtins.def       |  11 +++
 gcc/config/rs6000/rs6000-c.cc               |  10 +-
 gcc/config/rs6000/rs6000-overload.def       |  22 +++++
 gcc/config/rs6000/rs6000-protos.h           |   2 +-
 gcc/config/rs6000/rs6000.cc                 |  17 ++--
 gcc/config/rs6000/sync.md                   |  37 +++++--
 gcc/testsuite/gcc.target/powerpc/acmp-tst.c |  32 ++++++
 8 files changed, 214 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/acmp-tst.c
  

Comments

Avinash Jayakar April 14, 2026, 5:31 a.m. UTC | #1
Hi all,

Reverted this commit, due to PR124800 on trunk.

Regards,
Avinash

On Tue, 2026-04-07 at 08:26 +0530, Avinash Jayakar wrote:
> This patch adds a new powerpc specific atomic builtin which is
> similar
> to the generic __atomic_compare_exchange builtin.
> 
> bool __builtin_ppc_atomic_cas_local (type *ptr, type *expected,
> 				     type *desired, bool weak,
> 				     int success_memorder,
> 				     int failure_memorder)
> 
> It behaves like __atomic_compare_exchange(), but it uses an EH value
> of
> 1 in the larx (load-and-reserve) instruction. The new builtin helps
> optimize lock contention on PowerPC by keeping the lock cacheline in
> the local processor longer, reducing performance penalties from
> cache coherence protocol traffic.
> 
> 2026-04-07  Avinash Jayakar  <avinashd@linux.ibm.com>
> 	    Surya Kumari Jangala  <jskumari@linux.ibm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> Add
> 	logic to handle __builtin_ppc_atomic_cas_local.
> 	* config/rs6000/rs6000-builtins.def: New builtins for
> 	__builtin_ppc_atomic_cas_local with types.
> 	* config/rs6000/rs6000-c.cc
> (altivec_build_resolved_builtin):
> 	Handle builtins with up to 6 arguments.
> 	* config/rs6000/rs6000-overload.def: Overload builtin for
> 	signed/unsiged char, short, int, long, __int128.
> 	* config/rs6000/rs6000-protos.h
> (rs6000_expand_atomic_compare_and_swap):
> 	Add additional parameter 'local' to the prototype.
> 	* config/rs6000/rs6000.cc (emit_load_locked): Add new
> parameter. Pass
> 	new parameter to generate load-locked instruction.
> 	(rs6000_expand_atomic_compare_and_swap): Add new parameter.
> Call
> 	emit_load_locked() with additional parameter value of EH
> bit.
> 	(rs6000_expand_atomic_exchange): Pass EH value 0 to
> emit_load_locked().
> 	(rs6000_expand_atomic_op): Likewise.
> 	* config/rs6000/sync.md (load_locked<mode>): Add new operand
> in RTL
> 	template. Specify EH bit in the larx instruction.
> 	(load_locked<QHI:mode>_si): Likewise.
> 	(load_lockedpti): Likewise.
> 	(load_lockedti): Add new operand in RTL template. Pass EH
> bit to
> 	gen_load_lockedpti().
> 	(atomic_compare_and_swap<mode>): Pass new parameter 'false'
> to
> 	rs6000_expand_atomic_compare_and_swap.
> 	(atomic_compare_and_swap_local<mode>): New define_expand.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/acmp-tst.c: New test.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc         | 102
> ++++++++++++++++++++
>  gcc/config/rs6000/rs6000-builtins.def       |  11 +++
>  gcc/config/rs6000/rs6000-c.cc               |  10 +-
>  gcc/config/rs6000/rs6000-overload.def       |  22 +++++
>  gcc/config/rs6000/rs6000-protos.h           |   2 +-
>  gcc/config/rs6000/rs6000.cc                 |  17 ++--
>  gcc/config/rs6000/sync.md                   |  37 +++++--
>  gcc/testsuite/gcc.target/powerpc/acmp-tst.c |  32 ++++++
>  8 files changed, 214 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 45c88fe063b..bbf60de3b1b 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -3284,6 +3284,108 @@ rs6000_expand_builtin (tree exp, rtx target,
> rtx /* subtarget */,
>        return expand_call (exp, target, ignore);
>      }
>  
> +  if (fcode == RS6000_BIF_PPC_ATOMIC_CAS_QI
> +      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_HI
> +      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_SI
> +      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_DI
> +      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_TI)
> +    {
> +      machine_mode mode; // Get mode based on BIF ID (QImode,
> SImode, etc.)
> +
> +      switch (fcode)
> +	{
> +	case RS6000_BIF_PPC_ATOMIC_CAS_QI:
> +	  mode = QImode;
> +	  icode = CODE_FOR_atomic_compare_and_swap_localqi;
> +	  break;
> +	case RS6000_BIF_PPC_ATOMIC_CAS_HI:
> +	  mode = HImode;
> +	  icode = CODE_FOR_atomic_compare_and_swap_localhi;
> +	  break;
> +	case RS6000_BIF_PPC_ATOMIC_CAS_SI:
> +	  mode = SImode;
> +	  icode = CODE_FOR_atomic_compare_and_swap_localsi;
> +	  break;
> +	case RS6000_BIF_PPC_ATOMIC_CAS_DI:
> +	  mode = DImode;
> +	  icode = CODE_FOR_atomic_compare_and_swap_localdi;
> +	  break;
> +	case RS6000_BIF_PPC_ATOMIC_CAS_TI:
> +	  mode = TImode;
> +	  icode = CODE_FOR_atomic_compare_and_swap_localti;
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +
> +      // Arg 0: ptr (pointer to data)
> +      rtx ptr = expand_normal (CALL_EXPR_ARG (exp, 0));
> +      rtx mem = gen_rtx_MEM (mode, force_reg (Pmode, ptr));
> +
> +      // Arg 1: expected (pointer to value) -> WE MUST DEREFERENCE
> THIS
> +      rtx exp_ptr = expand_normal (CALL_EXPR_ARG (exp, 1));
> +      rtx expected_val = gen_reg_rtx (mode);
> +      emit_move_insn (expected_val, gen_rtx_MEM (mode,
> +						 force_reg (Pmode,
> exp_ptr)));
> +
> +      // Arg 2: desired (value), dereference this as well
> +      rtx desired_ptr = expand_normal (CALL_EXPR_ARG (exp, 2));
> +      rtx desired_val = gen_reg_rtx (mode);
> +      emit_move_insn (desired_val,
> +		      gen_rtx_MEM (mode, force_reg (Pmode,
> desired_ptr)));
> +
> +      // Args 3, 4, 5: weak, succ, fail (constants)
> +      rtx weak = expand_normal (CALL_EXPR_ARG (exp, 3));
> +      rtx succ = expand_normal (CALL_EXPR_ARG (exp, 4));
> +      rtx fail = expand_normal (CALL_EXPR_ARG (exp, 5));
> +
> +      // 0: Boolean return (Output)
> +      struct expand_operand ops[8];
> +      create_output_operand (&ops[0], target, SImode);
> +
> +      // 1: Old value return (Output)
> +      rtx old_val = gen_reg_rtx (mode);
> +      create_output_operand (&ops[1], old_val, mode);
> +
> +      // 2: The Memory (Fixed/Input - it's a MEM rtx)
> +      // We use create_fixed_operand because it's a specific MEM
> location
> +      create_fixed_operand (&ops[2], mem);
> +
> +      // 3: Expected Value (Input)
> +      create_input_operand (&ops[3], expected_val, mode);
> +
> +      // 4: Desired Value (Input)
> +      create_input_operand (&ops[4], desired_val, mode);
> +
> +      // 5, 6, 7: Weak, Success, Failure (Immediate/Constants)
> +      create_input_operand (&ops[5], weak, SImode);
> +      create_input_operand (&ops[6], succ, SImode);
> +      create_input_operand (&ops[7], fail, SImode);
> +
> +      // Now call expand_insn with the ops array
> +      if (!maybe_expand_insn (icode, 8, ops))
> +	error ("invalid arguments to builtin");
> +
> +      // Create a label for the end of the function.
> +      rtx done_label = gen_label_rtx ();
> +
> +      /* Standard Semantics: Update 'expected' ONLY on failure.
> +	 If target (the boolean result) is NOT 0, the CAS succeeded.
> +	 In the case of success, we jump straight to the end.  */
> +
> +      // If target != 0 (Success), skip the store.
> +      emit_cmp_and_jump_insns (target, const0_rtx, NE, NULL_RTX,
> +			       SImode, 1, done_label);
> +
> +      // FAILURE PATH: This code runs only if target == 0.
> +      rtx expected_mem = gen_rtx_MEM (mode, force_reg (Pmode,
> exp_ptr));
> +      emit_move_insn (expected_mem, old_val);
> +
> +      emit_label (done_label);
> +
> +      return target;
> +    }
> +
>    if (bif_is_nosoft (*bifaddr)
>        && rs6000_isa_flags & OPTION_MASK_SOFT_FLOAT)
>      {
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index 7e5a4fb96e7..577c9d6c8f0 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -246,6 +246,17 @@
>    const double __builtin_unpack_longdouble (long double, const
> int<1>);
>      UNPACK_TF unpacktf {ibmld}
>  
> +; Builtins for ppc specific atomic compare exchange
> +  bool __builtin_ppc_atomic_cas_local_qi (char *, char *, char *,
> const int, const int, const int);
> +    PPC_ATOMIC_CAS_QI nothing {}
> +  bool __builtin_ppc_atomic_cas_local_hi (short *, short *, short *,
> const int, const int, const int);
> +    PPC_ATOMIC_CAS_HI nothing {}
> +  bool __builtin_ppc_atomic_cas_local_si (int *, int *, int *, const
> int, const int, const int);
> +    PPC_ATOMIC_CAS_SI nothing {}
> +  bool __builtin_ppc_atomic_cas_local_di (long *, long *, long *,
> const int, const int, const int);
> +    PPC_ATOMIC_CAS_DI nothing {}
> +  bool __builtin_ppc_atomic_cas_local_ti (vsq *, vsq *, vsq *, const
> int, const int, const int);
> +    PPC_ATOMIC_CAS_TI nothing {}
>  
>  ; Builtins that have been around just about forever, but not quite.
>  [power5]
> diff --git a/gcc/config/rs6000/rs6000-c.cc
> b/gcc/config/rs6000/rs6000-c.cc
> index 8c221b71b8d..b230d9a7345 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -900,7 +900,7 @@ altivec_build_resolved_builtin (tree *args, int
> n, tree fntype, tree ret_type,
>  
>    /* If the number of arguments to an overloaded function increases,
>       we must expand this switch.  */
> -  gcc_assert (MAX_OVLD_ARGS <= 4);
> +  gcc_assert (MAX_OVLD_ARGS <= 6);
>  
>    tree call;
>    switch (n)
> @@ -920,6 +920,14 @@ altivec_build_resolved_builtin (tree *args, int
> n, tree fntype, tree ret_type,
>      case 4:
>        call = build_call_expr (fndecl, 4, args[0], args[1], args[2],
> args[3]);
>        break;
> +    case 5:
> +      call = build_call_expr (fndecl, 5, args[0], args[1], args[2],
> args[3],
> +			      args[4]);
> +      break;
> +    case 6:
> +      call = build_call_expr (fndecl, 6, args[0], args[1], args[2],
> args[3],
> +			      args[4], args[5]);
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/config/rs6000/rs6000-overload.def
> b/gcc/config/rs6000/rs6000-overload.def
> index 5238c81b214..8f2fa978475 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -79,6 +79,28 @@
>  ; a semicolon are also treated as blank lines.
>  
>  
> +[PPC_ATOMIC_CAS, SKIP, __builtin_ppc_atomic_cas_local]
> +  bool __builtin_ppc_atomic_cas_local (signed char *, signed char *,
> signed char *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_QI PPC_ATOMIC_CAS_SQI
> +  bool __builtin_ppc_atomic_cas_local (unsigned char *, unsigned
> char *, unsigned char *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_QI PPC_ATOMIC_CAS_UQI
> +  bool __builtin_ppc_atomic_cas_local (signed short *, signed short
> *, signed short *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_HI PPC_ATOMIC_CAS_SHI
> +  bool __builtin_ppc_atomic_cas_local (unsigned short *, unsigned
> short *, unsigned short *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_HI PPC_ATOMIC_CAS_UHI
> +  bool __builtin_ppc_atomic_cas_local (signed int *, signed int *,
> signed int *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_SI PPC_ATOMIC_CAS_SSI
> +  bool __builtin_ppc_atomic_cas_local (unsigned int *, unsigned int
> *, unsigned int *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_SI PPC_ATOMIC_CAS_USI
> +  bool __builtin_ppc_atomic_cas_local (signed long *, signed long *,
> signed long *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_DI PPC_ATOMIC_CAS_SDI
> +  bool __builtin_ppc_atomic_cas_local (unsigned long *, unsigned
> long *, unsigned long *, const int, const int, const int);
> +    PPC_ATOMIC_CAS_DI PPC_ATOMIC_CAS_UDI
> +  bool __builtin_ppc_atomic_cas_local (vsq *, vsq *, vsq *, const
> int, const int, const int);
> +    PPC_ATOMIC_CAS_TI PPC_ATOMIC_CAS_STI
> +  bool __builtin_ppc_atomic_cas_local (vuq *, vuq *, vuq *, const
> int, const int, const int);
> +    PPC_ATOMIC_CAS_TI PPC_ATOMIC_CAS_UTI
> +
>  [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
>    vsq __builtin_vec_bcdadd (vsq, vsq, const int);
>      BCDADD_V1TI
> diff --git a/gcc/config/rs6000/rs6000-protos.h
> b/gcc/config/rs6000/rs6000-protos.h
> index 09424ebaf97..5efca2d5834 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -127,7 +127,7 @@ extern bool rs6000_emit_set_const (rtx, rtx);
>  extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
>  extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
> -extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
> +extern void rs6000_expand_atomic_compare_and_swap (rtx op[], bool
> local);
>  extern rtx swap_endian_selector_for_mode (machine_mode mode);
>  
>  extern void rs6000_expand_atomic_exchange (rtx op[]);
> diff --git a/gcc/config/rs6000/rs6000.cc
> b/gcc/config/rs6000/rs6000.cc
> index 3838059a7e2..42a4d7bb622 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -16746,12 +16746,13 @@ emit_unlikely_jump (rtx cond, rtx label)
>  
>  /* A subroutine of the atomic operation splitters.  Emit a load-
> locked
>     instruction in MODE.  For QI/HImode, possibly use a pattern than
> includes
> -   the zero_extend operation.  */
> +   the zero_extend operation.  LOCAL indicates the EH bit value for
> the
> +   load-locked instruction.  */
>  
>  static void
> -emit_load_locked (machine_mode mode, rtx reg, rtx mem)
> +emit_load_locked (machine_mode mode, rtx reg, rtx mem, rtx local)
>  {
> -  rtx (*fn) (rtx, rtx) = NULL;
> +  rtx (*fn) (rtx, rtx, rtx) = NULL;
>  
>    switch (mode)
>      {
> @@ -16778,7 +16779,7 @@ emit_load_locked (machine_mode mode, rtx reg,
> rtx mem)
>      default:
>        gcc_unreachable ();
>      }
> -  emit_insn (fn (reg, mem));
> +  emit_insn (fn (reg, mem, local));
>  }
>  
>  /* A subroutine of the atomic operation splitters.  Emit a store-
> conditional
> @@ -16948,7 +16949,7 @@ rs6000_finish_atomic_subword (rtx narrow, rtx
> wide, rtx shift)
>  /* Expand an atomic compare and swap operation.  */
>  
>  void
> -rs6000_expand_atomic_compare_and_swap (rtx operands[])
> +rs6000_expand_atomic_compare_and_swap (rtx operands[], bool local)
>  {
>    rtx boolval, retval, mem, oldval, newval, cond;
>    rtx label1, label2, x, mask, shift;
> @@ -17011,7 +17012,7 @@ rs6000_expand_atomic_compare_and_swap (rtx
> operands[])
>      }
>    label2 = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>  
> -  emit_load_locked (mode, retval, mem);
> +  emit_load_locked (mode, retval, mem, local ? const1_rtx :
> const0_rtx);
>  
>    x = retval;
>    if (mask)
> @@ -17109,7 +17110,7 @@ rs6000_expand_atomic_exchange (rtx
> operands[])
>    label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
>    emit_label (XEXP (label, 0));
>  
> -  emit_load_locked (mode, retval, mem);
> +  emit_load_locked (mode, retval, mem, const0_rtx);
>  
>    x = val;
>    if (mask)
> @@ -17214,7 +17215,7 @@ rs6000_expand_atomic_op (enum rtx_code code,
> rtx mem, rtx val,
>    if (before == NULL_RTX)
>      before = gen_reg_rtx (mode);
>  
> -  emit_load_locked (mode, before, mem);
> +  emit_load_locked (mode, before, mem, const0_rtx);
>  
>    if (code == NOT)
>      {
> diff --git a/gcc/config/rs6000/sync.md b/gcc/config/rs6000/sync.md
> index a4e8344ef11..7087daf7e4c 100644
> --- a/gcc/config/rs6000/sync.md
> +++ b/gcc/config/rs6000/sync.md
> @@ -278,17 +278,19 @@ (define_mode_iterator ATOMIC [(QI
> "TARGET_SYNC_HI_QI")
>  (define_insn "load_locked<mode>"
>    [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
>  	(unspec_volatile:ATOMIC
> -         [(match_operand:ATOMIC 1 "memory_operand" "Z")]
> UNSPECV_LL))]
> +	  [(match_operand:ATOMIC 1 "memory_operand" "Z")
> +	   (match_operand:QI 2 "u1bit_cint_operand" "n")]
> UNSPECV_LL))]
>    ""
> -  "<larx> %0,%y1"
> +  "<larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>  
>  (define_insn "load_locked<QHI:mode>_si"
>    [(set (match_operand:SI 0 "int_reg_operand" "=r")
>  	(unspec_volatile:SI
> -	  [(match_operand:QHI 1 "memory_operand" "Z")] UNSPECV_LL))]
> +	  [(match_operand:QHI 1 "memory_operand" "Z")
> +	   (match_operand:QI 2 "u1bit_cint_operand" "n")]
> UNSPECV_LL))]
>    "TARGET_SYNC_HI_QI"
> -  "<QHI:larx> %0,%y1"
> +  "<QHI:larx> %0,%y1,%2"
>    [(set_attr "type" "load_l")])
>  
>  ;; Use PTImode to get even/odd register pairs.
> @@ -302,7 +304,8 @@ (define_insn "load_locked<QHI:mode>_si"
>  
>  (define_expand "load_lockedti"
>    [(use (match_operand:TI 0 "quad_int_reg_operand"))
> -   (use (match_operand:TI 1 "memory_operand"))]
> +   (use (match_operand:TI 1 "memory_operand"))
> +   (use (match_operand:QI 2 "u1bit_cint_operand"))]
>    "TARGET_SYNC_TI"
>  {
>    rtx op0 = operands[0];
> @@ -316,7 +319,7 @@ (define_expand "load_lockedti"
>        operands[1] = op1 = change_address (op1, TImode, new_addr);
>      }
>  
> -  emit_insn (gen_load_lockedpti (pti, op1));
> +  emit_insn (gen_load_lockedpti (pti, op1, operands[2]));
>    if (WORDS_BIG_ENDIAN)
>      emit_move_insn (op0, gen_lowpart (TImode, pti));
>    else
> @@ -330,11 +333,12 @@ (define_expand "load_lockedti"
>  (define_insn "load_lockedpti"
>    [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
>  	(unspec_volatile:PTI
> -         [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")]
> UNSPECV_LL))]
> +	  [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")
> +	   (match_operand:QI 2 "u1bit_cint_operand" "n")]
> UNSPECV_LL))]
>    "TARGET_SYNC_TI
>     && !reg_mentioned_p (operands[0], operands[1])
>     && quad_int_reg_operand (operands[0], PTImode)"
> -  "lqarx %0,%y1"
> +  "lqarx %0,%y1,%2"
>    [(set_attr "type" "load_l")
>     (set_attr "size" "128")])
>  
> @@ -411,7 +415,22 @@ (define_expand "atomic_compare_and_swap<mode>"
>     (match_operand:SI 7 "const_int_operand")]		;; model
> fail
>    ""
>  {
> -  rs6000_expand_atomic_compare_and_swap (operands);
> +  rs6000_expand_atomic_compare_and_swap (operands, false);
> +  DONE;
> +})
> +
> +(define_expand "atomic_compare_and_swap_local<mode>"
> +  [(match_operand:SI 0 "int_reg_operand")		;; bool out
> +   (match_operand:AINT 1 "int_reg_operand")		;; val out
> +   (match_operand:AINT 2 "memory_operand")		;; memory
> +   (match_operand:AINT 3 "reg_or_short_operand")	;; expected
> +   (match_operand:AINT 4 "int_reg_operand")		;; desired
> +   (match_operand:SI 5 "const_int_operand")		;; is_weak
> +   (match_operand:SI 6 "const_int_operand")		;; model
> succ
> +   (match_operand:SI 7 "const_int_operand")]		;; model
> fail
> +  ""
> +{
> +  rs6000_expand_atomic_compare_and_swap (operands, true);
>    DONE;
>  })
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> new file mode 100644
> index 00000000000..6ebd2ebbc28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define TESTS \
> +  X(signed char, qi) \
> +  X(unsigned char, uqi) \
> +  X(short, hi) \
> +  X(signed short, shi) \
> +  X(unsigned short, uhi) \
> +  X(int, si) \
> +  X(signed int, ssi) \
> +  X(unsigned int, usi) \
> +  X(long, di) \
> +  X(signed long, sdi) \
> +  X(unsigned long, udi) \
> +  X(vector signed __int128, sti) \
> +  X(vector unsigned __int128, uti)
> +
> +#define X(T, name) \
> +bool word_exchange_##name (T *ptr, T *expected, T * desired) \
> +{ \
> +  return __builtin_ppc_atomic_cas_local (ptr, expected, desired, 0,
> \
> +					 __ATOMIC_SEQ_CST,
> __ATOMIC_ACQUIRE); \
> +}
> +
> +TESTS
> +
> +/* { dg-final { scan-assembler-times {\mlbarx +[0-9]+,[0-9]+,[0-
> 9]+,1} 2 } } */
> +/* { dg-final { scan-assembler-times {\mlharx +[0-9]+,[0-9]+,[0-
> 9]+,1} 3 } } */
> +/* { dg-final { scan-assembler-times {\mlwarx +[0-9]+,[0-9]+,[0-
> 9]+,1} 3 } } */
> +/* { dg-final { scan-assembler-times {\mldarx +[0-9]+,[0-9]+,[0-
> 9]+,1} 3 } } */
> +/* { dg-final { scan-assembler-times {\mlqarx +[0-9]+,[0-9]+,[0-
> 9]+,1} 2 } } */
  

Patch

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 45c88fe063b..bbf60de3b1b 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3284,6 +3284,108 @@  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
       return expand_call (exp, target, ignore);
     }
 
+  if (fcode == RS6000_BIF_PPC_ATOMIC_CAS_QI
+      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_HI
+      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_SI
+      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_DI
+      || fcode == RS6000_BIF_PPC_ATOMIC_CAS_TI)
+    {
+      machine_mode mode; // Get mode based on BIF ID (QImode, SImode, etc.)
+
+      switch (fcode)
+	{
+	case RS6000_BIF_PPC_ATOMIC_CAS_QI:
+	  mode = QImode;
+	  icode = CODE_FOR_atomic_compare_and_swap_localqi;
+	  break;
+	case RS6000_BIF_PPC_ATOMIC_CAS_HI:
+	  mode = HImode;
+	  icode = CODE_FOR_atomic_compare_and_swap_localhi;
+	  break;
+	case RS6000_BIF_PPC_ATOMIC_CAS_SI:
+	  mode = SImode;
+	  icode = CODE_FOR_atomic_compare_and_swap_localsi;
+	  break;
+	case RS6000_BIF_PPC_ATOMIC_CAS_DI:
+	  mode = DImode;
+	  icode = CODE_FOR_atomic_compare_and_swap_localdi;
+	  break;
+	case RS6000_BIF_PPC_ATOMIC_CAS_TI:
+	  mode = TImode;
+	  icode = CODE_FOR_atomic_compare_and_swap_localti;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+      // Arg 0: ptr (pointer to data)
+      rtx ptr = expand_normal (CALL_EXPR_ARG (exp, 0));
+      rtx mem = gen_rtx_MEM (mode, force_reg (Pmode, ptr));
+
+      // Arg 1: expected (pointer to value) -> WE MUST DEREFERENCE THIS
+      rtx exp_ptr = expand_normal (CALL_EXPR_ARG (exp, 1));
+      rtx expected_val = gen_reg_rtx (mode);
+      emit_move_insn (expected_val, gen_rtx_MEM (mode,
+						 force_reg (Pmode, exp_ptr)));
+
+      // Arg 2: desired (value), dereference this as well
+      rtx desired_ptr = expand_normal (CALL_EXPR_ARG (exp, 2));
+      rtx desired_val = gen_reg_rtx (mode);
+      emit_move_insn (desired_val,
+		      gen_rtx_MEM (mode, force_reg (Pmode, desired_ptr)));
+
+      // Args 3, 4, 5: weak, succ, fail (constants)
+      rtx weak = expand_normal (CALL_EXPR_ARG (exp, 3));
+      rtx succ = expand_normal (CALL_EXPR_ARG (exp, 4));
+      rtx fail = expand_normal (CALL_EXPR_ARG (exp, 5));
+
+      // 0: Boolean return (Output)
+      struct expand_operand ops[8];
+      create_output_operand (&ops[0], target, SImode);
+
+      // 1: Old value return (Output)
+      rtx old_val = gen_reg_rtx (mode);
+      create_output_operand (&ops[1], old_val, mode);
+
+      // 2: The Memory (Fixed/Input - it's a MEM rtx)
+      // We use create_fixed_operand because it's a specific MEM location
+      create_fixed_operand (&ops[2], mem);
+
+      // 3: Expected Value (Input)
+      create_input_operand (&ops[3], expected_val, mode);
+
+      // 4: Desired Value (Input)
+      create_input_operand (&ops[4], desired_val, mode);
+
+      // 5, 6, 7: Weak, Success, Failure (Immediate/Constants)
+      create_input_operand (&ops[5], weak, SImode);
+      create_input_operand (&ops[6], succ, SImode);
+      create_input_operand (&ops[7], fail, SImode);
+
+      // Now call expand_insn with the ops array
+      if (!maybe_expand_insn (icode, 8, ops))
+	error ("invalid arguments to builtin");
+
+      // Create a label for the end of the function.
+      rtx done_label = gen_label_rtx ();
+
+      /* Standard Semantics: Update 'expected' ONLY on failure.
+	 If target (the boolean result) is NOT 0, the CAS succeeded.
+	 In the case of success, we jump straight to the end.  */
+
+      // If target != 0 (Success), skip the store.
+      emit_cmp_and_jump_insns (target, const0_rtx, NE, NULL_RTX,
+			       SImode, 1, done_label);
+
+      // FAILURE PATH: This code runs only if target == 0.
+      rtx expected_mem = gen_rtx_MEM (mode, force_reg (Pmode, exp_ptr));
+      emit_move_insn (expected_mem, old_val);
+
+      emit_label (done_label);
+
+      return target;
+    }
+
   if (bif_is_nosoft (*bifaddr)
       && rs6000_isa_flags & OPTION_MASK_SOFT_FLOAT)
     {
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 7e5a4fb96e7..577c9d6c8f0 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -246,6 +246,17 @@ 
   const double __builtin_unpack_longdouble (long double, const int<1>);
     UNPACK_TF unpacktf {ibmld}
 
+; Builtins for ppc specific atomic compare exchange
+  bool __builtin_ppc_atomic_cas_local_qi (char *, char *, char *, const int, const int, const int);
+    PPC_ATOMIC_CAS_QI nothing {}
+  bool __builtin_ppc_atomic_cas_local_hi (short *, short *, short *, const int, const int, const int);
+    PPC_ATOMIC_CAS_HI nothing {}
+  bool __builtin_ppc_atomic_cas_local_si (int *, int *, int *, const int, const int, const int);
+    PPC_ATOMIC_CAS_SI nothing {}
+  bool __builtin_ppc_atomic_cas_local_di (long *, long *, long *, const int, const int, const int);
+    PPC_ATOMIC_CAS_DI nothing {}
+  bool __builtin_ppc_atomic_cas_local_ti (vsq *, vsq *, vsq *, const int, const int, const int);
+    PPC_ATOMIC_CAS_TI nothing {}
 
 ; Builtins that have been around just about forever, but not quite.
 [power5]
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 8c221b71b8d..b230d9a7345 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -900,7 +900,7 @@  altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 
   /* If the number of arguments to an overloaded function increases,
      we must expand this switch.  */
-  gcc_assert (MAX_OVLD_ARGS <= 4);
+  gcc_assert (MAX_OVLD_ARGS <= 6);
 
   tree call;
   switch (n)
@@ -920,6 +920,14 @@  altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
     case 4:
       call = build_call_expr (fndecl, 4, args[0], args[1], args[2], args[3]);
       break;
+    case 5:
+      call = build_call_expr (fndecl, 5, args[0], args[1], args[2], args[3],
+			      args[4]);
+      break;
+    case 6:
+      call = build_call_expr (fndecl, 6, args[0], args[1], args[2], args[3],
+			      args[4], args[5]);
+      break;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 5238c81b214..8f2fa978475 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -79,6 +79,28 @@ 
 ; a semicolon are also treated as blank lines.
 
 
+[PPC_ATOMIC_CAS, SKIP, __builtin_ppc_atomic_cas_local]
+  bool __builtin_ppc_atomic_cas_local (signed char *, signed char *, signed char *, const int, const int, const int);
+    PPC_ATOMIC_CAS_QI PPC_ATOMIC_CAS_SQI
+  bool __builtin_ppc_atomic_cas_local (unsigned char *, unsigned char *, unsigned char *, const int, const int, const int);
+    PPC_ATOMIC_CAS_QI PPC_ATOMIC_CAS_UQI
+  bool __builtin_ppc_atomic_cas_local (signed short *, signed short *, signed short *, const int, const int, const int);
+    PPC_ATOMIC_CAS_HI PPC_ATOMIC_CAS_SHI
+  bool __builtin_ppc_atomic_cas_local (unsigned short *, unsigned short *, unsigned short *, const int, const int, const int);
+    PPC_ATOMIC_CAS_HI PPC_ATOMIC_CAS_UHI
+  bool __builtin_ppc_atomic_cas_local (signed int *, signed int *, signed int *, const int, const int, const int);
+    PPC_ATOMIC_CAS_SI PPC_ATOMIC_CAS_SSI
+  bool __builtin_ppc_atomic_cas_local (unsigned int *, unsigned int *, unsigned int *, const int, const int, const int);
+    PPC_ATOMIC_CAS_SI PPC_ATOMIC_CAS_USI
+  bool __builtin_ppc_atomic_cas_local (signed long *, signed long *, signed long *, const int, const int, const int);
+    PPC_ATOMIC_CAS_DI PPC_ATOMIC_CAS_SDI
+  bool __builtin_ppc_atomic_cas_local (unsigned long *, unsigned long *, unsigned long *, const int, const int, const int);
+    PPC_ATOMIC_CAS_DI PPC_ATOMIC_CAS_UDI
+  bool __builtin_ppc_atomic_cas_local (vsq *, vsq *, vsq *, const int, const int, const int);
+    PPC_ATOMIC_CAS_TI PPC_ATOMIC_CAS_STI
+  bool __builtin_ppc_atomic_cas_local (vuq *, vuq *, vuq *, const int, const int, const int);
+    PPC_ATOMIC_CAS_TI PPC_ATOMIC_CAS_UTI
+
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
     BCDADD_V1TI
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09424ebaf97..5efca2d5834 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -127,7 +127,7 @@  extern bool rs6000_emit_set_const (rtx, rtx);
 extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
 extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
 extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
-extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
+extern void rs6000_expand_atomic_compare_and_swap (rtx op[], bool local);
 extern rtx swap_endian_selector_for_mode (machine_mode mode);
 
 extern void rs6000_expand_atomic_exchange (rtx op[]);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3838059a7e2..42a4d7bb622 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -16746,12 +16746,13 @@  emit_unlikely_jump (rtx cond, rtx label)
 
 /* A subroutine of the atomic operation splitters.  Emit a load-locked
    instruction in MODE.  For QI/HImode, possibly use a pattern than includes
-   the zero_extend operation.  */
+   the zero_extend operation.  LOCAL indicates the EH bit value for the
+   load-locked instruction.  */
 
 static void
-emit_load_locked (machine_mode mode, rtx reg, rtx mem)
+emit_load_locked (machine_mode mode, rtx reg, rtx mem, rtx local)
 {
-  rtx (*fn) (rtx, rtx) = NULL;
+  rtx (*fn) (rtx, rtx, rtx) = NULL;
 
   switch (mode)
     {
@@ -16778,7 +16779,7 @@  emit_load_locked (machine_mode mode, rtx reg, rtx mem)
     default:
       gcc_unreachable ();
     }
-  emit_insn (fn (reg, mem));
+  emit_insn (fn (reg, mem, local));
 }
 
 /* A subroutine of the atomic operation splitters.  Emit a store-conditional
@@ -16948,7 +16949,7 @@  rs6000_finish_atomic_subword (rtx narrow, rtx wide, rtx shift)
 /* Expand an atomic compare and swap operation.  */
 
 void
-rs6000_expand_atomic_compare_and_swap (rtx operands[])
+rs6000_expand_atomic_compare_and_swap (rtx operands[], bool local)
 {
   rtx boolval, retval, mem, oldval, newval, cond;
   rtx label1, label2, x, mask, shift;
@@ -17011,7 +17012,7 @@  rs6000_expand_atomic_compare_and_swap (rtx operands[])
     }
   label2 = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
 
-  emit_load_locked (mode, retval, mem);
+  emit_load_locked (mode, retval, mem, local ? const1_rtx : const0_rtx);
 
   x = retval;
   if (mask)
@@ -17109,7 +17110,7 @@  rs6000_expand_atomic_exchange (rtx operands[])
   label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
   emit_label (XEXP (label, 0));
 
-  emit_load_locked (mode, retval, mem);
+  emit_load_locked (mode, retval, mem, const0_rtx);
 
   x = val;
   if (mask)
@@ -17214,7 +17215,7 @@  rs6000_expand_atomic_op (enum rtx_code code, rtx mem, rtx val,
   if (before == NULL_RTX)
     before = gen_reg_rtx (mode);
 
-  emit_load_locked (mode, before, mem);
+  emit_load_locked (mode, before, mem, const0_rtx);
 
   if (code == NOT)
     {
diff --git a/gcc/config/rs6000/sync.md b/gcc/config/rs6000/sync.md
index a4e8344ef11..7087daf7e4c 100644
--- a/gcc/config/rs6000/sync.md
+++ b/gcc/config/rs6000/sync.md
@@ -278,17 +278,19 @@  (define_mode_iterator ATOMIC [(QI "TARGET_SYNC_HI_QI")
 (define_insn "load_locked<mode>"
   [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
 	(unspec_volatile:ATOMIC
-         [(match_operand:ATOMIC 1 "memory_operand" "Z")] UNSPECV_LL))]
+	  [(match_operand:ATOMIC 1 "memory_operand" "Z")
+	   (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
   ""
-  "<larx> %0,%y1"
+  "<larx> %0,%y1,%2"
   [(set_attr "type" "load_l")])
 
 (define_insn "load_locked<QHI:mode>_si"
   [(set (match_operand:SI 0 "int_reg_operand" "=r")
 	(unspec_volatile:SI
-	  [(match_operand:QHI 1 "memory_operand" "Z")] UNSPECV_LL))]
+	  [(match_operand:QHI 1 "memory_operand" "Z")
+	   (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
   "TARGET_SYNC_HI_QI"
-  "<QHI:larx> %0,%y1"
+  "<QHI:larx> %0,%y1,%2"
   [(set_attr "type" "load_l")])
 
 ;; Use PTImode to get even/odd register pairs.
@@ -302,7 +304,8 @@  (define_insn "load_locked<QHI:mode>_si"
 
 (define_expand "load_lockedti"
   [(use (match_operand:TI 0 "quad_int_reg_operand"))
-   (use (match_operand:TI 1 "memory_operand"))]
+   (use (match_operand:TI 1 "memory_operand"))
+   (use (match_operand:QI 2 "u1bit_cint_operand"))]
   "TARGET_SYNC_TI"
 {
   rtx op0 = operands[0];
@@ -316,7 +319,7 @@  (define_expand "load_lockedti"
       operands[1] = op1 = change_address (op1, TImode, new_addr);
     }
 
-  emit_insn (gen_load_lockedpti (pti, op1));
+  emit_insn (gen_load_lockedpti (pti, op1, operands[2]));
   if (WORDS_BIG_ENDIAN)
     emit_move_insn (op0, gen_lowpart (TImode, pti));
   else
@@ -330,11 +333,12 @@  (define_expand "load_lockedti"
 (define_insn "load_lockedpti"
   [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
 	(unspec_volatile:PTI
-         [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")] UNSPECV_LL))]
+	  [(match_operand:TI 1 "indexed_or_indirect_operand" "Z")
+	   (match_operand:QI 2 "u1bit_cint_operand" "n")] UNSPECV_LL))]
   "TARGET_SYNC_TI
    && !reg_mentioned_p (operands[0], operands[1])
    && quad_int_reg_operand (operands[0], PTImode)"
-  "lqarx %0,%y1"
+  "lqarx %0,%y1,%2"
   [(set_attr "type" "load_l")
    (set_attr "size" "128")])
 
@@ -411,7 +415,22 @@  (define_expand "atomic_compare_and_swap<mode>"
    (match_operand:SI 7 "const_int_operand")]		;; model fail
   ""
 {
-  rs6000_expand_atomic_compare_and_swap (operands);
+  rs6000_expand_atomic_compare_and_swap (operands, false);
+  DONE;
+})
+
+(define_expand "atomic_compare_and_swap_local<mode>"
+  [(match_operand:SI 0 "int_reg_operand")		;; bool out
+   (match_operand:AINT 1 "int_reg_operand")		;; val out
+   (match_operand:AINT 2 "memory_operand")		;; memory
+   (match_operand:AINT 3 "reg_or_short_operand")	;; expected
+   (match_operand:AINT 4 "int_reg_operand")		;; desired
+   (match_operand:SI 5 "const_int_operand")		;; is_weak
+   (match_operand:SI 6 "const_int_operand")		;; model succ
+   (match_operand:SI 7 "const_int_operand")]		;; model fail
+  ""
+{
+  rs6000_expand_atomic_compare_and_swap (operands, true);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/powerpc/acmp-tst.c b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
new file mode 100644
index 00000000000..6ebd2ebbc28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/acmp-tst.c
@@ -0,0 +1,32 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define TESTS \
+  X(signed char, qi) \
+  X(unsigned char, uqi) \
+  X(short, hi) \
+  X(signed short, shi) \
+  X(unsigned short, uhi) \
+  X(int, si) \
+  X(signed int, ssi) \
+  X(unsigned int, usi) \
+  X(long, di) \
+  X(signed long, sdi) \
+  X(unsigned long, udi) \
+  X(vector signed __int128, sti) \
+  X(vector unsigned __int128, uti)
+
+#define X(T, name) \
+bool word_exchange_##name (T *ptr, T *expected, T * desired) \
+{ \
+  return __builtin_ppc_atomic_cas_local (ptr, expected, desired, 0, \
+					 __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE); \
+}
+
+TESTS
+
+/* { dg-final { scan-assembler-times {\mlbarx +[0-9]+,[0-9]+,[0-9]+,1} 2 } } */
+/* { dg-final { scan-assembler-times {\mlharx +[0-9]+,[0-9]+,[0-9]+,1} 3 } } */
+/* { dg-final { scan-assembler-times {\mlwarx +[0-9]+,[0-9]+,[0-9]+,1} 3 } } */
+/* { dg-final { scan-assembler-times {\mldarx +[0-9]+,[0-9]+,[0-9]+,1} 3 } } */
+/* { dg-final { scan-assembler-times {\mlqarx +[0-9]+,[0-9]+,[0-9]+,1} 2 } } */