[RFC] powerpc: Fix missing barriers in atomic_exchange_and_add_{acq,rel}

Message ID 1413921274.8483.65.camel@triegel.csb
State Superseded
Delegated to: Adhemerval Zanella Netto
Headers

Commit Message

Torvald Riegel Oct. 21, 2014, 7:54 p.m. UTC
  On powerpc, atomic_exchange_and_add is implemented without any barriers.
However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
(which supposedly should have acquire / release barrier semantics) both
fall back to atomic_exchange_and_add if they are not defined (see
include/atomic.h).  I have not reviewed existing code to see whether
this would indeed cause a bug, but this lack of barriers likely is a
(future) fault, and prevents any use of
atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
semantics.  Therefore, this patch defines
atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
atomic_write_barrier.

I have NOT tested this.  Can somebody who cares about powerpc please
have a look and test this?  Thanks!
  

Comments

Steven Munroe Oct. 24, 2014, 1:15 p.m. UTC | #1
On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote:
> On powerpc, atomic_exchange_and_add is implemented without any barriers.
> However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
> (which supposedly should have acquire / release barrier semantics) both
> fall back to atomic_exchange_and_add if they are not defined (see
> include/atomic.h).  I have not reviewed existing code to see whether
> this would indeed cause a bug, but this lack of barriers likely is a
> (future) fault, and prevents any use of
> atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
> semantics.  Therefore, this patch defines
> atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
> atomic_write_barrier.
> 
> I have NOT tested this.  Can somebody who cares about powerpc please
> have a look and test this?  Thanks!

This is incorrect. Do not apply these changes.

The requirement for memory barriers depend on how the result of the
exchange and add is used. For example if the value is a simple counter
or the original value (result of the larx load) is used as an address
there is NO need for a memory barrier. 

See PowerISA-2.07 BookII, Appendix B.2.1.2

So this is unnecessarily pessimistic and should not be the default.
  
Rich Felker Oct. 25, 2014, 5:49 a.m. UTC | #2
On Fri, Oct 24, 2014 at 08:15:04AM -0500, Steven Munroe wrote:
> On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote:
> > On powerpc, atomic_exchange_and_add is implemented without any barriers.
> > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
> > (which supposedly should have acquire / release barrier semantics) both
> > fall back to atomic_exchange_and_add if they are not defined (see
> > include/atomic.h).  I have not reviewed existing code to see whether
> > this would indeed cause a bug, but this lack of barriers likely is a
> > (future) fault, and prevents any use of
> > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
> > semantics.  Therefore, this patch defines
> > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
> > atomic_write_barrier.
> > 
> > I have NOT tested this.  Can somebody who cares about powerpc please
> > have a look and test this?  Thanks!
> 
> This is incorrect. Do not apply these changes.
> 
> The requirement for memory barriers depend on how the result of the
> exchange and add is used. For example if the value is a simple counter
> or the original value (result of the larx load) is used as an address
> there is NO need for a memory barrier. 
> 
> See PowerISA-2.07 BookII, Appendix B.2.1.2
> 
> So this is unnecessarily pessimistic and should not be the default.

I think you misread Torvald's email/patch. The idea is not to make
barriers the default, but to provide working versions with acquire and
release barriers for callers that need them since the default lacks
any barriers.

Rich
  
Torvald Riegel Oct. 27, 2014, 2:43 p.m. UTC | #3
On Fri, 2014-10-24 at 08:15 -0500, Steven Munroe wrote:
> On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote:
> > On powerpc, atomic_exchange_and_add is implemented without any barriers.
> > However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
> > (which supposedly should have acquire / release barrier semantics) both
> > fall back to atomic_exchange_and_add if they are not defined (see
> > include/atomic.h).  I have not reviewed existing code to see whether
> > this would indeed cause a bug, but this lack of barriers likely is a
> > (future) fault, and prevents any use of
> > atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
> > semantics.  Therefore, this patch defines
> > atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
> > atomic_write_barrier.
> > 
> > I have NOT tested this.  Can somebody who cares about powerpc please
> > have a look and test this?  Thanks!
> 
> This is incorrect. Do not apply these changes.

Have a look at the different variants of atomic_exchange_and_add*.  The
_acq and _rel suffixes are used on the current atomic ops to designate
those operations that have acquire and release barrier semantics.

> The requirement for memory barriers depend on how the result of the
> exchange and add is used.

Yes, see above.  Note that this doesn't add any barriers to
atomic_exchange_and_add, just to atomic_exchange_and_add_acq and
atomic_exchange_and_add_rel.

If you are concerned about uses of unnecessary memory barriers on
powerpc, the way forward is to help review concurrent code in glibc and
help moving it over to the C11 memory model (see below).

> For example if the value is a simple counter
> or the original value (result of the larx load) is used as an address
> there is NO need for a memory barrier. 

I do understand the concept of memory barriers, and their difference to
atomic HW operations.

Please also note that we won't be using address dependencies or such in
the near future, because the language support isn't there yet.  We don't
have the resources to do the complicated dance with the compiler that
the Linux kernel people do.  See
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4215.pdf for
more background.

> See PowerISA-2.07 BookII, Appendix B.2.1.2
> 
> So this is unnecessarily pessimistic and should not be the default.

It is not the default.  And there are no defaults really, just that some
implementations pick defaults.

BTW, we're very likely to transition to using the C11 memory model.  So
at that point, all barriers will be explicit in the atomic operations,
and we'll add actual memory_order_relaxed atomic operations (including
for CAS and other atomic RMW ops).  There won't be any default in the
C11-like atomics we use.
  
Torvald Riegel Nov. 20, 2014, 11:04 a.m. UTC | #4
On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote:
> On powerpc, atomic_exchange_and_add is implemented without any barriers.
> However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
> (which supposedly should have acquire / release barrier semantics) both
> fall back to atomic_exchange_and_add if they are not defined (see
> include/atomic.h).  I have not reviewed existing code to see whether
> this would indeed cause a bug, but this lack of barriers likely is a
> (future) fault, and prevents any use of
> atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
> semantics.  Therefore, this patch defines
> atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
> atomic_write_barrier.
> 
> I have NOT tested this.  Can somebody who cares about powerpc please
> have a look and test this?  Thanks!

Ping.
  
Adhemerval Zanella Netto Nov. 20, 2014, 11:07 a.m. UTC | #5
On 20-11-2014 09:04, Torvald Riegel wrote:
> On Tue, 2014-10-21 at 21:54 +0200, Torvald Riegel wrote:
>> On powerpc, atomic_exchange_and_add is implemented without any barriers.
>> However, atomic_exchange_and_add_acq and atomic_exchange_and_add_rel
>> (which supposedly should have acquire / release barrier semantics) both
>> fall back to atomic_exchange_and_add if they are not defined (see
>> include/atomic.h).  I have not reviewed existing code to see whether
>> this would indeed cause a bug, but this lack of barriers likely is a
>> (future) fault, and prevents any use of
>> atomic_exchange_and_add_{acq,rel} that actually rely on the barrier
>> semantics.  Therefore, this patch defines
>> atomic_exchange_and_add_{acq,rel} using atomic_read_barrier /
>> atomic_write_barrier.
>>
>> I have NOT tested this.  Can somebody who cares about powerpc please
>> have a look and test this?  Thanks!
> Ping.
>
>
Sorry, I missed it on my radar. I will review and test today, thanks.
  
Adhemerval Zanella Netto Nov. 25, 2014, 3:07 p.m. UTC | #6
Hi Torvald,

On 21-10-2014 17:54, Torvald Riegel wrote:
> 				      \
>    })
> +#define atomic_exchange_and_add_acq(mem, value) \
> +  ({									      \
> +    __typeof (*(mem)) __result2;					      \
> +    __result2 = atomic_exchange_and_add (mem, value);			      \
> +    atomic_read_barrier ();						      \
> +    __result2;		

Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more 
expensive synchronization than required (isync).  I would prefer if we use the
already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc.

> 						      \
> +  })
> +#define atomic_exchange_and_add_rel(mem, value) \
> +  ({									      \
> +    atomic_write_barrier ();						      \
> +    atomic_exchange_and_add (mem, value);				      \
> +  })
>
>  #define atomic_increment_val(mem) \
>    ({									      \
  
Torvald Riegel Nov. 25, 2014, 3:39 p.m. UTC | #7
On Tue, 2014-11-25 at 13:07 -0200, Adhemerval Zanella wrote:
> Hi Torvald,
> 
> On 21-10-2014 17:54, Torvald Riegel wrote:
> > 				      \
> >    })
> > +#define atomic_exchange_and_add_acq(mem, value) \
> > +  ({									      \
> > +    __typeof (*(mem)) __result2;					      \
> > +    __result2 = atomic_exchange_and_add (mem, value);			      \
> > +    atomic_read_barrier ();						      \
> > +    __result2;		
> 
> Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more 
> expensive synchronization than required (isync).  I would prefer if we use the
> already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc.

That's fine with me.  Do you want to go adapt and commit the patch
(given that you can test this easily I guess), or should I?
  
Adhemerval Zanella Netto Nov. 25, 2014, 6:09 p.m. UTC | #8
On 25-11-2014 13:39, Torvald Riegel wrote:
> On Tue, 2014-11-25 at 13:07 -0200, Adhemerval Zanella wrote:
>> Hi Torvald,
>>
>> On 21-10-2014 17:54, Torvald Riegel wrote:
>>> 				      \
>>>    })
>>> +#define atomic_exchange_and_add_acq(mem, value) \
>>> +  ({									      \
>>> +    __typeof (*(mem)) __result2;					      \
>>> +    __result2 = atomic_exchange_and_add (mem, value);			      \
>>> +    atomic_read_barrier ();						      \
>>> +    __result2;		
>> Although it is not wrong by using a 'atomic_read_barrier' (lwsync), it adds a more 
>> expensive synchronization than required (isync).  I would prefer if we use the
>> already defined __arch_compare_and_exchange_val_[32|64]_[acq|rel] operations on powerpc.
> That's fine with me.  Do you want to go adapt and commit the patch
> (given that you can test this easily I guess), or should I?
>
I will do it, thanks.
  

Patch

commit 67da4e31ce865d57bb798d6247d893941f2148f3
Author: Torvald Riegel <triegel@redhat.com>
Date:   Tue Oct 21 21:21:57 2014 +0200

    powerpc: Fix missing barriers in atomic_exchange_and_add_{acq,rel}

diff --git a/sysdeps/powerpc/bits/atomic.h b/sysdeps/powerpc/bits/atomic.h
index 2ffba48..b838631 100644
--- a/sysdeps/powerpc/bits/atomic.h
+++ b/sysdeps/powerpc/bits/atomic.h
@@ -253,6 +253,18 @@  typedef uintmax_t uatomic_max_t;
        abort ();							      \
     __result;								      \
   })
+#define atomic_exchange_and_add_acq(mem, value) \
+  ({									      \
+    __typeof (*(mem)) __result2;					      \
+    __result2 = atomic_exchange_and_add (mem, value);			      \
+    atomic_read_barrier ();						      \
+    __result2;								      \
+  })
+#define atomic_exchange_and_add_rel(mem, value) \
+  ({									      \
+    atomic_write_barrier ();						      \
+    atomic_exchange_and_add (mem, value);				      \
+  })
 
 #define atomic_increment_val(mem) \
   ({									      \