memcmp-sse4.S EqualHappy bug

Message ID 20150618145202.GG14955@redhat.com
State Dropped

Commit Message

Andrea Arcangeli June 18, 2015, 2:52 p.m. UTC
  On Thu, Jun 18, 2015 at 04:23:35PM +0200, Torvald Riegel wrote:
> There are a few differences to your memcmp test case though.  If
> barrier() is a compiler barrier, this should "remove" prior knowledge
> the compiler has about memory.  Second, you access volatile-qualified
> data, which at least requires to do the accesses byte-wise or similar.
> Does the memcmp test case also fail when you access volatiles?

Actually the kernel code in the __builtin_memcpy case removes the
volatile qualifier.

On a side note, it's not normal to use READ_ONCE/WRITE_ONCE on
objects larger than sizeof(long); it would warn at build time but
it would still build successfully.

Changing my testcase like this just produces warnings about volatile
(warning: passing argument 1 of ‘memset’ discards ‘volatile’ qualifier
from pointer target type, etc.); it still builds and fails at runtime
like before. I added another barrier as well, but it can't make a
difference: the problem is very clear, and how to prevent this
practical problem is clear too. As long as memcmp-sse4.S is called,
there's nothing the compiler can do to prevent this problem from
materializing.


> Note that I'm not saying that this is guaranteed to work.  It's still UB
> if there is a data race.  However, in practice, this might just continue
> to work.

READ_ONCE/WRITE_ONCE are only used if there is going to be a data
race, otherwise they're never used.

> Userland programs should really try to transition to atomics and the
> C11/C++11 memory model or just the __atomic builtins, if possible.

We should look into that.

Without a way to access non-atomic regions with C, RCU in the kernel
cannot work. But if memcpy in the kernel has to behave sanely (and not
break out of the loop too happily) in the presence of non-atomic changes
to the memory, then I don't see why memcmp in userland is different.

Also note there is a userspace RCU too; I don't know exactly how
it works because I never used it, but the whole concept of RCU is to
allow the C language to access non-atomic regions of memory and to get
away with it. And if that way is memcpy, I don't see what's
fundamentally different about memcpy being required to cope with that,
but memcmp not.

I totally see why memcmp-sse4.S is safe by the standard, but the
standard is not the only way to gain performance: RCU is required to
scale 100% in SMP, pthread_mutex_rwlock wouldn't scale, so it's
unthinkable to get rid of READ_ONCE/WRITE_ONCE. Switching to __atomic
would be fine, and if __atomic can make memcpy (and IMHO memcmp)
behave in a way where they won't happily return too soon on __atomic
builtins, it'd be great.
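
For a long-sized object, the rough userland counterpart would be
something like this (a sketch only, assuming GCC's __atomic builtins;
the real kernel macros also do size checking):

    /* Sketch: approximate userland counterparts of the kernel's
       READ_ONCE/WRITE_ONCE, i.e. relaxed-order atomic accesses. */
    #define READ_ONCE(x)     __atomic_load_n(&(x), __ATOMIC_RELAXED)
    #define WRITE_ONCE(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)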
  

Comments

Torvald Riegel June 18, 2015, 3:50 p.m. UTC | #1
On Thu, 2015-06-18 at 16:52 +0200, Andrea Arcangeli wrote:
> On Thu, Jun 18, 2015 at 04:23:35PM +0200, Torvald Riegel wrote:
> > There are a few differences to your memcmp test case though.  If
> > barrier() is a compiler barrier, this should "remove" prior knowledge
> > the compiler has about memory.  Second, you access volatile-qualified
> > data, which at least requires to do the accesses byte-wise or similar.
> > Does the memcmp test case also fail when you access volatiles?
> 
> Actually the kernel code in the __builtin_memcpy case removes the
> volatile qualifier.

yeah, never mind...

> On a side note, it's not normal to use READ_ONCE/WRITE_ONCE on
> objects larger than sizeof(long); it would warn at build time but
> it would still build successfully.
> 
> Changing my testcase like this just produces warnings about volatile
> (warning: passing argument 1 of ‘memset’ discards ‘volatile’ qualifier
> from pointer target type, etc.); it still builds and fails at runtime
> like before. I added another barrier as well, but it can't make a
> difference: the problem is very clear, and how to prevent this
> practical problem is clear too. As long as memcmp-sse4.S is called,
> there's nothing the compiler can do to prevent this problem from
> materializing.

Yeah, but if there were a memcmp that would respect volatile, then it
could make a difference.

> > Note that I'm not saying that this is guaranteed to work.  It's still UB
> > if there is a data race.  However, in practice, this might just continue
> > to work.
> 
> READ_ONCE/WRITE_ONCE are only used if there is going to be a data
> race, otherwise they're never used.

I know.  Technically, data races are UB, yet it can still happen that
a particular compiler does something that works for you.

> > Userland programs should really try to transition to atomics and the
> > C11/C++11 memory model or just the __atomic builtins, if possible.
> 
> We should look into that.
> 
> Without a way to access non-atomic regions with C

I assume regions means memory regions not of atomic type(s).

What do you mean by "with C"?  Do you mean plain accesses, or just C
code?  If you just want to snapshot data for RCU, using atomic loads
with memory_order_relaxed is fine.  This could be put into a custom
memcpy.
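
A minimal sketch of such a custom memcpy (assumptions: C11
<stdatomic.h>, a byte-wise copy, and a source buffer of atomic type;
illustrative only, not a tuned implementation):

    #include <stdatomic.h>
    #include <stddef.h>

    /* Snapshot possibly-racing memory with relaxed atomic loads instead
       of plain accesses.  No ordering is implied: the snapshot may still
       mix bytes from different concurrent writes. */
    static void memcpy_relaxed(unsigned char *dst,
                               _Atomic unsigned char *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = atomic_load_explicit(&src[i], memory_order_relaxed);
    }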

> , RCU in the kernel
> cannot work. But if memcpy in the kernel has to behave sanely (and not
> break out of the loop too happily) in the presence of non-atomic changes
> to the memory, then I don't see why memcmp in userland is different.

I was distinguishing userland because userland should just rely on C11
and its memory model, IMO.  I don't have a strong opinion on what the
kernel should do, so I simply didn't want to comment on the latter.

> Also note there is a userspace RCU too; I don't know exactly how
> it works because I never used it, but the whole concept of RCU is to
> allow the C language to access non-atomic regions of memory and to get
> away with it.

In general, this cannot work.  There might be compiler optimizations
that expect that they can safely reload data that is not accessed
through atomics nor is of atomic type.  Those can make a lot of sense
considering average code, but would break in this particular case. 

> And if that way is memcpy, I don't see what's
> fundamentally different about memcpy being required to cope with that,
> but memcmp not.

If there is a way for the compiler to see that the data accessed
speculatively in RCU can be modified concurrently, things are fine.
This could be explicitly atomic accesses, a custom memcpy or such, etc.
The stock memcpy assumes non-atomic data though, and data-race-freedom.

> I totally see why memcmp-sse4.S is safe by the standard, but the
> standard is not the only way to gain performance: RCU is required to
> scale 100% in SMP, pthread_mutex_rwlock wouldn't scale, so it's
> unthinkable to get rid of READ_ONCE/WRITE_ONCE. Switching to __atomic
> would be fine, and if __atomic can make memcpy (and IMHO memcmp)
> behave in a way where they won't happily return too soon on __atomic
> builtins, it'd be great.

The __atomic builtins are all you need to implement concurrent code.
The kernel's memory model might be more fine-grained than what C11
offers currently (e.g., data dependences: memory_order_consume is
broken, and the kernel keeps its fingers crossed that the compiler will
not do what it would actually be allowed to do; I've been working with
Paul McKenney on this...).  Nonetheless, besides those potential
performance differences on some archs, it's all there.

BTW, I'm wondering what your memcmp would actually do internally, and
what it would guarantee precisely.  It can't just use relaxed-MO reads /
read_once without further barriers, because this won't guarantee
anything in the face of multiple concurrent updaters; if using just
relaxed-MO reads, memcmp might read a snapshot that never existed in any
seq-cst ordering of all individual loads and stores on this memory
region.
  
Andrea Arcangeli June 18, 2015, 4:19 p.m. UTC | #2
On Thu, Jun 18, 2015 at 05:50:35PM +0200, Torvald Riegel wrote:
> BTW, I'm wondering what your memcmp would actually do internally, and
> what it would guarantee precisely.  It can't just use relaxed-MO reads /

That's very simple: it would never return 0 if it didn't finish
comparing all the data according to the length parameter.

If the minimal atomic granularity of the arch is 1 byte, and I guarantee
the last byte is never changing and always different between the src and
dst regions, it would never return 0. It could return any other value but
0. It's undefined whether it returns above or below zero (depending on the
modifications), but in no way could my memcmp return 0 for such a
workload.  That it can only return non zero is defined by the hardware
and trivial to enforce in the workload of my testcase.
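
A minimal sketch of such a workload (illustrative names and sizes, not
the original testcase; the data race on the leading bytes is deliberate):

    #include <assert.h>
    #include <pthread.h>
    #include <string.h>

    #define SIZE 32
    static char a[SIZE], b[SIZE];

    /* Racy writer: keeps flipping the leading bytes of a.  The last byte
       of each buffer is written once below and never touched again. */
    static void *writer(void *arg)
    {
        (void)arg;
        for (;;) {
            memset(a, 1, SIZE - 1);
            memset(a, 0, SIZE - 1);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        a[SIZE - 1] = 1;    /* always differs from b[SIZE - 1], */
        b[SIZE - 1] = 2;    /* so a correct memcmp can never see 0 */
        pthread_create(&t, NULL, writer, NULL);
        for (;;)
            assert(memcmp(a, b, SIZE) != 0);  /* fires with the bug */
    }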
  
Andrea Arcangeli June 18, 2015, 5:22 p.m. UTC | #3
On Thu, Jun 18, 2015 at 06:19:43PM +0200, Andrea Arcangeli wrote:
> workload.  That it can only return non zero is defined by the hardware
> and trivial to enforce in the workload of my testcase.

Instead of thinking in hardware terms, our colleague David (CC'ed)
suggested we can explain why there is a problem in C terms as well.

While the compiler would be free to reorder the load/stores if you
wrote the memcmp in C, and it would be free to duplicate a read, it
would never be free to terminate the comparison loop due to any equal
result.
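
Put differently, a byte-wise memcmp written in plain C (an illustrative
sketch, not glibc's implementation) has an early exit only on
inequality; 0 is reachable only after the full length has been walked:

    #include <stddef.h>

    static int memcmp_bytewise(const void *p1, const void *p2, size_t n)
    {
        const unsigned char *s1 = p1, *s2 = p2;
        for (size_t i = 0; i < n; i++)
            if (s1[i] != s2[i])                /* early exit: inequality only */
                return s1[i] < s2[i] ? -1 : 1;
        return 0;                              /* all n bytes were compared */
    }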
  
Torvald Riegel June 18, 2015, 5:49 p.m. UTC | #4
On Thu, 2015-06-18 at 19:22 +0200, Andrea Arcangeli wrote:
> On Thu, Jun 18, 2015 at 06:19:43PM +0200, Andrea Arcangeli wrote:
> > workload.  That it can only return non zero is defined by the hardware
> > and trivial to enforce in the workload of my testcase.
> 
> Instead of thinking in hardware terms, our colleague David (CC'ed)
> suggested we can explain why there is a problem in C terms as well.
> 
> While the compiler would be free to reorder the load/stores if you
> wrote the memcmp in C, and it would be free to duplicate a read, it
> would never be free to terminate the comparison loop due to any equal
> result.

If there are data races, this is undefined behavior and *all* bets are
off (according to the standard).  This includes results such as
terminating the loop, stopping to do anything, etc.

This may seem overly drastic, but it's a useful line to draw because we
don't want to have to reason based on details of a compiler's
implementation.
  
Dr. David Alan Gilbert June 18, 2015, 6:12 p.m. UTC | #5
* Torvald Riegel (triegel@redhat.com) wrote:
> On Thu, 2015-06-18 at 19:22 +0200, Andrea Arcangeli wrote:
> > On Thu, Jun 18, 2015 at 06:19:43PM +0200, Andrea Arcangeli wrote:
> > > workload.  That it can only return non zero is defined by the hardware
> > > and trivial to enforce in the workload of my testcase.
> > 
> > Instead of thinking in hardware terms, our colleague David (CC'ed)
> > suggested we can explain why there is a problem in C terms as well.
> > 
> > While the compiler would be free to reorder the load/stores if you
> > wrote the memcmp in C, and it would be free to duplicate a read, it
> > would never be free to terminate the comparison loop due to any equal
> > result.
> 
> If there are data races, this is undefined behavior and *all* bets are
> off (according to the standard).  This includes results such as
> terminating the loop, stopping to do anything, etc.
> 
> This may seem overly drastic, but it's a useful line to draw because we
> don't want to have to reason based on details of a compiler's
> implementation.

Yes, I guess what I find interesting about this case is that:
   a) Having found a difference and then reread it and found it identical,
    you *know* you've hit an 'undefined' case.

   b) The action taken in that case is an outcome that seems impossible
    from any memory reordering or pure-rereading of data; so it's more
    surprising than normal - people are already used to the compiler
    or the CPU reordering stuff, and having static mismatched data
    in the block would seem to be a guarantee that 0 could never
    happen.

   c) I worry if there's anything where a strcmp could get optimised
    into a memcmp and then hit this case, but I agree it's difficult to
    find a situation where it would have been safe anyway.

IMHO it would be safer to abort than to return 0, but that's probably
an equally controversial result.

Dave
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
  
Andrea Arcangeli June 19, 2015, 1:20 p.m. UTC | #6
On Thu, Jun 18, 2015 at 07:49:45PM +0200, Torvald Riegel wrote:
> On Thu, 2015-06-18 at 19:22 +0200, Andrea Arcangeli wrote:
> > On Thu, Jun 18, 2015 at 06:19:43PM +0200, Andrea Arcangeli wrote:
> > > workload.  That it can only return non zero is defined by the hardware
> > > and trivial to enforce in the workload of my testcase.
> > 
> > Instead of thinking in hardware terms, our colleague David (CC'ed)
> > suggested we can explain why there is a problem in C terms as well.
> > 
> > While the compiler would be free to reorder the load/stores if you
> > wrote the memcmp in C, and it would be free to duplicate a read, it
> > would never be free to terminate the comparison loop due to any equal
> > result.
> 
> If there are data races, this is undefined behavior and *all* bets are
> off (according to the standard).  This includes results such as
> terminating the loop, stopping to do anything, etc.
> 
> This may seem overly drastic, but it's a useful line to draw because we
> don't want to have to reason based on details of a compiler's
> implementation.

My view is more practical: memcmp equality retvals can give false
negatives now. So any code using memcmp or bcmp to validate security
canary values (canary areas are always checked for equality) could
also return false negatives. It sounds unlikely that this will trigger,
but in practice by now memcmp isn't usable for canary value checks or
anything else related to security unless this gets fixed.
  
Carlos O'Donell June 19, 2015, 1:38 p.m. UTC | #7
On 06/18/2015 02:12 PM, Dr. David Alan Gilbert wrote:
>    c) I worry if there's anything where a strcmp could get optimised
>     into a memcmp and then hit this case, but I agree it's difficult to
>     find a situation where it would have been safe anyway.

Exactly. It would not have been safe.
 
> IMHO it would be safer to abort than to return 0, but that's probably
> an equally controversial result.

Not really.

It's undefined behaviour; we can do anything.

I agree with aborting, but only as long as the hot path's performance
is not impacted and I haven't thought about how to do that.

Per "Bugs in the user program" convention:
https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program

---
9.2. Bugs in the user program

If it's user code invoking undefined behavior, then it should fail early and catastrophically so that developers don't get the false impression that their code is OK when it happens not to break the use cases they test adequately. (Said another way, so that we avoid giving developers an excuse to complain when a future implementation change "breaks" their programs that were always broken, but theretofore ignorably so.) That too trades off against any runtime cost of detecting the case. I'd say the allowance for cost of detection is marginally higher than in the case of library bugs, because we expect user bugs to be more common than library bugs. But it's still not much, since correct programs performing better is more important to us than buggy programs being easier to debug.
---

Cheers,
Carlos.
  
Andrea Arcangeli June 19, 2015, 2:07 p.m. UTC | #8
On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> I agree with aborting, but only as long as the hot path's performance
> is not impacted and I haven't thought about how to do that.
> 
> Per "Bugs in the user program" convention:
> https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program
> 
> ---
> 9.2. Bugs in the user program
> 
> If it's user code invoking undefined behavior, then it should fail early and catastrophically so that developers don't get the false impression that their code is OK when it happens not to break the use cases they test adequately. (Said another way, so that we avoid giving developers an excuse to complain when a future implementation change "breaks" their programs that were always broken, but theretofore ignorably so.) That too trades off against any runtime cost of detecting the case. I'd say the allowance for cost of detection is marginally higher than in the case of library bugs, because we expect user bugs to be more common than library bugs. But it's still not much, since correct programs performing better is more important to us than buggy programs being easier to debug.
> ---

Aborting should be a satisfactory measure too and I think this issue
fits the worthwhile tradeoff described by the above document, as the
cost should be negligible. It's just a matter of propagating rdx in
addition to rdi/rsi and it won't affect the sse4 unrolled loops fast
paths.

If the testcase reproduces the bug frequently the abort would give a
better chance to fix the bug so the app becomes fully C compliant.

Aborting should still leave memcmp usable for security purposes
if the app can safely detect such a failure and shut down (failing
early and catastrophically sounds good to me, but then not everyone may
agree; if it's a canary value that is being checked, it may already have
a proper failure-notification path that may not get invoked). Overall,
as long as the current false negative is not returned, it should be
enough as far as security is concerned.

If the race condition is very rarely triggered and it ends up
triggering in production, an abort wouldn't be welcome, but it's still
preferable and safer than the current behavior.

The cost of detecting the UB and aborting is practically identical to
the cost of making memcmp not return 0. The only difference is
that instead of jumping to the start of the function and continuing
to evaluate the rest of the buffer, we'd abort.
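
In C terms, both options share the same detection point (a hypothetical
sketch; chunk_mismatch() stands in for the SSE4 wide compare plus the
re-read that locates the differing byte, and is not a real glibc
function):

    #include <stddef.h>

    #define CHUNK 16  /* the sse4 code compares 16-byte chunks */

    /* Hypothetical stand-in for the wide compare + locating re-read;
       returns CHUNK if the (re-)read sees no difference. */
    static size_t chunk_mismatch(const unsigned char *a,
                                 const unsigned char *b)
    {
        for (size_t j = 0; j < CHUNK; j++)
            if (a[j] != b[j])
                return j;
        return CHUNK;
    }

    int memcmp_sketch(const unsigned char *s1, const unsigned char *s2,
                      size_t n)
    {
        while (n >= CHUNK) {
            size_t j = chunk_mismatch(s1, s2);
            if (j < CHUNK)
                return s1[j] < s2[j] ? -1 : 1;
            /* A racing writer may make the re-read see equality even
               though the wide compare saw a difference: don't return 0
               here; keep comparing the remaining length (which is why
               rdx must be propagated), or abort() instead. */
            s1 += CHUNK; s2 += CHUNK; n -= CHUNK;
        }
        for (; n > 0; n--, s1++, s2++)
            if (*s1 != *s2)
                return *s1 < *s2 ? -1 : 1;
        return 0;
    }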
  
Simo Sorce June 19, 2015, 2:42 p.m. UTC | #9
On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > I agree with aborting, but only as long as the hot path's performance
> > is not impacted and I haven't thought about how to do that.
> > 
> > Per "Bugs in the user program" convention:
> > https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program
> > 
> > ---
> > 9.2. Bugs in the user program
> > 
> > If it's user code invoking undefined behavior, then it should fail early and catastrophically so that developers don't get the false impression that their code is OK when it happens not to break the use cases they test adequately. (Said another way, so that we avoid giving developers an excuse to complain when a future implementation change "breaks" their programs that were always broken, but theretofore ignorably so.) That too trades off against any runtime cost of detecting the case. I'd say the allowance for cost of detection is marginally higher than in the case of library bugs, because we expect user bugs to be more common than library bugs. But it's still not much, since correct programs performing better is more important to us than buggy programs being easier to debug.
> > ---
> 
> Aborting should be a satisfactory measure too and I think this issue
> fits the worthwhile tradeoff described by the above document, as the
> cost should be negligible. It's just a matter of propagating rdx in
> addition to rdi/rsi and it won't affect the sse4 unrolled loops fast
> paths.
> 
> If the testcase reproduces the bug frequently the abort would give a
> better chance to fix the bug so the app becomes fully C compliant.
> 
> Aborting should still leave memcmp usable for security purposes
> if the app can safely detect such a failure and shut down (failing
> early and catastrophically sounds good to me, but then not everyone may
> agree; if it's a canary value that is being checked, it may already have
> a proper failure-notification path that may not get invoked). Overall,
> as long as the current false negative is not returned, it should be
> enough as far as security is concerned.
> 
> If the race condition is very rarely triggered and it ends up
> triggering in production, an abort wouldn't be welcome, but it's still
> preferable and safer than the current behavior.
> 
> The cost of detecting the UB and aborting is practically identical to
> the cost of making memcmp not return 0. The only difference is
> that instead of jumping to the start of the function and continuing
> to evaluate the rest of the buffer, we'd abort.

My 2c,
although it is true that technically the spec says memcmp can return
arbitrary values if the tested areas change concurrently, it would be
really hostile to abort() in such a case.
The only real problem is returning an equal result when the underlying
memory never was, nor could be equal.
I do not think it matters what value is returned in that case, as long
as it is not 0.
If returning anything but 0 can be accomplished w/o sacrificing speed
then that is something that should be done.

There are many programs that use memory checks to handle lock-less
barriers. For those programs it doesn't matter at all what (or when)
changed. All that matters is that the memory regions were different
during the test, and then they'll go on with their business. If you were to
require locking in order to do lockless checks, then the whole purpose
would be defeated.
Clearly people are better using atomic comparisons on canary values
instead, but it seems easy to avoid false positives (returning 0 when
memory is clearly different) and keep these things working, so why not
do it?

Simo.
  
Torvald Riegel June 19, 2015, 3:32 p.m. UTC | #10
On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > I agree with aborting, but only as long as the hot path's performance
> > > is not impacted and I haven't thought about how to do that.
> > > 
> > > Per "Bugs in the user program" convention:
> > > https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program
> > > 
> > > ---
> > > 9.2. Bugs in the user program
> > > 
> > > If it's user code invoking undefined behavior, then it should fail early and catastrophically so that developers don't get the false impression that their code is OK when it happens not to break the use cases they test adequately. (Said another way, so that we avoid giving developers an excuse to complain when a future implementation change "breaks" their programs that were always broken, but theretofore ignorably so.) That too trades off against any runtime cost of detecting the case. I'd say the allowance for cost of detection is marginally higher than in the case of library bugs, because we expect user bugs to be more common than library bugs. But it's still not much, since correct programs performing better is more important to us than buggy programs being easier to debug.
> > > ---
> > 
> > Aborting should be a satisfactory measure too and I think this issue
> > fits the worthwhile tradeoff described by the above document, as the
> > cost should be negligible. It's just a matter of propagating rdx in
> > addition to rdi/rsi and it won't affect the sse4 unrolled loops fast
> > paths.
> > 
> > If the testcase reproduces the bug frequently the abort would give a
> > better chance to fix the bug so the app becomes fully C compliant.
> > 
> > Aborting should still leave memcmp usable for security purposes
> > if the app can safely detect such a failure and shut down (failing
> > early and catastrophically sounds good to me, but then not everyone may
> > agree; if it's a canary value that is being checked, it may already have
> > a proper failure-notification path that may not get invoked). Overall,
> > as long as the current false negative is not returned, it should be
> > enough as far as security is concerned.
> > 
> > If the race condition is very rarely triggered and it ends up
> > triggering in production, an abort wouldn't be welcome, but it's still
> > preferable and safer than the current behavior.
> > 
> > The cost of detecting the UB and aborting is practically identical to
> > the cost of making memcmp not return 0. The only difference is
> > that instead of jumping to the start of the function and continuing
> > to evaluate the rest of the buffer, we'd abort.
> 
> My 2c,
> although it is true that technically the spec says memcmp can return
> arbitrary values if the tested areas change concurrently, it would be
> really hostile to abort() in such a case.
> The only real problem is returning an equal result when the underlying
> memory never was, nor could be equal.

This last sentence you wrote makes it look simple, but defining this
properly is more complicated.  The condition Andrea gave is simpler,
because it requires that there are no data races on some parts of the
checked memory, and those parts are guaranteed to be different.

In contrast, "never was nor could be equal" is not simple to define in
weak memory models; what I mean is that an intuitive understanding of
this can easily differ from what would be possible in general for
executions on a particular hardware.

> I do not think it matters what value is returned in that case, as long
> as it is not 0.
> If returning anything but 0 can be accomplished w/o sacrificing speed
> then that is something that should be done.
> 
> There are many programs that use memory checks to handle lock-less
> barriers.

What kind of barriers are you referring to?  Fences like an "mfence"
instruction on x86?  If so, what do you mean by "handle"?

> For those programs it doesn't matter at all what (or when)
> changed. All that matters is that the memory regions were different
> during the test, and then they'll go on with their business. If you were to
> require locking in order to do lockless checks, then the whole purpose
> would be defeated.

The discussion is about whether we can indeed rely on data race freedom,
or whether we want to handle data races gracefully in some situations.
That applies to both code using locks and code synchronizing
differently.

> Clearly people are better using atomic comparisons on canary values
> instead, but it seems easy to avoid false positives (returning 0 when
> memory is clearly different) and keep these things working, so why not
> do it?

I see two separate issues here.  First, where do we draw the line, and
what do we guarantee.  I strongly believe that programs must not have
data races, and that they should use atomics or other synchronization
properly (this doesn't mean locks, but relaxed memory order atomics, for
example).
Second, do we try to keep buggy programs working.  If it has no cost to
do so (e.g., like it might be true in this case), then doing that can
help to trigger fewer errors.  But that doesn't mean the buggy programs
shouldn't get fixed eventually.
  
Dr. David Alan Gilbert June 19, 2015, 3:59 p.m. UTC | #11
* Torvald Riegel (triegel@redhat.com) wrote:
> On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > I agree with aborting, but only as long as the hot path's performance
> > > > is not impacted and I haven't thought about how to do that.
> > > > 

> > Clearly people are better using atomic comparisons on canary values
> > instead, but it seems easy to avoid false positives (returning 0 when
> > memory is clearly different) and keep these things working, so why not
> > do it?
> 
> I see two separate issues here.  First, where do we draw the line, and
> what do we guarantee.  I strongly believe that programs must not have
> data races, and that they should use atomics or other synchronization
> properly (this doesn't mean locks, but relaxed memory order atomics, for
> example).
> Second, do we try to keep buggy programs working.  If it has no cost to
> do so (e.g., like it might be true in this case), then doing that can
> help to trigger fewer errors.  But that doesn't mean the buggy programs
> shouldn't get fixed eventually.

I find it difficult to understand the boundaries of what the C library
is allowed to do in this type of optimisation.

For example, consider the following:

    char a[128];
    char b[128];

    put some static data in a[0-63]
    put some static data in b[0-63]
    a[64]=0;
    b[64]=0;
    start a thread doing stuff in a[65..]
    start a thread doing stuff in b[65..]
    
    if (!strcmp(a,b)) {
      /* Do something */
    }

   a) Is that behaviour defined?
   b) Is it defined with strncmp(a,b,64) ?
   c) Is it defined with strncmp(a,b,128)?
   d) Is it defined with memcmp(a,b,64)?
   e) Is it defined with memcmp(a,b,128)?
   f) If I moved that boundary off a nice round % 8 mark would
      it matter?

I can imagine there may be lots of things that terminate
a string and let other stuff happen in the allocated space
after the end of the string in the belief that at the end
of that string all is unknown.   Andrea's case is a bit different
in that it's the later data that's static, but that doesn't
sound like it should change the answer as to what's allowed.

Dave
    
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
  
Ondrej Bilka June 19, 2015, 4:23 p.m. UTC | #12
On Fri, Jun 19, 2015 at 04:59:08PM +0100, Dr. David Alan Gilbert wrote:
> * Torvald Riegel (triegel@redhat.com) wrote:
> > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > is not impacted and I haven't thought about how to do that.
> > > > > 
> 
> > > Clearly people are better using atomic comparisons on canary values
> > > instead, but it seems easy to avoid false positives (returning 0 when
> > > memory is clearly different) and keep these things working, so why not
> > > do it?
> > 
> > I see two separate issues here.  First, where do we draw the line, and
> > what do we guarantee.  I strongly believe that programs must not have
> > data races, and that they should use atomics or other synchronization
> > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > example).
> > Second, do we try to keep buggy programs working.  If it has no cost to
> > do so (e.g., like it might be true in this case), then doing that can
> > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > shouldn't get fixed eventually.
> 
> I find it difficult to understand the boundaries of what the C library
> is allowed to do in this type of optimisation.
> 
> For example, consider the following:
> 
>     char a[128];
>     char b[128];
> 
>     put some static data in a[0-63]
>     put some static data in b[0-63]
>     a[64]=0;
>     b[64]=0;
>     start a thread doing stuff in a[65..]
>     start a thread doing stuff in b[65..]
>     
>     if (!strcmp(a,b)) {
>       /* Do something */
>     }
> 
>    a) Is that behaviour defined?
>    b) Is it defined with strncmp(a,b,64) ?
>    c) Is it defined with strncmp(a,b,128)?
>    d) Is it defined with memcmp(a,b,64)?
>    e) Is it defined with memcmp(a,b,128)?
>    f) If I moved that boundary off a nice round % 8 mark would
>       it matter?
>
Yes, this is correct, as the memory is indistinguishable from using

struct foo
{
  char start[65];
  char end[63];
};

where end is modified in parallel.

The only thing that could differ is performance, as we may read past end
(as long as it doesn't cross a page) and then discard the data after end.
  
Torvald Riegel June 19, 2015, 4:56 p.m. UTC | #13
On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> * Torvald Riegel (triegel@redhat.com) wrote:
> > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > is not impacted and I haven't thought about how to do that.
> > > > > 
> 
> > > Clearly people are better using atomic comparisons on canary values
> > > instead, but it seems easy to avoid false positives (returning 0 when
> > > memory is clearly different) and keep these things working, so why not
> > > do it?
> > 
> > I see two separate issues here.  First, where do we draw the line, and
> > what do we guarantee.  I strongly believe that programs must not have
> > data races, and that they should use atomics or other synchronization
> > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > example).
> > Second, do we try to keep buggy programs working.  If it has no cost to
> > do so (e.g., like it might be true in this case), then doing that can
> > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > shouldn't get fixed eventually.
> 
> I find it difficult to understand the boundaries of what the C library
> is allowed to do in this type of optimisation.
> 
> For example, consider the following:
> 
>     char a[128];
>     char b[128];
> 
>     put some static data in a[0-63]
>     put some static data in b[0-63]
>     a[64]=0;
>     b[64]=0;
>     start a thread doing stuff in a[65..]
>     start a thread doing stuff in b[65..]
>     
>     if (!strcmp(a,b)) {
>       /* Do something */
>     }
> 
>    a) Is that behaviour defined?

Good question.  I think it should be.  This depends on both the data
race definition and what strcmp/strncmp/memcmp are specified to do using
the abstract machine.

The data race definition uses memory locations as granularity, which is
in 3.14 in C11.  Separate characters in an array should be separate
memory locations.

C11 isn't very specific regarding strcmp, and just says that it
"compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
regarding basic_string::compare (21.4.7.9), saying that first the lengths
of the strings are determined, and then a strncmp is run using the
smaller of the two lengths.

Assuming the C++ specs, only the array indices [0..64] should be
accessed by the abstract machine.  So no data race with the stuff going
on in [65..).

>    b) Is it defined with strncmp(a,b,64) ?

Yes.

>    c) Is it defined with strncmp(a,b,128)?

Not sure.  C11 says that not more than "n characters" are compared, and
characters that follow a null character aren't compared either.
This indicates it wouldn't be different from strncmp(a,b,64) in the
particular case.
Regarding C++11, I'm not sure.  The closest equivalent copies a substring
(conceptually) and then compares, but the substring copying has to
determine the length of the string and then subtract the max length.
This would do a strlen first, which wouldn't access past index 64.
Thus, should be fine too.

>    d) Is it defined with memcmp(a,b,64)?

No data race, IMO.

>    e) Is it defined with memcmp(a,b,128)?

Data race.  Undefined behavior.

>    f) If I moved that boundary off a nice round % 8 mark would
>       it matter?

No difference as far as the standard is concerned.

> I can imagine there may be lots of things that terminate
> a string and let other stuff happen in the allocated space
> after the end of the string in the belief that at the end
> of that string all is unknown.   Andrea's case is a bit different
> in that it's the later data that's static, but that doesn't
> sound like it should change the answer as to what's allowed.

I think it does, because the question is whether there is a data race on
the memory locations that the abstract machine would access.
  
Simo Sorce June 19, 2015, 5:08 p.m. UTC | #14
It is true you identify 2 different issues, but I am not interested in
either; I merely care about failing safe.

On Fri, 2015-06-19 at 17:32 +0200, Torvald Riegel wrote:
> I see two separate issues here.  First, where do we draw the line, and
> what do we guarantee.  I strongly believe that programs must not have
> data races, and that they should use atomics or other synchronization
> properly (this doesn't mean locks, but relaxed memory order atomics,
> for example).

Programs should not have data races, true, but sometimes they do;
failing safe makes 'bad' programs not be worse.

> Second, do we try to keep buggy programs working.  If it has no cost
> to do so (e.g., like it might be true in this case), then doing that
> can help to trigger less errors.  But that doesn't mean the buggy
> programs should get fixed eventually.

Sure, but sometimes it is hard to define something as a bug.

In the specific case Andrea reported (which I pondered on), the race was
clearly there, but also clearly the underlying memory was never
identical at any given point, yet memcmp reported 0, which means the
whole memory area was checked and matched. This was clearly not the
case: memcmp didn't even check the whole memory area, and returned 0.

Now, if the 2 memory areas ever were (even for a nanosecond) identical,
then returning zero would have been perfectly acceptable. During a race
you can't know what the result is. But a function that supposedly checks
a data area for equality and instead stops short and returns "equal" seems
just wrong, and it is certainly unexpected.

I understand that the cause for that is the program is operating on part
of the area currently and it shouldn't, but still I would expect memcmp
to at least always go through the whole area it is expected to check and
not stop short on a fluke and return a "random" result.

Simo.
  
Simo Sorce June 19, 2015, 5:16 p.m. UTC | #15
On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > * Torvald Riegel (triegel@redhat.com) wrote:
> > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > 
> > 
> > > > Clearly people are better using atomic comparisons on canary values
> > > > instead, but it seems easy to avoid false positives (returning 0 when
> > > > memory is clearly different) and keep these things working, so why not
> > > > do it?
> > > 
> > > I see two separate issues here.  First, where do we draw the line, and
> > > what do we guarantee.  I strongly believe that programs must not have
> > > data races, and that they should use atomics or other synchronization
> > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > example).
> > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > do so (e.g., like it might be true in this case), then doing that can
> > > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > > shouldn't get fixed eventually.
> > 
> > I find it difficult to understand the boundaries of what the C library
> > is allowed to do in this type of optimisation.
> > 
> > For example, consider the following:
> > 
> >     char a[128];
> >     char b[128];
> > 
> >     put some static data in a[0-63]
> >     put some static data in b[0-63]
> >     a[64]=0;
> >     b[64]=0;
> >     start a thread doing stuff in a[65..]
> >     start a thread doing stuff in b[65..]
> >     
> >     if (!strcmp(a,b)) {
> >       /* Do something */
> >     }
> > 
> >    a) Is that behaviour defined?
> 
> Good question.  I think it should be.  This depends on both the data
> race definition and what strcmp/strncmp/memcmp are specified to do using
> the abstract machine.
> 
> The data race definition uses memory locations as granularity, which is
> in 3.14 in C11.  Separate characters in an array should be separate
> memory locations.
> 
> C11 isn't very specific regarding strcmp, and just says that it
> "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> regarding basic_string::compare (21.4.7.9), saying that first the lengths
> of the strings are determined, and then a strncmp is run using the
> smaller of the two lengths.
> 
> Assuming the C++ specs, only the array indices [0..64] should be
> accessed by the abstract machine.  So no data race with the stuff going
> on in [65..).
> 
> >    b) Is it defined with strncmp(a,b,64) ?
> 
> Yes.
> 
> >    c) Is it defined with strncmp(a,b,128)?
> 
> Not sure.  C11 says that not more than "n characters" are compared, and
> characters that follow a null character aren't compared either.
> This indicates it wouldn't be different from strncmp(a,b,64) in the
> particular case.
> Regarding C++11, I'm not sure.  The closest equivalent copies a substring
> (conceptually) and then compares, but the substring copying has to
> determine the length of the string and then subtract the max length.
> This would do a strlen first, which wouldn't access past index 64.
> Thus, should be fine too.
> 
> >    d) Is it defined with memcmp(a,b,64)?
> 
> No data race, IMO.
> 
> >    e) Is it defined with memcmp(a,b,128)?
> 
> Data race.  Undefined behavior.
> 
> >    f) If I moved that boundary off a nice round % 8 mark would
> >       it matter?
> 
> No difference as far as the standard is concerned.
> 
> > I can imagine there may be lots of things that terminate
> > a string and let other stuff happen in the allocated space
> > after the end of the string in the belief that at the end
> > of that string all is unknown.   Andrea's case is a bit different
> > in that it's the later data that's static, but that doesn't
> > sound like it should change the answer as to what's allowed.
> 
> I think it does, because the question is whether there is a data race on
> the memory locations that the abstract machine would access.

Well, given we are making examples.

Assume 2 structures like this:

struct test {
    void *pointer;
    char start[16];
    char end[240];
};

and

struct test a;
struct test b;

memset(a.end, 1, 240);
memset(b.end, 2, 240);

In what case would it be expected/legal for
memcmp(a.start, b.start, 256) to ever return 0?

Simo.
  
Simo Sorce June 19, 2015, 5:18 p.m. UTC | #16
On Fri, 2015-06-19 at 13:16 -0400, Simo Sorce wrote:
> On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > 
> > > 
> > > > > Clearly people are better using atomic comparisons on canary values
> > > > > instead, but it seems easy to avoid false positives (returning 0 when
> > > > > memory is clearly different) and keep these things working, so why not
> > > > > do it?
> > > > 
> > > > I see two separate issues here.  First, where do we draw the line, and
> > > > what do we guarantee.  I strongly believe that programs must not have
> > > > data races, and that they should use atomics or other synchronization
> > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > example).
> > > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > > do so (e.g., like it might be true in this case), then doing that can
> > > > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > > > shouldn't get fixed eventually.
> > > 
> > > I find it difficult to understand the boundaries of what the C library
> > > is allowed to do in this type of optimisation.
> > > 
> > > For example, consider the following:
> > > 
> > >     char a[128];
> > >     char b[128];
> > > 
> > >     put some static data in a[0-63]
> > >     put some static data in b[0-63]
> > >     a[64]=0;
> > >     b[64]=0;
> > >     start a thread doing stuff in a[65..]
> > >     start a thread doing stuff in b[65..]
> > >     
> > >     if (!strcmp(a,b)) {
> > >       /* Do something */
> > >     }
> > > 
> > >    a) Is that behaviour defined?
> > 
> > Good question.  I think it should be.  This depends on both the data
> > race definition and what strcmp/strncmp/memcmp are specified to do using
> > the abstract machine.
> > 
> > The data race definition uses memory locations as granularity, which is
> > in 3.14 in C11.  Separate characters in an array should be separate
> > memory locations.
> > 
> > C11 isn't very specific regarding strcmp, and just says that it
> > "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> > regarding basic_string::compare (21.4.7.9), saying that first the lengths
> > of the strings are determined, and then a strncmp is run using the
> > smaller of the two lengths.
> > 
> > Assuming the C++ specs, only the array indices [0..64] should be
> > accessed by the abstract machine.  So no data race with the stuff going
> > on in [65..).
> > 
> > >    b) Is it defined with strncmp(a,b,64) ?
> > 
> > Yes.
> > 
> > >    c) Is it defined with strncmp(a,b,128)?
> > 
> > Not sure.  C11 says that not more than "n characters" are compared, and
> > characters that follow a null character aren't compared either.
> > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > particular case.
> > Regarding C++11, I'm not sure.  The closest equivalent copies a substring
> > (conceptually) and then compares, but the substring copying has to
> > determine the length of the string and then subtract the max length.
> > This would do a strlen first, which wouldn't access past index 64.
> > Thus, should be fine too.
> > 
> > >    d) Is it defined with memcmp(a,b,64)?
> > 
> > No data race, IMO.
> > 
> > >    e) Is it defined with memcmp(a,b,128)?
> > 
> > Data race.  Undefined behavior.
> > 
> > >    f) If I moved that boundary off a nice round % 8 mark would
> > >       it matter?
> > 
> > No difference as far as the standard is concerned.
> > 
> > > I can imagine there may be lots of things that terminate
> > > a string and let other stuff happen in the allocated space
> > > after the end of the string in the belief that at the end
> > > of that string all is unknown.   Andrea's case is a bit different
> > > in that it's the later data that's static, but that doesn't
> > > sound like it should change the answer as to what's allowed.
> > 
> > I think it does, because the question is whether there is a data race on
> > the memory locations that the abstract machine would access.
> 
> Well, given we are making examples.
> 
> Assume 2 structures like this:
> 
> struct test {
>     void *pointer;
>     char start[16];
>     char end[240];
> };
> 
> and
> 
> struct test a;
> struct test b;
> 
> memset(a.end, 1, 240);
> memset(b.end, 2, 240);
> 
> In what case would it be expected/legal for
> memcmp(a.start, b.start, 256) to ever return 0?

Just to be clear, if there are data-races in [ab].pointer or [ab].start
I see absolutely no problem in stopping short and returning a
positive/negative value. I just do not see how returning 0 would ever
make sense.

Simo.
  
Torvald Riegel June 19, 2015, 5:36 p.m. UTC | #17
On Fri, 2015-06-19 at 13:08 -0400, Simo Sorce wrote:
> It is true you identify 2 different issues, but I am not interested in
> either; I merely care about failing safe.
> 
> On Fri, 2015-06-19 at 17:32 +0200, Torvald Riegel wrote:
> > I see two separate issues here.  First, where do we draw the line, and
> > what do we guarantee.  I strongly believe that programs must not have
> > data races, and that they should use atomics or other synchronization
> > properly (this doesn't mean locks, but relaxed memory order atomics,
> > for example).
> 
> Programs should not have data races, true, but sometimes they do;
> failing safe makes 'bad' programs not be worse.
> 
> > Second, do we try to keep buggy programs working.  If it has no cost
> > to do so (e.g., like it might be true in this case), then doing that
> > can help to trigger less errors.  But that doesn't mean the buggy
> > programs should get fixed eventually.
> 
> Sure, but sometimes it is hard to define something as a bug.
> 
> In the specific case Andrea reported (which I pondered on), the race was
> clearly there, but also clearly the underlying memory was never
> identical at any given point, yet memcmp reported 0, which means the
> whole memory area was checked and matched. This was clearly not the
> case: memcmp didn't even check the whole memory area, and returned 0.

This is allowed if something is undefined behavior.  A return value of 0
only has the specified meaning if there's no undefined behavior
elsewhere.

> Now, if the 2 memory areas ever were (even for a nanosecond) identical,
> then returning zero would have been perfectly acceptable.

How do you know that when there are data races?  Making this assumption
requires making assumptions about how the modification happened.  For
example, if the modifications used sequential code too, this could
potentially write temporary values.

Second, you say "even for a nanosecond", and that seems to say you're
thinking about some kind of total order of events.  This isn't
necessarily the case though; even if you'd use relaxed memory-order
atomic stores and loads to access the array, what the memcpy using the
atomics would see would not necessarily be ordered as if the stores
would happen one at a time, or in some way that can be totally ordered.

> During a race
> you can't know what the result is.

The standard "specifies" undefined behavior, which allows for much more
than not knowing what the result is.  It could be conforming to burn
your computer, for example ;) 

> But a function that supposedly checks
> a data area for equality and instead stops short and returns "equal" seems
> just wrong, and it is certainly unexpected.

I do understand that it could be surprising behavior, especially if
thinking at the level of what a straight-forward implementation may or
may not do.  But we have to draw the line somewhere to enable compiler
optimizations, and we don't want to describe compiler optimizations in
detail nor reason about that detail, so the big hammer of undefined
behavior is a sensible choice.

> I understand that the cause for that is the program is operating on part
> of the area currently and it shouldn't, but still I would expect memcmp
> to at least always go through the whole area it is expected to check and
> not stop short on a fluke and return a "random" result.

I am sympathetic to this viewpoint, but a random result, or undefined
behavior, is really not that unrealistic if you consider more complex C
code with data races, that may be subject to more compiler optimization
than one would expect for memcmp or such.

Just to be clear, I don't want to be simply nitpicking here, nor say
that the standard is the only true design choice, or something like
that.  I just think that it's dangerous to start considering some data
races to be "benign" because that's a truly slippery slope and the
reasoning required for that can be really complicated.
  
Torvald Riegel June 19, 2015, 5:40 p.m. UTC | #18
On Fri, 2015-06-19 at 13:18 -0400, Simo Sorce wrote:
> On Fri, 2015-06-19 at 13:16 -0400, Simo Sorce wrote:
> > On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> > > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > > 
> > > > 
> > > > > > Clearly people are better using atomic comparisons on canary values
> > > > > > instead, but it seems easy to avoid false positives (returning 0 when
> > > > > > memory is clearly different) and keep these things working, so why not
> > > > > > do it?
> > > > > 
> > > > > I see two separate issues here.  First, where do we draw the line, and
> > > > > what do we guarantee.  I strongly believe that programs must not have
> > > > > data races, and that they should use atomics or other synchronization
> > > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > > example).
> > > > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > > > do so (e.g., like it might be true in this case), then doing that can
> > > > > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > > > > shouldn't get fixed eventually.
> > > > 
> > > > I find it difficult to understand the boundaries of what the C library
> > > > is allowed to do in this type of optimisation.
> > > > 
> > > > For example, consider the following:
> > > > 
> > > >     char a[128];
> > > >     char b[128];
> > > > 
> > > >     put some static data in a[0-63]
> > > >     put some static data in b[0-63]
> > > >     a[64]=0;
> > > >     b[64]=0;
> > > >     start a thread doing stuff in a[65..]
> > > >     start a thread doing stuff in b[65..]
> > > >     
> > > >     if (!strcmp(a,b)) {
> > > >       /* Do something */
> > > >     }
> > > > 
> > > >    a) Is that behaviour defined?
> > > 
> > > Good question.  I think it should be.  This depends on both the data
> > > race definition and what strcmp/strncmp/memcmp are specified to do using
> > > the abstract machine.
> > > 
> > > The data race definition uses memory locations as granularity, which is
> > > in 3.14 in C11.  Separate characters in an array should be separate
> > > memory locations.
> > > 
> > > C11 isn't very specific regarding strcmp, and just says that it
> > > "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> > > regarding basic_string::compare (21.4.7.9), saying that first the lengths
> > > of the strings are determined, and then a strncmp is run using the
> > > smaller of the two lengths.
> > > 
> > > Assuming the C++ specs, only the array indices [0..64] should be
> > > accessed by the abstract machine.  So no data race with the stuff going
> > > on in [65..).
> > > 
> > > >    b) Is it defined with strncmp(a,b,64) ?
> > > 
> > > Yes.
> > > 
> > > >    c) Is it defined with strncmp(a,b,128)?
> > > 
> > > Not sure.  C11 says that not more than "n characters" are compared, and
> > > characters that follow a null character aren't compared either.
> > > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > > particular case.
> > > Regarding C++11, I'm not sure.  The closest equivalent copies a substring
> > > (conceptually) and then compares, but the substring copying has to
> > > determine the length of the string and then subtract the max length.
> > > This would do a strlen first, which wouldn't access past index 64.
> > > Thus, should be fine too.
> > > 
> > > >    d) Is it defined with memcmp(a,b,64)?
> > > 
> > > No data race, IMO.
> > > 
> > > >    e) Is it defined with memcmp(a,b,128)?
> > > 
> > > Data race.  Undefined behavior.
> > > 
> > > >    f) If I moved that boundary off a nice round % 8 mark would
> > > >       it matter?
> > > 
> > > No difference as far as the standard is concerned.
> > > 
> > > > I can imagine there may be lots of things that terminate
> > > > a string and let other stuff happen in the allocated space
> > > > after the end of the string in the belief that at the end
> > > > of that string all is unknown.   Andrea's case is a bit different
> > > > in that it's the later data that's static, but that doesn't
> > > > sound like it should change the answer as to what's allowed.
> > > 
> > > I think it does, because the question is whether there is a data race on
> > > the memory locations that the abstract machine would access.
> > 
> > Well, given we are making examples.
> > 
> > Assume 2 structures like this:
> > 
> > struct test {
> >     void *pointer;
> >     char start[16];
> >     char end[240];
> > };
> > 
> > and
> > 
> > struct test a;
> > struct test b;
> > 
> > memset(a.end, 1, 240);
> > memset(b.end, 2, 240);
> > 
> > In what case would it be expected/legal for
> > memcmp(a.start, b.start, 256) to ever return 0?
> 
> Just to be clear, if there are data-races in [ab].pointer or [ab].start
> I see absolutely no problem in stopping short and returning a
> positive/negative value. I just do not see how returning 0 would ever
> make sense.

It would be legal if the invocation of memcmp and at least one of the
invocations of memset were concurrent (i.e., not ordered by
happens-before).

I do see that this wouldn't be an intuitive outcome if one reasons based
on assuming some simplified (or straightforward, or natural, or whatever
the personal opinion might be...) implementation.  But a data race is
undefined behavior, and that's where we're drawing the line.
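
To make this concrete, here is a minimal sketch of the kind of caller we
are discussing (hypothetical code, not Andrea's testcase; whether the 0
ever shows up in practice depends on the memcmp implementation):

#include <pthread.h>
#include <string.h>
#include <stdio.h>

static char a[256], b[256];
static volatile int stop;

/* Racy stores to a[0] only; a[16..255] and b[16..255] differ the
 * whole time.  The single racing byte makes the memcmp below UB. */
static void *writer(void *arg)
{
    (void)arg;
    while (!stop) {
        *(volatile char *)&a[0] = 1;
        *(volatile char *)&a[0] = 0;
    }
    return NULL;
}

int main(void)
{
    pthread_t t;

    memset(a + 16, 1, 240);    /* permanent difference in [16..255] */
    memset(b + 16, 2, 240);
    pthread_create(&t, NULL, writer, NULL);
    for (long i = 0; i < 100000000; i++) {
        if (!memcmp(a, b, 256)) {    /* "cannot happen" intuitively */
            printf("EqualHappy: memcmp returned 0 at iteration %ld\n", i);
            break;
        }
    }
    stop = 1;
    pthread_join(t, NULL);
    return 0;
}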
  
Simo Sorce June 19, 2015, 5:55 p.m. UTC | #19
On Fri, 2015-06-19 at 19:40 +0200, Torvald Riegel wrote:
> [...]
> > > In what case it would be expected/legal for 
> > > memcmp(a.start, b.start, 256); to ever return 0 ?
> > 
> > Just to be clear, if there are data-races in [ab].pointer or [ab].start
> > I see absolutely no problem in stopping short and returning a
> > positive/negative value. I just do not see how returning 0 would ever
> > make sense.
> 
> It would be legal if the invocation of memcmp and at least one of the
> invocations of memset were concurrent (i.e., not ordered by
> happens-before).

Nope, sorry, there is a misunderstanding: consider the memsets part of
initialization, *never* run in parallel with memcmp.

> I do see that this wouldn't be an intuitive outcome if one reasons based
> on assuming some simplified (or straightforward, or natural, or whatever
> the personal opinion might be...) implementation.  But a data race is
> undefined behavior, and that's where we're drawing the line.

Sure, I can understand that if the *whole* area under consideration is
racing. But here we are talking about a large chunk of the area that is
different and *never* raced upon.

I can accept that the C standard may still decide this is undefined
behavior even if just one small part of the total area sees concurrent
modifications.  But again, you are returning "short" from the optimized
code, so it is unclear to me how you can return 0 when the memcmp code
hasn't checked all the memory.  Wouldn't it make more sense to return
the result at the current position?  It is still undefined, but at
least it won't cause false positives.
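
Something like the following sketch has the property I mean; it is
illustrative only, not the optimized glibc routine:

#include <stddef.h>

/* Illustrative only: early exit on a mismatch is fine, but 0 is
 * returned only after every one of the n bytes has been examined. */
int memcmp_no_false_equal(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;
    size_t i;

    for (i = 0; i < n; i++)
        if (p1[i] != p2[i])
            return p1[i] < p2[i] ? -1 : 1;    /* stop short, nonzero */
    return 0;    /* all n bytes really were equal when read */
}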

Simo.
  
Dr. David Alan Gilbert June 19, 2015, 6:06 p.m. UTC | #21
* Torvald Riegel (triegel@redhat.com) wrote:
> On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> [...]
> >    d) Is it defined with memcmp(a,b,64)?
> 
> No data race, IMO.

OK, so you mean that's defined behavior?

> >    e) Is it defined with memcmp(a,b,128)?
> 
> Data race.  Undefined behavior.

OK, so it's interesting that your answer to (e) is different from (c);
aren't there compiler optimisations that replace some strcmp calls with
memcmp calls? (I'm not sure whether that would happen in this case.)

> >    f) If I moved that boundary off a nice round % 8 mark would
> >       it matter?
> 
> No difference as far as the standard is concerned.

OK, and I've not looked at this x86 code in detail, but where the
memcmp length finishes in the middle of a word, are we sure the bytes
after the end can't cause a match/mismatch that could later change and
be classed as undefined?


Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
  
Torvald Riegel June 20, 2015, 12:43 p.m. UTC | #22
On Fri, 2015-06-19 at 13:55 -0400, Simo Sorce wrote:
> [...]
> Nope, sorry, there is a misunderstanding: consider the memsets part of
> initialization, *never* run in parallel with memcmp.

Ah, I see.  I'm not really sure what the rules are in this case; my
guess would be that this is not UB, because the access is not beyond the
object that is struct test.

Perhaps this survey is interesting to you:  https://goo.gl/2TY0H5

> > I do see that this wouldn't be an intuitive outcome if one reasons based
> > on assuming some simplified (or straightforward, or natural, or whatever
> > the personal opinion might be...) implementation.  But a data race is
> > undefined behavior, and that's where we're drawing the line.
> 
> > Sure, I can understand that if the *whole* area under consideration is
> racing. But here we are talking about a large chunk of the area that is
> different and *never* raced upon.

Well, undefined behavior is not constrained to some "neighborhood" or
such around the cause of the undefined behavior.
It makes sense that the standard doesn't constrain it, because then it
would have to guarantee the constraints and those would require
reasoning about a particular implementation, which a language standard
is meant to prevent :)
Consider a hypothetical GPU that wants to run C code.  It accepts code
in a virtual ISA that requires using special atomic operations for all
memory accesses that would be data races otherwise.  The implementation
of this GPU uses a fast JIT compiler to transform to the native ISA of
the GPU.  Now this JIT compiler detects in a program that a whole page
is compared using memcmp (or a custom memcmp using non-atomics).  It
decides to move this page into local memory, assuming because of the
data-race-freedom requirement that no other thread will write to the
page.  After the memcmp, it will move it back.  But your program isn't
data-race-free, and another thread writes to one byte in the page.  This
byte is then lost when the page is moved back from local memory.

This is a made-up example (although such a virtual ISA is not
unrealistic) and maybe this implementation would try to fail gracefully
instead of just losing the update.  But I think it helps show that it
isn't quite clear in general which constraints would be realistic for
undefined behavior.

> I can accept that the C standard may still decide this is undefined
> behavior even if just one small part of the total area sees concurrent
> modifications.  But again, you are returning "short" from the optimized
> code, so it is unclear to me how you can return 0 when the memcmp code
> hasn't checked all the memory.  Wouldn't it make more sense to return
> the result at the current position?  It is still undefined, but at
> least it won't cause false positives.

Yes, that's the second issue: trying to fail gracefully or mask failures in
buggy programs, provided the runtime overheads and maintenance costs
aren't too high.
  
Torvald Riegel June 20, 2015, 1 p.m. UTC | #23
On Fri, 2015-06-19 at 19:06 +0100, Dr. David Alan Gilbert wrote:
> * Torvald Riegel (triegel@redhat.com) wrote:
> > [...]
> > >    d) Is it defined with memcmp(a,b,64)?
> > 
> > No data race, IMO.
> 
> OK, so you mean that's defined behavior?

Yes, there's no undefined behavior, so memcmp should do what it's
specified to do in the normal case.

> > >    e) Is it defined with memcmp(a,b,128)?
> > 
> > Data race.  Undefined behavior.
> 
> OK, so it's interesting that your answer to (e) is different from (c);
> aren't there compiler optimisations that replace some strcmp calls with
> memcmp calls? (I'm not sure whether that would happen in this case.)

I believe it only does that when the length of the strings is known.

> > >    f) If I moved that boundary off a nice round % 8 mark would
> > >       it matter?
> > 
> > No difference as far as the standard is concerned.
> 
> OK, and I've not looked at this x86 code in detail, but where the
> memcmp length finishes in the middle of a word, are we sure the bytes
> after the end can't cause a match/mismatch that could later change and
> be classed as undefined?

I haven't looked at the code either, but this would be a bug.  It could
also introduce data races, which the implementation must not do :)
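
For illustration, a word-wise loop can avoid that kind of bug by never
touching bytes past n; this is only a sketch, not the sse4 code:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;
    size_t i = 0;

    /* Whole words first; memcpy keeps the loads alignment-safe. */
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w1, w2;
        memcpy(&w1, p1 + i, sizeof w1);
        memcpy(&w2, p2 + i, sizeof w2);
        if (w1 != w2)
            break;    /* locate the differing byte below */
    }
    /* Tail (and mismatch localization) in bytes: never reads
     * anything at or beyond p1 + n / p2 + n. */
    for (; i < n; i++)
        if (p1[i] != p2[i])
            return p1[i] < p2[i] ? -1 : 1;
    return 0;
}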
  
Dr. David Alan Gilbert June 22, 2015, 5:43 p.m. UTC | #24
* Torvald Riegel (triegel@redhat.com) wrote:
> [...]
> Well, undefined behavior is not constrained to some "neighborhood" or
> such around the cause of the undefined behavior.
> It makes sense that the standard doesn't constrain it, because then it
> would have to guarantee the constraints and those would require
> reasoning about a particular implementation, which a language standard
> is meant to prevent :)
> Consider a hypothetical GPU that wants to run C code.  It accepts code
> in a virtual ISA that requires using special atomic operations for all
> memory accesses that would be data races otherwise.  The implementation
> of this GPU uses a fast JIT compiler to transform to the native ISA of
> the GPU.  Now this JIT compiler detects in a program that a whole page
> is compared using memcmp (or a custom memcmp using non-atomics).  It
> decides to move this page into local memory, assuming because of the
> data-race-freedom requirement that no other thread will write to the
> page.  After the memcmp, it will move it back.  But your program isn't
> data-race-free, and another thread writes to one byte in the page.  This
> byte is then lost when the page is moved back from local memory.
> 
> This is a made-up example (although such a virtual ISA is not
> unrealistic) and maybe this implementation would try to fail gracefully
> instead of just losing the update.  But I think it helps show that it
> isn't quite clear in general which constraints would be realistic for
> undefined behavior.

Isn't that illegal, since the parameters to memcmp are 'const void *',
so the memory shouldn't be written?

Going back to an earlier part of the thread; I came up with a reason
why I really wouldn't want these routines to 'assert()'.

In emulation/virtualisation we generally haven't got a clue what our
guests are doing, and we don't want to stop them most of the time.
So there are a couple of fairly common types of structure that
break these rules:

1) (e.g. framebuffer emulation)
  (Many untrusted threads writing to memory)
  One control thread
     do {
       read memory -> display it
     } while (!end)
     barrier;
     display it once more

  That 'display it' can be quite reasonably expected to get
inconsistent data, but at the higher level it will be safe because
we know we're always going to do one more 'display it' at the end
when things settle down.

similarly.
2) (VM migration)

   Many untrusted threads
      do {
        n=some-value;
        change some data[n];
        write_barrier();
        atomically_set(dirty[n]);
      }

    One control thread
      do {
        for (all i) {
          read_barrier();
          if (atomic_read(dirty[i])) {
              send(data[i]);
          }
        }
      } while (!end)
      barrier();
      for (all i) {
        if (atomic_read(dirty[i])) {
            send(data[i]);
        }
      }

   and often that 'send' process involves looking for compressible data;
I had a look and I don't think we're currently using libc routines
in there on the data for the comparison in (2), but I'm not 100% sure
about (1).

In these cases we expect the data to be inconsistent for some period of
time, but guarantee eventual consistency, since we know we're going to read it
again at some better time.  We have no sensible way to stop all the
guest threads writing to memory.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
  
Torvald Riegel June 23, 2015, 8:36 a.m. UTC | #25
On Mon, 2015-06-22 at 18:43 +0100, Dr. David Alan Gilbert wrote:
> [...]
> Isn't that illegal, since the parameters to memcmp are 'const void *',
> so the memory shouldn't be written?

That's what your program guarantees, but it's not automatically a
constraint on what the implementation does.  If the compiler can detect
that the memory is completely managed and provided by the implementation
(e.g., never accessed through volatile accesses, never escaping to
functions the compiler isn't sure won't access it through
volatile, ...), then the additional writes aren't visible in terms of
observable behavior of the abstract machine.

> Going back to an earlier part of the thread; I came up with a reason
> why I really wouldn't want these routines to 'assert()'.

So, put generally, you don't want to fail fast because there is code
that is not data-race-free but seems to work right now?
I think that this would be a convincing reason.

> In emulation/virtualisation we generally haven't got a clue what our
> guests are doing, and we don't want to stop them most of the time.
> So there are a couple of fairly common types of structure that
> break these rules:
> 
> 1) (e.g. framebuffer emulation)
>   (Many untrusted threads writing to memory)
>   One control thread
>      do {
>        read memory -> display it
>      } while (!end)
>      barrier;
>      display it once more
> 
>   That 'display it' can be quite reasonably expected to get
> inconsistent data, but at the higher level it will be safe because
> we know we're always going to do one more 'display it' at the end
> when things settle down.

Using atomic (memory_order_relaxed) accesses for reads of "memory" (and
"end", I suppose) would clean this up.  The barrier is am
memory_order_acquire fence, I suppose?
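
Concretely, the display loop from (1) could look like this sketch (the
names are made up, and C11 atomics are assumed):

#include <stdatomic.h>
#include <stddef.h>

#define FB_SIZE 4096

/* Hypothetical state: the shared framebuffer and the stop flag. */
extern _Atomic unsigned char fb[FB_SIZE];
extern atomic_bool done;

extern void display(const unsigned char *buf, size_t n);

void control_thread(void)
{
    unsigned char snap[FB_SIZE];
    size_t i;

    do {
        /* Relaxed loads: the data is still racy, but there is no
         * undefined behavior anymore. */
        for (i = 0; i < FB_SIZE; i++)
            snap[i] = atomic_load_explicit(&fb[i], memory_order_relaxed);
        display(snap, FB_SIZE);
    } while (!atomic_load_explicit(&done, memory_order_relaxed));

    atomic_thread_fence(memory_order_acquire);    /* the "barrier" */
    for (i = 0; i < FB_SIZE; i++)    /* one more, now-settled pass */
        snap[i] = atomic_load_explicit(&fb[i], memory_order_relaxed);
    display(snap, FB_SIZE);
}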

> similarly.
> 2) (VM migration)
> 
>    Many untrusted threads
>       do {
>         n=some-value;
>         change some data[n];
>         write_barrier();
>         atomically_set(dirty[n]);

Same as above, using relaxed atomic accesses for data and a
memory_order_release store for setting dirty (or use relaxed MO and keep
the barrier / release fence) would clean this up.

>       }
> 
>     One control thread
>       do {
>         for (all i) {
>           read_barrier();

Hmm, why is the barrier here, not between the loads from dirty and data?

>           if (atomic_read(dirty[i])) {
>               send(data[i]);
>           }
>         }
>       } while (!end)
>       barrier();
>       for (all i) {
>         if (atomic_read(dirty[i])) {
>             send(data[i]);
>         }
>       }
> 
>    and often that 'send' process involves looking for compressible data;
> I had a look and I don't think we're currently using libc routines
> in there on the data for the comparison in (2), but I'm not 100% sure
> about (1).
> 
> In these cases we expect the data to be inconsistent for some period of
> time, but guarantee the consistency since we know we're going to read it
> again at some better time.  We have no sensible way to stop all the
> guest threads writing to memory.

The uncontrolled writes of the guest threads are not really covered by
the memory model, but if we assume that they are effectively a
collection of memory_order_relaxed atomic stores of some granularity,
then you can "avoid" the data race if using relaxed atomic loads at the
consumer side too.  And then it's just using compatible fences and
atomic accesses for "end" and "dirty" to make this work cleanly.
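
The consumer side, matching the writer sketch above (send() is a
placeholder), could be as follows; the acquire load of dirty[i] takes the
place of the read barrier and sits exactly between the dirty check and
the data loads:

  #include <stdatomic.h>

  #define N    1024
  #define PAGE 4096

  extern _Atomic(unsigned char) data[N][PAGE];
  extern atomic_bool dirty[N];

  void send(const unsigned char *buf, int len);

  static void scan_pass(void)
  {
      static unsigned char buf[PAGE];
      int i, j;

      for (i = 0; i < N; i++) {
          /* Acquire load: pairs with the writer's release store, so the
             page contents written before dirty[i] was set are visible.  */
          if (atomic_load_explicit(&dirty[i], memory_order_acquire)) {
              for (j = 0; j < PAGE; j++)
                  buf[j] = atomic_load_explicit(&data[i][j],
                                                memory_order_relaxed);
              send(buf, PAGE);
          }
      }
  }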
  
Dr. David Alan Gilbert June 23, 2015, 9:02 a.m. UTC | #26
* Torvald Riegel (triegel@redhat.com) wrote:
> On Mon, 2015-06-22 at 18:43 +0100, Dr. David Alan Gilbert wrote:
> > * Torvald Riegel (triegel@redhat.com) wrote:
> > > On Fri, 2015-06-19 at 13:55 -0400, Simo Sorce wrote:
> > > > On Fri, 2015-06-19 at 19:40 +0200, Torvald Riegel wrote:
> > > > > On Fri, 2015-06-19 at 13:18 -0400, Simo Sorce wrote:
> > > > > > On Fri, 2015-06-19 at 13:16 -0400, Simo Sorce wrote:
> > > > > > > On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> > > > > > > > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > > > > > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > > > > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > > > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > > Clearly people are better off using atomic comparisons on canary values
> > > > > > > > > > > instead, but it seems easy to avoid false positives (returning 0 when
> > > > > > > > > > > memory is clearly different) and keep these things working, so why not
> > > > > > > > > > > do it?
> > > > > > > > > > 
> > > > > > > > > > I see two separate issues here.  First, where do we draw the line, and
> > > > > > > > > > what do we guarantee.  I strongly believe that programs must not have
> > > > > > > > > > data races, and that they should use atomics or other synchronization
> > > > > > > > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > > > > > > > example).
> > > > > > > > > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > > > > > > > > do so (e.g., like it might be true in this case), then doing that can
> > > > > > > > > > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > > > > > > > > > shouldn't get fixed eventually.
> > > > > > > > > 
> > > > > > > > > I find it difficult to understand the boundaries of what the C library
> > > > > > > > > is allowed to do in this type of optimisation.
> > > > > > > > > 
> > > > > > > > > For example, consider the following:
> > > > > > > > > 
> > > > > > > > >     char a[128];
> > > > > > > > >     char b[128];
> > > > > > > > > 
> > > > > > > > >     put some static data in a[0-63]
> > > > > > > > >     put some static data in b[0-63]
> > > > > > > > >     a[64]=0;
> > > > > > > > >     b[64]=0;
> > > > > > > > >     start a thread doing stuff in a[65..]
> > > > > > > > >     start a thread doing stuff in b[65..]
> > > > > > > > >     
> > > > > > > > >     if (!strcmp(a,b)) {
> > > > > > > > >       /* Do something */
> > > > > > > > >     }
> > > > > > > > > 
> > > > > > > > >    a) Is that behaviour defined?
> > > > > > > > 
> > > > > > > > Good question.  I think it should be.  This depends on both the data
> > > > > > > > race definition and what strcmp/strncmp/memcmp are specified to do using
> > > > > > > > the abstract machine.
> > > > > > > > 
> > > > > > > > The data race definition uses memory locations as granularity, which is
> > > > > > > > in 3.14 in C11.  Separate characters in an array should be separate
> > > > > > > > memory locations.
> > > > > > > > 
> > > > > > > > C11 isn't very specific regarding strcmp, and just says that it
> > > > > > > > "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> > > > > > > > regarding basic_string::compare (21.4.7.9), saying that first the length
> > > > > > > > of the strings are determined, and then a strncmp is run using the
> > > > > > > > smaller of the two lengths.
> > > > > > > > 
> > > > > > > > Assuming the C++ specs, only the array indices [0..64] should be
> > > > > > > > accessed by the abstract machine.  So no data race with the stuff going
> > > > > > > > on in [65..).
> > > > > > > > 
> > > > > > > > >    b) Is it defined with strncmp(a,b,64) ?
> > > > > > > > 
> > > > > > > > Yes.
> > > > > > > > 
> > > > > > > > >    c) Is it defined with strncmp(a,b,128)?
> > > > > > > > 
> > > > > > > > Not sure.  C11 says that not more than "n characters" are compared, and
> > > > > > > > characters that follow a null character aren't compared either.
> > > > > > > > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > > > > > > > particular case.
> > > > > > > > Regarding C++11, I'm not sure.  The closest copies a substring
> > > > > > > > (conceptually) and then compares, but the substring copying has to
> > > > > > > > determine the length of the string and then clamp it to the max length.
> > > > > > > > This would do a strlen first, which wouldn't access past index 64.
> > > > > > > > Thus, should be fine too.
> > > > > > > > 
> > > > > > > > >    d) Is it defined with memcmp(a,b,64)?
> > > > > > > > 
> > > > > > > > No data race, IMO.
> > > > > > > > 
> > > > > > > > >    e) Is it defined with memcmp(a,b,128)?
> > > > > > > > 
> > > > > > > > Data race.  Undefined behavior.
> > > > > > > > 
> > > > > > > > >    f) If I moved that boundary off a nice round % 8 mark would
> > > > > > > > >       it matter?
> > > > > > > > 
> > > > > > > > No difference as far as the standard is concerned.
> > > > > > > > 
> > > > > > > > > I can imagine there may be lots of things that terminate
> > > > > > > > > a string and let other stuff happen in the allocated space
> > > > > > > > > after the end of the string in the belief that at the end
> > > > > > > > > of that string all is unknown.   Andrea's case is a bit different
> > > > > > > > > in that it's the later data that's static, but that doesn't
> > > > > > > > > sound like it should change the answer as to what's allowed.
> > > > > > > > 
> > > > > > > > I think it does, because the question is whether there is a data race on
> > > > > > > > the memory locations that the abstract machine would access.
> > > > > > > 
> > > > > > > Well given we are making examples.
> > > > > > > 
> > > > > > > Assume 2 structures like this:
> > > > > > > 
> > > > > > > struct test {
> > > > > > >     void *pointer;
> > > > > > >     char start[16];
> > > > > > >     char end[240];
> > > > > > > }
> > > > > > > 
> > > > > > > and
> > > > > > > 
> > > > > > > struct test a;
> > > > > > > struct test b;
> > > > > > > 
> > > > > > > memset(a.end, 1, 240);
> > > > > > > memset(b.end, 2, 240);
> > > > > > > 
> > > > > > > In what case would it be expected/legal for
> > > > > > > memcmp(a.start, b.start, 256) to ever return 0?
> > > > > > 
> > > > > > Just to be clear, if there are data-races in [ab].pointer or [ab].start
> > > > > > I see absolutely no problem in stopping short and returning a
> > > > > > positive/negative value. I just do not see how returning 0 would ever
> > > > > > make sense.
> > > > > 
> > > > > It would be legal if the invocation of memcmp and at least one of the
> > > > > invocations of memset would be concurrent (ie, not ordered by
> > > > > happens-before).
> > > > 
> > > > Nope, sorry, there is a misunderstanding, consider the memsets part of
> > > > initialization, *never* run in parallel with memcmp.
> > > 
> > > Ah, I see.  I'm not really sure what the rules are in this case; my
> > > guess would be that this is not UB, because the access is not beyond the
> > > object that is struct test.
> > > 
> > > Perhaps this survey is interesting for you:  https://goo.gl/2TY0H5
> > > 
> > > > > I do see that this wouldn't be an intuitive outcome if one reasons based
> > > > > on assuming some simplified (or straightforward, or natural, or whatever
> > > > > the personal opinion might be...) implementation.  But a data race is
> > > > > undefined behavior, and that's where we're drawing the line.
> > > > 
> > > > Sure, I can understand that if the *whole* area under consideration is
> > > > racing. But here we are talking about a large chunk of the area that is
> > > > different and *never* raced upon.
> > > 
> > > Well, undefined behavior is not constrained to some "neighborhood" or
> > > such around the cause of the undefined behavior.
> > > It makes sense that the standard doesn't constrain it, because then it
> > > would have to guarantee the constraints and those would require
> > > reasoning about a particular implementation, which a language standard
> > > is meant to prevent :)
> > > Consider a hypothetical GPU that wants to run C code.  It accepts code
> > > in a virtual ISA that requires using special atomic operations for all
> > > memory accesses that would be data races otherwise.  The implementation
> > > of this GPU uses a fast JIT compiler to transform to the native ISA of
> > > the GPU.  Now this JIT compiler detects in a program that a whole page
> > > is compared using memcmp (or a custom memcmp using non-atomics).  It
> > > decides to move this page into local memory, assuming because of the
> > > data-race-freedom requirement that no other thread will write to the
> > > page.  After the memcmp, it will move it back.  But your program isn't
> > > data-race-free, and another thread writes to one byte in the page.  This
> > > byte is then lost when the page is moved back from local memory.
> > > 
> > > This is a made-up example (although such a virtual ISA is not
> > > unrealistic) and maybe this implementation would try to fail gracefully
> > > instead of just losing the update.  But I think it helps show that it
> > > isn't quite clear in general which constraints would be realistic for
> > > undefined behavior.
> > 
> > Isn't that illegal since the parameters to memcmp are 'const void *',
> > so it shouldn't be written?
> 
> That's what your program guarantees, but it's not automatically a
> constraint on what the implementation does.  If the compiler can detect
> that the memory is completely managed and provided by the implementation
> (e.g., never accessed through volatile accesses, never escaping to
> functions that the compiler can't be sure won't access it through
> volatile, ...), then the additional writes aren't visible in terms of
> observable behavior of the abstract machine.

However, this is a library interface; the implementation of memcmp in the
library can't get that much information from the compiler, so the standard
memcmp implementation can't do that write: given that the parameter is
const, the address space it's comparing may be read-only.

> > Going back to an earlier part of the thread: I came up with a reason
> > why I really wouldn't want these routines to 'assert()'.
> 
> So, put generally, you don't want to fail fast because there is code
> that is not data-race-free but seems to work right now?
> I think that this would be a convincing reason.

:-)  It's not a 'seems to work' - the design of both of these algorithms
is specifically allowing data races to happen with the intent of containing
the effect of the races.  The racy reads are expected to produce indeterminate
data, but the algorithm knows that, if that happens, the data is always read
again in the future at a time when a race can't happen.  This is significantly
faster than trying to stop all the other threads whenever the read is
taking place.

> > In emulation/virtualisation we generally haven't got a clue what our
> > guests are doing, and we don't want to stop them most of the time.
> > So there are a couple of fairly common types of structure that
> > break these rules:
> > 
> > 1) (e.g. framebuffer emulation)
> >   (Many untrusted threads writing to memory)
> >   One control thread
> >      do {
> >        read memory -> display it
> >      } while (!end)
> >      barrier;
> >      display it once more
> > 
> >   That 'display it' can be quite reasonably expected to get
> > inconsistent data, but at the higher level it will be safe because
> > we know we're always going to do one more 'display it' at the end
> > when things settle down.
> 
> Using atomic (memory_order_relaxed) accesses for reads of "memory" (and
> "end", I suppose) would clean this up.

  1a) Note that this data isn't necessarily word aligned etc - so I'm not
      sure why using atomic accesses would help here.  Also, my problem here
      is that I have no efficient C library call to use that is legal
      because by your definition of 'undefined' any library call might
      assert or do anything else - even if I don't care about the data
      they produce.
>  The barrier is a
> memory_order_acquire fence, I suppose?
  1b) Actually I've not found the barrier yet; I was just boiling these
      examples down from code I know we've got - I'm more familiar with
      the 2nd case.

> > similarly.
> > 2) (VM migration)
> > 
> >    Many untrusted threads
> >       do {
> >         n=some-value;
> >         change some data[n];
> >         write_barrier();
> >         atomically_set(dirty[n]);
> 
> Same as above, using relaxed atomic accesses for data and a
> memory_order_release store for setting dirty (or use relaxed MO and keep
> the barrier / release fence) would clean this up.

   Note the thing changing data[n] is totally out of the control of our
code; it's just normal applications, user land code; we've got no control
over it at all - it's using normal writes, it could be malicious.
 The marking of dirty[n] is something that the kernel does for us, and we
do get control over.

> >       }
> > 
> >     One control thread
> >       do {
> >         for (all i) {
> >           read_barrier();
> 
> Hmm, why is the barrier here, not between the loads from dirty and data?

Yes, that probably would be best.

> >           if (atomic_read(dirty[i])) {
> >               send(data[i]);
> >           }
> >         }
> >       } while (!end)
> >       barrier();
> >       for (all i) {
> >         if (atomic_read(dirty[i])) {
> >             send(data[i]);
> >         }
> >       }
> > 
> >    and often that 'send' process involves looking for compressible data;
> > I had a look and I don't think we're currently using libc routines
> > in there on the data for the comparison in (2), but I'm not 100% sure
> > about (1).
> > 
> > In these cases we expect the data to be inconsistent for some period of
> > time, but guarantee the consistency since we know we're going to read it
> > again at some better time.  We have no sensible way to stop all the
> > guest threads writing to memory.
> 
> The uncontrolled writes of the guest threads are not really covered by
> the memory model, but if we assume that they are effectively a
> collection of memory_order_relaxed atomic stores of some granularity,
> then you can "avoid" the data race by using relaxed atomic loads at the
> consumer side too.  And then it's just using compatible fences and
> atomic accesses for "end" and "dirty" to make this work cleanly.

Again my problem, and why I'm included in this thread, is the use of C library
routines; if I want to copy that data[n] as part of the send, I'd expect
to be able to use a standard memcpy() call, and if I want to see if it's the
same as the previous version I sent, I'd want memcmp() - hence the reason
I mention it.  (Actually, the way our compression of data[n] works, the
comparison in this case doesn't quite fit a memcmp or memchr.)

In short I want the benefit of the highly optimised library routines,
knowing that the actual results they return for data that's changing under
them might be wrong, but with the knowledge they're not going to blow up.
Hence I changed my mind and said assert() is unreasonable.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
  
Torvald Riegel June 23, 2015, 9:41 a.m. UTC | #27
On Tue, 2015-06-23 at 10:02 +0100, Dr. David Alan Gilbert wrote:
> * Torvald Riegel (triegel@redhat.com) wrote:
> > On Mon, 2015-06-22 at 18:43 +0100, Dr. David Alan Gilbert wrote:
> > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > On Fri, 2015-06-19 at 13:55 -0400, Simo Sorce wrote:
> > > > > On Fri, 2015-06-19 at 19:40 +0200, Torvald Riegel wrote:
> > > > > > On Fri, 2015-06-19 at 13:18 -0400, Simo Sorce wrote:
> > > > > > > On Fri, 2015-06-19 at 13:16 -0400, Simo Sorce wrote:
> > > > > > > > On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> > > > > > > > > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > > > > > > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > > > > > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > > > > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > > > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > > > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > > Clearly people are better off using atomic comparisons on canary values
> > > > > > > > > > > > instead, but it seems easy to avoid false positives (returning 0 when
> > > > > > > > > > > > memory is clearly different) and keep these things working, so why not
> > > > > > > > > > > > do it?
> > > > > > > > > > > 
> > > > > > > > > > > I see two separate issues here.  First, where do we draw the line, and
> > > > > > > > > > > what do we guarantee.  I strongly believe that programs must not have
> > > > > > > > > > > data races, and that they should use atomics or other synchronization
> > > > > > > > > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > > > > > > > > example).
> > > > > > > > > > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > > > > > > > > > do so (e.g., like it might be true in this case), then doing that can
> > > > > > > > > > > help to trigger fewer errors.  But that doesn't mean the buggy programs
> > > > > > > > > > > shouldn't get fixed eventually.
> > > > > > > > > > 
> > > > > > > > > > I find it difficult to understand the boundaries of what the C library
> > > > > > > > > > is allowed to do in this type of optimisation.
> > > > > > > > > > 
> > > > > > > > > > For example, consider the following:
> > > > > > > > > > 
> > > > > > > > > >     char a[128];
> > > > > > > > > >     char b[128];
> > > > > > > > > > 
> > > > > > > > > >     put some static data in a[0-63]
> > > > > > > > > >     put some static data in b[0-63]
> > > > > > > > > >     a[64]=0;
> > > > > > > > > >     b[64]=0;
> > > > > > > > > >     start a thread doing stuff in a[65..]
> > > > > > > > > >     start a thread doing stuff in b[65..]
> > > > > > > > > >     
> > > > > > > > > >     if (!strcmp(a,b)) {
> > > > > > > > > >       /* Do something */
> > > > > > > > > >     }
> > > > > > > > > > 
> > > > > > > > > >    a) Is that behaviour defined?
> > > > > > > > > 
> > > > > > > > > Good question.  I think it should be.  This depends on both the data
> > > > > > > > > race definition and what strcmp/strncmp/memcmp are specified to do using
> > > > > > > > > the abstract machine.
> > > > > > > > > 
> > > > > > > > > The data race definition uses memory locations as granularity, which is
> > > > > > > > > in 3.14 in C11.  Separate characters in an array should be separate
> > > > > > > > > memory locations.
> > > > > > > > > 
> > > > > > > > > C11 isn't very specific regarding strcmp, and just says that it
> > > > > > > > > "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> > > > > > > > > regarding basic_string::compare (21.4.7.9), saying that first the length
> > > > > > > > > of the strings are determined, and then a strncmp is run using the
> > > > > > > > > smaller of the two lengths.
> > > > > > > > > 
> > > > > > > > > Assuming the C++ specs, only the array indices [0..64] should be
> > > > > > > > > accessed by the abstract machine.  So no data race with the stuff going
> > > > > > > > > on in [65..).
> > > > > > > > > 
> > > > > > > > > >    b) Is it defined with strncmp(a,b,64) ?
> > > > > > > > > 
> > > > > > > > > Yes.
> > > > > > > > > 
> > > > > > > > > >    c) Is it defined with strncmp(a,b,128)?
> > > > > > > > > 
> > > > > > > > > Not sure.  C11 says that not more than "n characters" are compared, and
> > > > > > > > > characters that follow a null character aren't compared either.
> > > > > > > > > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > > > > > > > > particular case.
> > > > > > > > > Regarding C++11, I'm not sure.  The closest copies a substring
> > > > > > > > > (conceptually) and then compares, but the substring copying has to
> > > > > > > > > determine the length of the string and then clamp it to the max length.
> > > > > > > > > This would do a strlen first, which wouldn't access past index 64.
> > > > > > > > > Thus, should be fine too.
> > > > > > > > > 
> > > > > > > > > >    d) Is it defined with memcmp(a,b,64)?
> > > > > > > > > 
> > > > > > > > > No data race, IMO.
> > > > > > > > > 
> > > > > > > > > >    e) Is it defined with memcmp(a,b,128)?
> > > > > > > > > 
> > > > > > > > > Data race.  Undefined behavior.
> > > > > > > > > 
> > > > > > > > > >    f) If I moved that boundary off a nice round % 8 mark would
> > > > > > > > > >       it matter?
> > > > > > > > > 
> > > > > > > > > No difference as far as the standard is concerned.
> > > > > > > > > 
> > > > > > > > > > I can imagine there may be lots of things that terminate
> > > > > > > > > > a string and let other stuff happen in the allocated space
> > > > > > > > > > after the end of the string in the belief that at the end
> > > > > > > > > > of that string all is unknown.   Andrea's case is a bit different
> > > > > > > > > > in that it's the later data that's static, but that doesn't
> > > > > > > > > > sound like it should change the answer as to what's allowed.
> > > > > > > > > 
> > > > > > > > > I think it does, because the question is whether there is a data race on
> > > > > > > > > the memory locations that the abstract machine would access.
> > > > > > > > 
> > > > > > > > Well given we are making examples.
> > > > > > > > 
> > > > > > > > Assume 2 structures like this:
> > > > > > > > 
> > > > > > > > struct test {
> > > > > > > >     void *pointer;
> > > > > > > >     char start[16];
> > > > > > > >     char end[240];
> > > > > > > > }
> > > > > > > > 
> > > > > > > > and
> > > > > > > > 
> > > > > > > > struct test a;
> > > > > > > > struct test b;
> > > > > > > > 
> > > > > > > > memset(a.end, 1, 240);
> > > > > > > > memset(b.end, 2, 240);
> > > > > > > > 
> > > > > > > > In what case would it be expected/legal for
> > > > > > > > memcmp(a.start, b.start, 256) to ever return 0?
> > > > > > > 
> > > > > > > Just to be clear, if there are data-races in [ab].pointer or [ab].start
> > > > > > > I see absolutely no problem in stopping short and returning a
> > > > > > > positive/negative value. I just do not see how returning 0 would ever
> > > > > > > make sense.
> > > > > > 
> > > > > > It would be legal if the invocation of memcmp and at least one of the
> > > > > > invocations of memset would be concurrent (ie, not ordered by
> > > > > > happens-before).
> > > > > 
> > > > > Nope, sorry, there is a misunderstanding, consider the memsets part of
> > > > > initialization, *never* run in parallel with memcmp.
> > > > 
> > > > Ah, I see.  I'm not really sure what the rules are in this case; my
> > > > guess would be that this is not UB, because the access is not beyond the
> > > > object that is struct test.
> > > > 
> > > > Perhaps this survey is interesting for you:  https://goo.gl/2TY0H5
> > > > 
> > > > > > I do see that this wouldn't be an intuitive outcome if one reasons based
> > > > > > on assuming some simplified (or straightforward, or natural, or whatever
> > > > > > the personal opinion might be...) implementation.  But a data race is
> > > > > > undefined behavior, and that's where we're drawing the line.
> > > > > 
> > > > > Sure, I can understand that if the *whole* area under consideration is
> > > > > racing. But here we are talking about a large chunk of the area that is
> > > > > different and *never* raced upon.
> > > > 
> > > > Well, undefined behavior is not constrained to some "neighborhood" or
> > > > such around the cause of the undefined behavior.
> > > > It makes sense that the standard doesn't constrain it, because then it
> > > > would have to guarantee the constraints and those would require
> > > > reasoning about a particular implementation, which a language standard
> > > > is meant to prevent :)
> > > > Consider a hypothetical GPU that wants to run C code.  It accepts code
> > > > in a virtual ISA that requires using special atomic operations for all
> > > > memory accesses that would be data races otherwise.  The implementation
> > > > of this GPU uses a fast JIT compiler to transform to the native ISA of
> > > > the GPU.  Now this JIT compiler detects in a program that a whole page
> > > > is compared using memcmp (or a custom memcmp using non-atomics).  It
> > > > decides to move this page into local memory, assuming because of the
> > > > data-race-freedom requirement that no other thread will write to the
> > > > page.  After the memcmp, it will move it back.  But your program isn't
> > > > data-race-free, and another thread writes to one byte in the page.  This
> > > > byte is then lost when the page is moved back from local memory.
> > > > 
> > > > This is a made-up example (although such a virtual ISA is not
> > > > unrealistic) and maybe this implementation would try to fail gracefully
> > > > instead of just losing the update.  But I think it helps show that it
> > > > isn't quite clear in general which constraints would be realistic for
> > > > undefined behavior.
> > > 
> > > Isn't that illegal since the parameters to memcmp are 'const void *',
> > > so it shouldn't be written?
> > 
> > That's what your program guarantees, but it's not automatically a
> > constraint on what the implementation does.  If the compiler can detect
> > that the memory is completely managed and provided by the implementation
> > (e.g., never accessed through volatile accesses, never escaping to
> > functions that the compiler can't be sure won't access it through
> > volatile, ...), then the additional writes aren't visible in terms of
> > observable behavior of the abstract machine.
> 
> However, this is a library interface; the implementation of memcmp in the
> library can't get that much information from the compiler, so the standard
> memcmp implementation can't do that write: given that the parameter is
> const, the address space it's comparing may be read-only.

I'm assuming in my example that the compiler isn't calling an opaque
memcmp, but knows about the semantics of memcmp and uses a built-in
memcmp or such.
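
For example (illustrative only; whether this happens depends on the
compiler, target, and options):

  #include <string.h>

  int same_header(const char *a, const char *b)
  {
      /* A compiler that knows memcmp's semantics may expand this inline
         as a word-sized load and compare, or fold it away entirely if it
         can see the buffers' contents; no library call need occur.  */
      return memcmp(a, b, 8) == 0;
  }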

> > > Going back to an earlier part of the thread: I came up with a reason
> > > why I really wouldn't want these routines to 'assert()'.
> > 
> > So, put generally, you don't want to fail fast because there is code
> > that is not data-race-free but seems to work right now?
> > I think that this would be a convincing reason.
> 
> :-)  It's not a 'seems to work'

It is not guaranteed by the language standards to work.  There's no
explicit guarantee by the compiler either.  Therefore I'm putting this
in the "seems to work" category, which isn't meant to be saying that it
doesn't, or would be always fragile in practice.  I'm just trying to be
clear here regarding when we have guaranteed behavior (eg, by the
language standard), and when we don't.

> - the design of both of these algorithms
> is specifically allowing data races to happen with the intent of containing
> the effect of the races.  The racy reads are expected to produce indeterminate
> data, but the algorithm knows that, if that happens, the data is always read
> again in the future at a time when a race can't happen.  This is significantly
> faster than trying to stop all the other threads whenever the read is
> taking place.

Let's try to not overload the term "race".  From the language's memory
model perspective, data races are cases when the program is not
expressing concurrent code correctly.  Using atomics, for example,
avoids this because you tell the compiler that the accesses can be
concurrent with accesses by other threads.  Avoiding data races does not
mean you need to use coarser-grain mutual exclusion such as through
using locks.
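
A minimal made-up example of the distinction:

  #include <stdatomic.h>

  int plain;           /* unsynchronized concurrent access: data race, UB  */
  _Atomic int shared;  /* concurrent relaxed access: no data race, defined */

  int peek(void)
  {
      /* No lock needed; relaxed ordering is enough to avoid the data
         race.  The value may be stale, but the behavior is defined.  */
      return atomic_load_explicit(&shared, memory_order_relaxed);
  }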

At higher levels of abstraction, there can still be race conditions in
the code, of course.  But as you say, you're taking care of these. 

(Just to be clear, I'm nitpicking on the terminology here because IMO,
we need to be clear what kind of "race" we are talking about (e.g.,
whether it's at the higher level of an algorithm or at lower levels
where the compiler is involved).  I do understand what you're trying to
do in these algorithms, and how you want to synchronize.) 

> > > In emulation/virtualisation we generally haven't got a clue what our
> > > guests are doing, and we don't want to stop them most of the time.
> > > So there are a couple of fairly common types of structure that
> > > break these rules:
> > > 
> > > 1) (e.g. framebuffer emulation)
> > >   (Many untrusted threads writing to memory)
> > >   One control thread
> > >      do {
> > >        read memory -> display it
> > >      } while (!end)
> > >      barrier;
> > >      display it once more
> > > 
> > >   That 'display it' can be quite reasonably expected to get
> > > inconsistent data, but at the higher level it will be safe because
> > > we know we're always going to do one more 'display it' at the end
> > > when things settle down.
> > 
> > Using atomic (memory_order_relaxed) accesses for reads of "memory" (and
> > "end", I suppose) would clean this up.
> 
>   1a) Note that this data isn't necessarily word aligned etc - so I'm not
>       sure why using atomic accesses would help here.  Also, my problem here
>       is that I have no efficient C library call to use that is legal
>       because by your definition of 'undefined' any library call might
>       assert or do anything else - even if I don't care about the data
>       they produce.

I agree that what's currently provided by the standard isn't quite
sufficient to cover this use case.  I also think that it's an important
use case, and so we should have a solution for it that is specified
to work correctly (either in the standard, or as a GNU extension).

Specifically, we want to snapshot memory in a data-race-free way, and
the caller makes sure through additional synchronization that it only
uses snapshots that actually happened when the snapshotted data was in a
quiescent state.
(I have the same issue in GCC's libitm (transactional memory runtime
library), and opted to use plain loads there, with a clear comment that
this was outside the standard.)

I'm not yet sure which interface we should offer for such snapshotting.
There's a related proposal being discussed in the C++ committee (Atomic
View), but that's about atomic accesses.  We just need ones that don't
create data races but don't need to be atomic; essentially, we just have
to tell the compiler to not assume data-race-freedom for the snapshot.
Having variants for memcpy, memcmp, etc. might be one way to do it.
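
As a strawman for such a variant (an invented interface, not an existing
or proposed function), here is a volatile-based sketch, stricter than the
plain loads libitm uses but in the same spirit; it is still outside what
the standard guarantees and just shows the shape:

  #include <stddef.h>

  /* Snapshot n bytes that other threads may be writing concurrently.
     The volatile accesses force byte-wise loads that the compiler won't
     widen, refetch, or invent.  The result may mix old and new bytes;
     the caller must revalidate it before trusting it.  */
  static void *memcpy_snapshot(void *dst, const void *src, size_t n)
  {
      unsigned char *d = dst;
      const volatile unsigned char *s = src;

      while (n--)
          *d++ = *s++;
      return dst;
  }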

A first step in your code could be to at least document the cases where
you rely on that.  (IMHO, it would also be useful as documentation for
your existing concurrent code.)
  

Patch

diff --git a/equalhappybug.c b/equalhappybug.c
index 6c5fac5..ed51fd7 100644
--- a/equalhappybug.c
+++ b/equalhappybug.c
@@ -18,7 +18,7 @@ 
 
 static long nr_cpus, page_size;
 #define NON_ZERO_OFFSET 16U
-static char *page, *zeropage;
+static volatile char *page, *zeropage;
 volatile int finished;
 
 static int my_bcmp(char *str1, char *str2, unsigned long n)
@@ -47,6 +47,7 @@  static void *locking_thread(void *arg)
 #else
 		unsigned long loops = 0;
 		//while (!bcmp(page, zeropage, page_size))
+		asm volatile("" : : : "memory");
 		while (!memcmp(page, zeropage, page_size)) {
 			loops += 1;
 			asm volatile("" : : : "memory");