From patchwork Thu Jun 18 14:52:02 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrea Arcangeli X-Patchwork-Id: 7239 Received: (qmail 107958 invoked by alias); 18 Jun 2015 14:52:11 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 107929 invoked by uid 89); 18 Jun 2015 14:52:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Date: Thu, 18 Jun 2015 16:52:02 +0200 From: Andrea Arcangeli To: Torvald Riegel Cc: Szabolcs Nagy , libc-alpha@sourceware.org, "H.J. Lu" Subject: Re: memcmp-sse4.S EqualHappy bug Message-ID: <20150618145202.GG14955@redhat.com> References: <20150617172903.GC4317@redhat.com> <20150617185952.GE22285@port70.net> <20150617210612.GB14955@redhat.com> <1434621040.5250.212.camel@localhost.localdomain> <20150618124900.GD14955@redhat.com> <1434637415.5250.271.camel@localhost.localdomain> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1434637415.5250.271.camel@localhost.localdomain> User-Agent: Mutt/1.5.23 (2014-03-12) On Thu, Jun 18, 2015 at 04:23:35PM +0200, Torvald Riegel wrote: > There are a few differences to your memcmp test case though. If > barrier() is a compiler barrier, this should "remove" prior knowledge > the compiler has about memory. Second, you access volatile-qualified > data, which at least requires to do the accesses byte-wise or similar. > Does the memcmp test case also fail when you acess volatiles? Actually the kernel code in the __builtin_memcpy case removes the volatile qualifier. On a side note it's not normal to use READ_ONCE/WRITE_ONCE on objects larger than sizeof(long) and it would warn at build time but it would build successfully. Changing my testcase like this just creates warning about volatile warning: passing argument 1 of ‘memset’ discards ‘volatile’ qualifier from pointer target type, etc... it still builds and fail at runtime like before. I added another barrier as well but it can't make a difference, the problem is very clear and how to prevent this practical problem is clear too. As long as memcmp-sse4.S is called there's nothing the compiler can do to prevent this problem from materializing. > Note that I'm not saying that this is guaranteed to work. It's still UB > if there is a data race. However, in practice, this might just continue > to work. READ_ONCE/WRITE_ONCE are only used if there is going to be a data race, otherwise they're never used. > Userland programs should really try to transition to atomics and the > C11/C++11 memory model or just the __atomic builtins, if possible. We should look into that. Without a way to access non-atomic regions with C, RCU in the kernel cannot work. But if memcpy in the kernel has to behave sane (and not break out of the loop too happily) in presence of not-atomic changes to the memory, then I don't see why memcmp in userland is different. Also note there is also an userspace RCU, I don't know exactly how that works because I never used it, but the whole concept of RCU is to allow C language to access not-atomic regions of memory and to get away with it. And if that way is memcpy, I don't see what's fundamentally different about memcpy being required to cope with that, but memcmp not. I totally see why memcmp-sse4.S is safe by the standard, just the standard is not the only way to gain performance, RCU is required to scale 100% in SMP, pthread_mutex_rwlock wouldn't scale, so it's unthinkable to get rid of READ_ONCE/WRITE_ONCE. Switching to __atomic would be fine, and if __atomic can make memcpy (and IMHO memcmp) behave in a way they won't happily return too soon on __atomic builtins, it'd be great. diff --git a/equalhappybug.c b/equalhappybug.c index 6c5fac5..ed51fd7 100644 --- a/equalhappybug.c +++ b/equalhappybug.c @@ -18,7 +18,7 @@ static long nr_cpus, page_size; #define NON_ZERO_OFFSET 16U -static char *page, *zeropage; +static volatile char *page, *zeropage; volatile int finished; static int my_bcmp(char *str1, char *str2, unsigned long n) @@ -47,6 +47,7 @@ static void *locking_thread(void *arg) #else unsigned long loops = 0; //while (!bcmp(page, zeropage, page_size)) + asm volatile("" : : : "memory"); while (!memcmp(page, zeropage, page_size)) { loops += 1; asm volatile("" : : : "memory");