From patchwork Mon Apr 30 15:19:47 2018
Subject: Re: [PATCH]: Fix blocking pthread_join.
To: libc-alpha@sourceware.org
References: <8aeb7b8b-6b1e-9a60-e961-75cde1aa463b@linux.vnet.ibm.com>
 <1524669947.7567.248.camel@redhat.com>
From: Stefan Liebler
Date: Mon, 30 Apr 2018 17:19:47 +0200
In-Reply-To: <1524669947.7567.248.camel@redhat.com>

On 04/25/2018 05:25 PM, Torvald Riegel wrote:
> On Wed, 2018-04-25 at 07:39 -0500, Carlos O'Donell wrote:
>> On 04/25/2018 06:27 AM, Stefan Liebler wrote:
>>> With this patch, the tid is loaded by dereferencing a volatile pointer.
>>> Then the compiler is not allowed to reload the value for __tid from memory.
>
> We always use atomic accesses when it comes to concurrently accessed
> data (there are exceptions, but these are tightly controlled).
> We never use volatile to "fix" concurrent accesses.
>
>>> Okay to commit?
>>
>> Would using an atomic type and an atomic load MO relaxed prevent the
>> compiler from reloading from memory?
>
> That's the right fix, and it should be an acquire MO load to synchronize
> with the kernel's store to 0.  (We should make it a requirement for the
> kernel to use a release store; IIRC, it is on many archs, but it isn't
> documented.)
>
See the attached patch for lll_wait_tid.  This prevents the compiler from
reloading from memory if built with -Os on s390 (31 bit).

> The accesses to the TID should be changed to use atomics everywhere, and
> some (simple) concurrency notes should be added.
>
There are some functions which use the loaded pd->tid as an argument,
e.g. for passing it to a syscall.  The syscall then "operates" on the
thread with the given tid, or on the calling thread if zero was
specified, e.g.:
- nptl/pthread_setschedparam.c: The INVALID_TD_P macro is used to check
  whether pd->tid is valid, but pd->tid is reloaded before the call to
  __sched_setscheduler().
- sysdeps/unix/sysv/linux/pthread_[s|g]etaffinity.c: pd is not checked
  with the INVALID_TD_P macro in order to return ESRCH.  If the thread
  has already exited, this function won't fail with ESRCH.

Can we enhance the INVALID_TD_P macro so that it additionally stores the
evaluated tid in a local variable?  Then we could, e.g., pass this tid
value to the mentioned syscalls.  Is atomic_load_relaxed enough for
loading pd->tid within INVALID_TD_P?  In the examples above, the syscall
will fail if the thread has just exited.
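
Something like the following rough sketch is what I have in mind.  It is
untested and only for discussion; the name INVALID_TD_OR_GET_TID is made
up here (no such macro exists), and whether relaxed MO is sufficient is
exactly my question above:

/* Hypothetical variant of INVALID_TD_P: load pd->tid exactly once,
   hand the loaded value back to the caller, and report whether it
   denotes an invalid or already exited thread.  */
#define INVALID_TD_OR_GET_TID(pd, tidvar) \
  ({ (tidvar) = atomic_load_relaxed (&(pd)->tid); \
     __glibc_unlikely ((tidvar) <= 0); })

/* Caller side, e.g. in pthread_setschedparam (sketch only).  */
  pid_t tid;
  if (INVALID_TD_OR_GET_TID (pd, tid))
    return ESRCH;
  /* Use the cached tid; if the thread exits in the meantime, the
     syscall itself reports ESRCH.  */
  if (__sched_setscheduler (tid, policy, param) == -1)
    result = errno;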
>> I'm unhappy with the use of volatile here because it's not quite
>> the real semantics.  Sure, the memory is volatile, it may change at
>> any point, but that's not what matters.  What matters is that we load
>> from that memory once and only once.
>
> It's a normal concurrent access, so we're using atomics for it.
> Volatile but non-atomic is for cases where one would communicate with an
> external device or something like that, and that device's memory accesses
> would appear to interrupt the thread that's using the volatile accesses.
> IOW, it's like sequential code from a memory-model perspective, just
> that the device's accesses can interleave with the CPU thread's
> accesses.  There's no such simple interleaving when it comes to
> concurrent accesses.

Reviewed-by: Carlos O'Donell

diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 8326e2805c..bfbda99940 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -181,11 +181,14 @@ extern int __lll_timedlock_wait (int *futex, const struct timespec *,
    thread ID while the clone is running and is reset to zero by the kernel
    afterwards.  The kernel up to version 3.16.3 does not use the private futex
    operations for futex wake-up when the clone terminates.  */
-#define lll_wait_tid(tid) \
-  do {					\
-    __typeof (tid) __tid;		\
-    while ((__tid = (tid)) != 0)	\
-      lll_futex_wait (&(tid), __tid, LLL_SHARED);\
+#define lll_wait_tid(tid) \
+  do {							\
+    __typeof (tid) __tid;				\
+    /* We need acquire MO here so that we synchronize	\
+       with the kernel's store to 0 when the clone	\
+       terminates.  (see above)  */			\
+    while ((__tid = atomic_load_acquire (&(tid))) != 0)	\
+      lll_futex_wait (&(tid), __tid, LLL_SHARED);	\
   } while (0)
 
 extern int __lll_timedwait_tid (int *, const struct timespec *)
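
For reference, here is the pattern the macro implements, written as a
stand-alone C11 sketch.  This is not glibc code; it just uses plain C11
atomics and the raw futex syscall to show why the acquire load matters:

/* Stand-alone sketch, not glibc code.  The kernel's clearing of the
   CLONE_CHILD_CLEARTID word on thread exit plays the role of the
   release store that the acquire load below synchronizes with.  */
#include <stdatomic.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static _Atomic int tid;   /* stand-in for pd->tid */

void
wait_tid (void)
{
  int val;
  /* Load tid exactly once per iteration.  With the old plain access,
     the compiler could re-read tid when passing it to the futex call
     (seen with -Os on s390, 31 bit); if the kernel had cleared tid in
     between, FUTEX_WAIT could be called with val == 0 after the
     wake-up had already happened, and pthread_join blocked forever.  */
  while ((val = atomic_load_explicit (&tid, memory_order_acquire)) != 0)
    /* Sleep until the kernel wakes us; spurious wake-ups just loop.  */
    syscall (SYS_futex, &tid, FUTEX_WAIT, val, NULL, NULL, 0);
}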