From patchwork Tue Jun 21 20:32:24 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 55242
To: libc-alpha@sourceware.org, Wilco Dijkstra , Fangrui Song
Subject: [PATCH v3 3/4] Remove usage of TLS_MULTIPLE_THREADS_IN_TCB
Date: Tue, 21 Jun 2022 17:32:24 -0300
Message-Id: <20220621203225.714328-4-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220621203225.714328-1-adhemerval.zanella@linaro.org>
References: <20220621203225.714328-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto

Instead use __libc_single_threaded on all architectures.  The TCB field
is renamed to avoid changing the struct layout.  The x86 atomics need
some adjustments, since the single-thread optimization was built into
the inline assembly: they now use SINGLE_THREAD_P, and the atomic
optimizations that are no longer used are removed.

Checked on x86_64-linux-gnu and i686-linux-gnu.
---
(A minimal sketch of the SINGLE_THREAD_P gating idea is appended after
the diff.)

 misc/tst-atomic.c | 1 +
 nptl/allocatestack.c | 6 -
 nptl/descr.h | 17 +-
 nptl/pthread_cancel.c | 7 +-
 nptl/pthread_create.c | 5 -
 sysdeps/i386/htl/tcb-offsets.sym | 1 -
 sysdeps/i386/nptl/tcb-offsets.sym | 1 -
 sysdeps/i386/nptl/tls.h | 4 +-
 sysdeps/ia64/nptl/tcb-offsets.sym | 1 -
 sysdeps/ia64/nptl/tls.h | 2 -
 sysdeps/mach/hurd/i386/tls.h | 4 +-
 sysdeps/nios2/nptl/tcb-offsets.sym | 1 -
 sysdeps/or1k/nptl/tls.h | 2 -
 sysdeps/powerpc/nptl/tcb-offsets.sym | 3 -
 sysdeps/powerpc/nptl/tls.h | 3 -
 sysdeps/s390/nptl/tcb-offsets.sym | 1 -
 sysdeps/s390/nptl/tls.h | 6 +-
 sysdeps/sh/nptl/tcb-offsets.sym | 1 -
 sysdeps/sh/nptl/tls.h | 2 -
 sysdeps/sparc/nptl/tcb-offsets.sym | 1 -
 sysdeps/sparc/nptl/tls.h | 2 +-
 sysdeps/unix/sysv/linux/single-thread.h | 15 +-
 sysdeps/x86/atomic-machine.h | 488 +++++++-----------------
 sysdeps/x86_64/nptl/tcb-offsets.sym | 1 -
 sysdeps/x86_64/nptl/tls.h | 2 +-
 25 files changed, 148 insertions(+), 429 deletions(-)
diff --git a/misc/tst-atomic.c b/misc/tst-atomic.c index 6d681a7bfd..ddbc618e25 100644 --- a/misc/tst-atomic.c +++ b/misc/tst-atomic.c @@ -18,6 +18,7 @@ #include #include +#include #ifndef atomic_t # define atomic_t int diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 98f5f6dd85..3e0d01cb52 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -290,9 +290,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, stack cache nor will the memory (except the TLS memory) be freed. */ pd->user_stack = true; - /* This is at least the second thread. */ - pd->header.multiple_threads = 1; - #ifdef NEED_DL_SYSINFO SETUP_THREAD_SYSINFO (pd); #endif @@ -408,9 +405,6 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, descriptor. 
*/ pd->specific[0] = pd->specific_1stblock; - /* This is at least the second thread. */ - pd->header.multiple_threads = 1; - #ifdef NEED_DL_SYSINFO SETUP_THREAD_SYSINFO (pd); #endif diff --git a/nptl/descr.h b/nptl/descr.h index bb46b5958e..77b25d8267 100644 --- a/nptl/descr.h +++ b/nptl/descr.h @@ -137,22 +137,7 @@ struct pthread #else struct { - /* multiple_threads is enabled either when the process has spawned at - least one thread or when a single-threaded process cancels itself. - This enables additional code to introduce locking before doing some - compare_and_exchange operations and also enable cancellation points. - The concepts of multiple threads and cancellation points ideally - should be separate, since it is not necessary for multiple threads to - have been created for cancellation points to be enabled, as is the - case is when single-threaded process cancels itself. - - Since enabling multiple_threads enables additional code in - cancellation points and compare_and_exchange operations, there is a - potential for an unneeded performance hit when it is enabled in a - single-threaded, self-canceling process. This is OK though, since a - single-threaded process will enable async cancellation only when it - looks to cancel itself and is hence going to end anyway. */ - int multiple_threads; + int unused_multiple_threads; int gscope_flag; } header; #endif diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c index 459317df49..27dca9fe6a 100644 --- a/nptl/pthread_cancel.c +++ b/nptl/pthread_cancel.c @@ -157,12 +157,9 @@ __pthread_cancel (pthread_t th) /* A single-threaded process should be able to kill itself, since there is nothing in the POSIX specification that says that it - cannot. So we set multiple_threads to true so that cancellation - points get executed. */ - THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1); -#ifndef TLS_MULTIPLE_THREADS_IN_TCB + cannot. So we set __libc_single_threaded to true so that + cancellation points get executed. */ __libc_single_threaded_internal = 0; -#endif } while (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, &oldval, newval)); diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 5b98e053d6..59a0df68bc 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -881,11 +881,6 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, other reason that create_thread chose. Now let it run free. */ lll_unlock (pd->lock, LLL_PRIVATE); - - /* We now have for sure more than one thread. The main thread might - not yet have the flag set. No need to set the global variable - again if this is what we use. 
*/ - THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1); } out: diff --git a/sysdeps/i386/htl/tcb-offsets.sym b/sysdeps/i386/htl/tcb-offsets.sym index 7b7c719369..f3f7df6c06 100644 --- a/sysdeps/i386/htl/tcb-offsets.sym +++ b/sysdeps/i386/htl/tcb-offsets.sym @@ -2,7 +2,6 @@ #include #include -MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) SYSINFO_OFFSET offsetof (tcbhead_t, sysinfo) POINTER_GUARD offsetof (tcbhead_t, pointer_guard) SIGSTATE_OFFSET offsetof (tcbhead_t, _hurd_sigstate) diff --git a/sysdeps/i386/nptl/tcb-offsets.sym b/sysdeps/i386/nptl/tcb-offsets.sym index 2ec9e787c1..1efd1469d8 100644 --- a/sysdeps/i386/nptl/tcb-offsets.sym +++ b/sysdeps/i386/nptl/tcb-offsets.sym @@ -6,7 +6,6 @@ RESULT offsetof (struct pthread, result) TID offsetof (struct pthread, tid) CANCELHANDLING offsetof (struct pthread, cancelhandling) CLEANUP_JMP_BUF offsetof (struct pthread, cleanup_jmp_buf) -MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) SYSINFO_OFFSET offsetof (tcbhead_t, sysinfo) CLEANUP offsetof (struct pthread, cleanup) CLEANUP_PREV offsetof (struct _pthread_cleanup_buffer, __prev) diff --git a/sysdeps/i386/nptl/tls.h b/sysdeps/i386/nptl/tls.h index 91090bf287..48940a9f44 100644 --- a/sysdeps/i386/nptl/tls.h +++ b/sysdeps/i386/nptl/tls.h @@ -36,7 +36,7 @@ typedef struct thread descriptor used by libpthread. */ dtv_t *dtv; void *self; /* Pointer to the thread descriptor. */ - int multiple_threads; + int unused_multiple_threads; uintptr_t sysinfo; uintptr_t stack_guard; uintptr_t pointer_guard; @@ -57,8 +57,6 @@ typedef struct _Static_assert (offsetof (tcbhead_t, __private_ss) == 0x30, "offset of __private_ss != 0x30"); -# define TLS_MULTIPLE_THREADS_IN_TCB 1 - #else /* __ASSEMBLER__ */ # include #endif diff --git a/sysdeps/ia64/nptl/tcb-offsets.sym b/sysdeps/ia64/nptl/tcb-offsets.sym index b01f712be2..ab2cb180f9 100644 --- a/sysdeps/ia64/nptl/tcb-offsets.sym +++ b/sysdeps/ia64/nptl/tcb-offsets.sym @@ -2,5 +2,4 @@ #include TID offsetof (struct pthread, tid) - TLS_PRE_TCB_SIZE -MULTIPLE_THREADS_OFFSET offsetof (struct pthread, header.multiple_threads) - TLS_PRE_TCB_SIZE SYSINFO_OFFSET offsetof (tcbhead_t, __private) diff --git a/sysdeps/ia64/nptl/tls.h b/sysdeps/ia64/nptl/tls.h index 8ccedb73e6..008e080fc4 100644 --- a/sysdeps/ia64/nptl/tls.h +++ b/sysdeps/ia64/nptl/tls.h @@ -36,8 +36,6 @@ typedef struct register struct pthread *__thread_self __asm__("r13"); -# define TLS_MULTIPLE_THREADS_IN_TCB 1 - #else /* __ASSEMBLER__ */ # include #endif diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h index 264ed9a9c5..d33e91c922 100644 --- a/sysdeps/mach/hurd/i386/tls.h +++ b/sysdeps/mach/hurd/i386/tls.h @@ -33,7 +33,7 @@ typedef struct void *tcb; /* Points to this structure. */ dtv_t *dtv; /* Vector of pointers to TLS data. */ thread_t self; /* This thread's control port. */ - int multiple_threads; + int unused_multiple_threads; uintptr_t sysinfo; uintptr_t stack_guard; uintptr_t pointer_guard; @@ -117,8 +117,6 @@ _hurd_tls_init (tcbhead_t *tcb) /* This field is used by TLS accesses to get our "thread pointer" from the TLS point of view. */ tcb->tcb = tcb; - /* We always at least start the sigthread anyway. */ - tcb->multiple_threads = 1; /* Get the first available selector. 
*/ int sel = -1; diff --git a/sysdeps/nios2/nptl/tcb-offsets.sym b/sysdeps/nios2/nptl/tcb-offsets.sym index 3cd8d984ac..93a695ac7f 100644 --- a/sysdeps/nios2/nptl/tcb-offsets.sym +++ b/sysdeps/nios2/nptl/tcb-offsets.sym @@ -8,6 +8,5 @@ # define __thread_self ((void *) 0) # define thread_offsetof(mem) ((ptrdiff_t) THREAD_SELF + offsetof (struct pthread, mem)) -MULTIPLE_THREADS_OFFSET thread_offsetof (header.multiple_threads) TID_OFFSET thread_offsetof (tid) POINTER_GUARD (offsetof (tcbhead_t, pointer_guard) - TLS_TCB_OFFSET - sizeof (tcbhead_t)) diff --git a/sysdeps/or1k/nptl/tls.h b/sysdeps/or1k/nptl/tls.h index c6ffe62c3f..3bb07beef8 100644 --- a/sysdeps/or1k/nptl/tls.h +++ b/sysdeps/or1k/nptl/tls.h @@ -35,8 +35,6 @@ typedef struct register tcbhead_t *__thread_self __asm__("r10"); -# define TLS_MULTIPLE_THREADS_IN_TCB 1 - /* Get system call information. */ # include diff --git a/sysdeps/powerpc/nptl/tcb-offsets.sym b/sysdeps/powerpc/nptl/tcb-offsets.sym index 4c01615ad0..a0ee95f94d 100644 --- a/sysdeps/powerpc/nptl/tcb-offsets.sym +++ b/sysdeps/powerpc/nptl/tcb-offsets.sym @@ -10,9 +10,6 @@ # define thread_offsetof(mem) ((ptrdiff_t) THREAD_SELF + offsetof (struct pthread, mem)) -#if TLS_MULTIPLE_THREADS_IN_TCB -MULTIPLE_THREADS_OFFSET thread_offsetof (header.multiple_threads) -#endif TID thread_offsetof (tid) POINTER_GUARD (offsetof (tcbhead_t, pointer_guard) - TLS_TCB_OFFSET - sizeof (tcbhead_t)) TAR_SAVE (offsetof (tcbhead_t, tar_save) - TLS_TCB_OFFSET - sizeof (tcbhead_t)) diff --git a/sysdeps/powerpc/nptl/tls.h b/sysdeps/powerpc/nptl/tls.h index 22b0075235..fd5ee51981 100644 --- a/sysdeps/powerpc/nptl/tls.h +++ b/sysdeps/powerpc/nptl/tls.h @@ -52,9 +52,6 @@ # define TLS_DTV_AT_TP 1 # define TLS_TCB_AT_TP 0 -/* We use the multiple_threads field in the pthread struct */ -#define TLS_MULTIPLE_THREADS_IN_TCB 1 - /* Get the thread descriptor definition. */ # include diff --git a/sysdeps/s390/nptl/tcb-offsets.sym b/sysdeps/s390/nptl/tcb-offsets.sym index 9c1c01f353..bc7b267463 100644 --- a/sysdeps/s390/nptl/tcb-offsets.sym +++ b/sysdeps/s390/nptl/tcb-offsets.sym @@ -1,6 +1,5 @@ #include #include -MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) STACK_GUARD offsetof (tcbhead_t, stack_guard) TID offsetof (struct pthread, tid) diff --git a/sysdeps/s390/nptl/tls.h b/sysdeps/s390/nptl/tls.h index ff210ffeb2..d69ed539f7 100644 --- a/sysdeps/s390/nptl/tls.h +++ b/sysdeps/s390/nptl/tls.h @@ -35,7 +35,7 @@ typedef struct thread descriptor used by libpthread. */ dtv_t *dtv; void *self; /* Pointer to the thread descriptor. 
*/ - int multiple_threads; + int unused_multiple_threads; uintptr_t sysinfo; uintptr_t stack_guard; int gscope_flag; @@ -44,10 +44,6 @@ typedef struct void *__private_ss; } tcbhead_t; -# ifndef __s390x__ -# define TLS_MULTIPLE_THREADS_IN_TCB 1 -# endif - #else /* __ASSEMBLER__ */ # include #endif diff --git a/sysdeps/sh/nptl/tcb-offsets.sym b/sysdeps/sh/nptl/tcb-offsets.sym index 234207779d..4e452d9c6c 100644 --- a/sysdeps/sh/nptl/tcb-offsets.sym +++ b/sysdeps/sh/nptl/tcb-offsets.sym @@ -6,7 +6,6 @@ RESULT offsetof (struct pthread, result) TID offsetof (struct pthread, tid) CANCELHANDLING offsetof (struct pthread, cancelhandling) CLEANUP_JMP_BUF offsetof (struct pthread, cleanup_jmp_buf) -MULTIPLE_THREADS_OFFSET offsetof (struct pthread, header.multiple_threads) TLS_PRE_TCB_SIZE sizeof (struct pthread) MUTEX_FUTEX offsetof (pthread_mutex_t, __data.__lock) POINTER_GUARD offsetof (tcbhead_t, pointer_guard) diff --git a/sysdeps/sh/nptl/tls.h b/sysdeps/sh/nptl/tls.h index 76591ab6ef..8778cb4ac0 100644 --- a/sysdeps/sh/nptl/tls.h +++ b/sysdeps/sh/nptl/tls.h @@ -36,8 +36,6 @@ typedef struct uintptr_t pointer_guard; } tcbhead_t; -# define TLS_MULTIPLE_THREADS_IN_TCB 1 - #else /* __ASSEMBLER__ */ # include #endif /* __ASSEMBLER__ */ diff --git a/sysdeps/sparc/nptl/tcb-offsets.sym b/sysdeps/sparc/nptl/tcb-offsets.sym index f75d02065e..e4a7e4720f 100644 --- a/sysdeps/sparc/nptl/tcb-offsets.sym +++ b/sysdeps/sparc/nptl/tcb-offsets.sym @@ -1,6 +1,5 @@ #include #include -MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) POINTER_GUARD offsetof (tcbhead_t, pointer_guard) TID offsetof (struct pthread, tid) diff --git a/sysdeps/sparc/nptl/tls.h b/sysdeps/sparc/nptl/tls.h index d1e2bb4ad1..b78cf0d6b4 100644 --- a/sysdeps/sparc/nptl/tls.h +++ b/sysdeps/sparc/nptl/tls.h @@ -35,7 +35,7 @@ typedef struct thread descriptor used by libpthread. */ dtv_t *dtv; void *self; - int multiple_threads; + int unused_multiple_threads; #if __WORDSIZE == 64 int gscope_flag; #endif diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h index 30dde4e81a..2099848cf3 100644 --- a/sysdeps/unix/sysv/linux/single-thread.h +++ b/sysdeps/unix/sysv/linux/single-thread.h @@ -23,20 +23,7 @@ # include #endif -/* The default way to check if the process is single thread is by using the - pthread_t 'multiple_threads' field. However, for some architectures it is - faster to either use an extra field on TCB or global variables (the TCB - field is also used on x86 for some single-thread atomic optimizations). - - The ABI might define SINGLE_THREAD_BY_GLOBAL to enable the single thread - check to use global variables instead of the pthread_t field. */ - -#if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld) -# define SINGLE_THREAD_P \ - (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0) -#else -# define SINGLE_THREAD_P (__libc_single_threaded_internal != 0) -#endif +#define SINGLE_THREAD_P (__libc_single_threaded_internal != 0) #define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h index f24f1c71ed..2db69d2d5d 100644 --- a/sysdeps/x86/atomic-machine.h +++ b/sysdeps/x86/atomic-machine.h @@ -51,292 +51,145 @@ #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \ (! 
__sync_bool_compare_and_swap (mem, oldval, newval)) - -#define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret; \ - __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t" \ - "je 0f\n\t" \ - "lock\n" \ - "0:\tcmpxchgb %b2, %1" \ - : "=a" (ret), "=m" (*mem) \ - : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - ret; }) - -#define __arch_c_compare_and_exchange_val_16_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret; \ - __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t" \ - "je 0f\n\t" \ - "lock\n" \ - "0:\tcmpxchgw %w2, %1" \ - : "=a" (ret), "=m" (*mem) \ - : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - ret; }) - -#define __arch_c_compare_and_exchange_val_32_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret; \ - __asm __volatile ("cmpl $0, %%" SEG_REG ":%P5\n\t" \ - "je 0f\n\t" \ - "lock\n" \ - "0:\tcmpxchgl %2, %1" \ - : "=a" (ret), "=m" (*mem) \ - : BR_CONSTRAINT (newval), "m" (*mem), "0" (oldval), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - ret; }) - -#ifdef __x86_64__ -# define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret; \ - __asm __volatile ("cmpl $0, %%fs:%P5\n\t" \ - "je 0f\n\t" \ - "lock\n" \ - "0:\tcmpxchgq %q2, %1" \ - : "=a" (ret), "=m" (*mem) \ - : "q" ((int64_t) cast_to_integer (newval)), \ - "m" (*mem), \ - "0" ((int64_t) cast_to_integer (oldval)), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - ret; }) -# define do_exchange_and_add_val_64_acq(pfx, mem, value) 0 -# define do_add_val_64_acq(pfx, mem, value) do { } while (0) -#else -/* XXX We do not really need 64-bit compare-and-exchange. At least - not in the moment. Using it would mean causing portability - problems since not many other 32-bit architectures have support for - such an operation. So don't define any code for now. If it is - really going to be used the code below can be used on Intel Pentium - and later, but NOT on i486. */ -# define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret = *(mem); \ - __atomic_link_error (); \ - ret = (newval); \ - ret = (oldval); \ - ret; }) - -# define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \ - ({ __typeof (*mem) ret = *(mem); \ - __atomic_link_error (); \ - ret = (newval); \ - ret = (oldval); \ - ret; }) - -# define do_exchange_and_add_val_64_acq(pfx, mem, value) \ - ({ __typeof (value) __addval = (value); \ - __typeof (*mem) __result; \ - __typeof (mem) __memp = (mem); \ - __typeof (*mem) __tmpval; \ - __result = *__memp; \ - do \ - __tmpval = __result; \ - while ((__result = pfx##_compare_and_exchange_val_64_acq \ - (__memp, __result + __addval, __result)) == __tmpval); \ - __result; }) - -# define do_add_val_64_acq(pfx, mem, value) \ - { \ - __typeof (value) __addval = (value); \ - __typeof (mem) __memp = (mem); \ - __typeof (*mem) __oldval = *__memp; \ - __typeof (*mem) __tmpval; \ - do \ - __tmpval = __oldval; \ - while ((__oldval = pfx##_compare_and_exchange_val_64_acq \ - (__memp, __oldval + __addval, __oldval)) == __tmpval); \ - } -#endif - - -/* Note that we need no lock prefix. 
*/ -#define atomic_exchange_acq(mem, newvalue) \ - ({ __typeof (*mem) result; \ +#define __cmpxchg_op(lock, mem, newval, oldval) \ + ({ __typeof (*mem) __ret; \ if (sizeof (*mem) == 1) \ - __asm __volatile ("xchgb %b0, %1" \ - : "=q" (result), "=m" (*mem) \ - : "0" (newvalue), "m" (*mem)); \ + asm volatile (lock "cmpxchgb %2, %1" \ + : "=a" (__ret), "+m" (*mem) \ + : BR_CONSTRAINT (newval), "0" (oldval) \ + : "memory"); \ else if (sizeof (*mem) == 2) \ - __asm __volatile ("xchgw %w0, %1" \ - : "=r" (result), "=m" (*mem) \ - : "0" (newvalue), "m" (*mem)); \ + asm volatile (lock "cmpxchgw %2, %1" \ + : "=a" (__ret), "+m" (*mem) \ + : BR_CONSTRAINT (newval), "0" (oldval) \ + : "memory"); \ else if (sizeof (*mem) == 4) \ - __asm __volatile ("xchgl %0, %1" \ - : "=r" (result), "=m" (*mem) \ - : "0" (newvalue), "m" (*mem)); \ + asm volatile (lock "cmpxchgl %2, %1" \ + : "=a" (__ret), "+m" (*mem) \ + : BR_CONSTRAINT (newval), "0" (oldval) \ + : "memory"); \ else if (__HAVE_64B_ATOMICS) \ - __asm __volatile ("xchgq %q0, %1" \ - : "=r" (result), "=m" (*mem) \ - : "0" ((int64_t) cast_to_integer (newvalue)), \ - "m" (*mem)); \ + asm volatile (lock "cmpxchgq %2, %1" \ + : "=a" (__ret), "+m" (*mem) \ + : "q" ((int64_t) cast_to_integer (newval)), \ + "0" ((int64_t) cast_to_integer (oldval)) \ + : "memory"); \ else \ - { \ - result = 0; \ - __atomic_link_error (); \ - } \ - result; }) - + __atomic_link_error (); \ + __ret; }) -#define __arch_exchange_and_add_body(lock, pfx, mem, value) \ - ({ __typeof (*mem) __result; \ - __typeof (value) __addval = (value); \ - if (sizeof (*mem) == 1) \ - __asm __volatile (lock "xaddb %b0, %1" \ - : "=q" (__result), "=m" (*mem) \ - : "0" (__addval), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "xaddw %w0, %1" \ - : "=r" (__result), "=m" (*mem) \ - : "0" (__addval), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "xaddl %0, %1" \ - : "=r" (__result), "=m" (*mem) \ - : "0" (__addval), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "xaddq %q0, %1" \ - : "=r" (__result), "=m" (*mem) \ - : "0" ((int64_t) cast_to_integer (__addval)), \ - "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ +#define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \ + ({ __typeof (*mem) __ret; \ + if (SINGLE_THREAD_P) \ + __ret = __cmpxchg_op ("", (mem), (newval), (oldval)); \ else \ - __result = do_exchange_and_add_val_64_acq (pfx, (mem), __addval); \ - __result; }) - -#define atomic_exchange_and_add(mem, value) \ - __sync_fetch_and_add (mem, value) - -#define __arch_exchange_and_add_cprefix \ - "cmpl $0, %%" SEG_REG ":%P4\n\tje 0f\n\tlock\n0:\t" - -#define catomic_exchange_and_add(mem, value) \ - __arch_exchange_and_add_body (__arch_exchange_and_add_cprefix, __arch_c, \ - mem, value) + __ret = __cmpxchg_op (LOCK_PREFIX, (mem), (newval), (oldval)); \ + __ret; }) +#define __arch_c_compare_and_exchange_val_16_acq(mem, newval, oldval) \ + ({ __typeof (*mem) __ret; \ + if (SINGLE_THREAD_P) \ + __ret = __cmpxchg_op ("", (mem), (newval), (oldval)); \ + else \ + __ret = __cmpxchg_op (LOCK_PREFIX, (mem), (newval), (oldval)); \ + __ret; }) -#define __arch_add_body(lock, pfx, apfx, mem, value) \ - do { \ - if (__builtin_constant_p (value) && (value) == 1) \ - pfx##_increment (mem); \ - else if (__builtin_constant_p (value) && (value) == -1) \ - 
pfx##_decrement (mem); \ - else if (sizeof (*mem) == 1) \ - __asm __volatile (lock "addb %b1, %0" \ - : "=m" (*mem) \ - : IBR_CONSTRAINT (value), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "addw %w1, %0" \ - : "=m" (*mem) \ - : "ir" (value), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "addl %1, %0" \ - : "=m" (*mem) \ - : "ir" (value), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "addq %q1, %0" \ - : "=m" (*mem) \ - : "ir" ((int64_t) cast_to_integer (value)), \ - "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else \ - do_add_val_64_acq (apfx, (mem), (value)); \ - } while (0) - -# define atomic_add(mem, value) \ - __arch_add_body (LOCK_PREFIX, atomic, __arch, mem, value) - -#define __arch_add_cprefix \ - "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t" +#define __arch_c_compare_and_exchange_val_32_acq(mem, newval, oldval) \ + ({ __typeof (*mem) __ret; \ + if (SINGLE_THREAD_P) \ + __ret = __cmpxchg_op ("", (mem), (newval), (oldval)); \ + else \ + __ret = __cmpxchg_op (LOCK_PREFIX, (mem), (newval), (oldval)); \ + __ret; }) -#define catomic_add(mem, value) \ - __arch_add_body (__arch_add_cprefix, atomic, __arch_c, mem, value) +#define __arch_c_compare_and_exchange_val_64_acq(mem, newval, oldval) \ + ({ __typeof (*mem) __ret; \ + if (SINGLE_THREAD_P) \ + __ret = __cmpxchg_op ("", (mem), (newval), (oldval)); \ + else \ + __ret =__cmpxchg_op (LOCK_PREFIX, (mem), (newval), (oldval)); \ + __ret; }) -#define atomic_add_negative(mem, value) \ - ({ unsigned char __result; \ +#define __xchg_op(lock, mem, arg, op) \ + ({ __typeof (*mem) __ret = (arg); \ if (sizeof (*mem) == 1) \ - __asm __volatile (LOCK_PREFIX "addb %b2, %0; sets %1" \ - : "=m" (*mem), "=qm" (__result) \ - : IBR_CONSTRAINT (value), "m" (*mem)); \ + __asm __volatile (lock #op "b %b0, %1" \ + : "=q" (__ret), "=m" (*mem) \ + : "0" (arg), "m" (*mem) \ + : "memory", "cc"); \ else if (sizeof (*mem) == 2) \ - __asm __volatile (LOCK_PREFIX "addw %w2, %0; sets %1" \ - : "=m" (*mem), "=qm" (__result) \ - : "ir" (value), "m" (*mem)); \ + __asm __volatile (lock #op "w %w0, %1" \ + : "=r" (__ret), "=m" (*mem) \ + : "0" (arg), "m" (*mem) \ + : "memory", "cc"); \ else if (sizeof (*mem) == 4) \ - __asm __volatile (LOCK_PREFIX "addl %2, %0; sets %1" \ - : "=m" (*mem), "=qm" (__result) \ - : "ir" (value), "m" (*mem)); \ + __asm __volatile (lock #op "l %0, %1" \ + : "=r" (__ret), "=m" (*mem) \ + : "0" (arg), "m" (*mem) \ + : "memory", "cc"); \ else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (LOCK_PREFIX "addq %q2, %0; sets %1" \ - : "=m" (*mem), "=qm" (__result) \ - : "ir" ((int64_t) cast_to_integer (value)), \ - "m" (*mem)); \ + __asm __volatile (lock #op "q %q0, %1" \ + : "=r" (__ret), "=m" (*mem) \ + : "0" ((int64_t) cast_to_integer (arg)), \ + "m" (*mem) \ + : "memory", "cc"); \ else \ __atomic_link_error (); \ - __result; }) + __ret; }) - -#define atomic_add_zero(mem, value) \ - ({ unsigned char __result; \ +#define __single_op(lock, mem, op) \ + ({ \ if (sizeof (*mem) == 1) \ - __asm __volatile (LOCK_PREFIX "addb %b2, %0; setz %1" \ - : "=m" (*mem), "=qm" (__result) \ - : IBR_CONSTRAINT (value), "m" (*mem)); \ + __asm __volatile (lock #op "b %b0" \ + : "=m" (*mem) \ + : "m" (*mem) \ + : "memory", "cc"); \ else if (sizeof (*mem) == 2) \ - __asm __volatile (LOCK_PREFIX "addw %w2, %0; setz %1" \ 
- : "=m" (*mem), "=qm" (__result) \ - : "ir" (value), "m" (*mem)); \ + __asm __volatile (lock #op "w %b0" \ + : "=m" (*mem) \ + : "m" (*mem) \ + : "memory", "cc"); \ else if (sizeof (*mem) == 4) \ - __asm __volatile (LOCK_PREFIX "addl %2, %0; setz %1" \ - : "=m" (*mem), "=qm" (__result) \ - : "ir" (value), "m" (*mem)); \ + __asm __volatile (lock #op "l %b0" \ + : "=m" (*mem) \ + : "m" (*mem) \ + : "memory", "cc"); \ else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (LOCK_PREFIX "addq %q2, %0; setz %1" \ - : "=m" (*mem), "=qm" (__result) \ - : "ir" ((int64_t) cast_to_integer (value)), \ - "m" (*mem)); \ + __asm __volatile (lock #op "q %b0" \ + : "=m" (*mem) \ + : "m" (*mem) \ + : "memory", "cc"); \ else \ - __atomic_link_error (); \ - __result; }) + __atomic_link_error (); \ + }) +/* Note that we need no lock prefix. */ +#define atomic_exchange_acq(mem, newvalue) \ + __xchg_op ("", (mem), (newvalue), xchg) -#define __arch_increment_body(lock, pfx, mem) \ - do { \ - if (sizeof (*mem) == 1) \ - __asm __volatile (lock "incb %b0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "incw %w0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "incl %0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "incq %q0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else \ - do_add_val_64_acq (pfx, mem, 1); \ - } while (0) +#define atomic_add(mem, value) \ + __xchg_op (LOCK_PREFIX, (mem), (value), add); \ -#define atomic_increment(mem) __arch_increment_body (LOCK_PREFIX, __arch, mem) +#define catomic_add(mem, value) \ + ({ \ + if (SINGLE_THREAD_P) \ + __xchg_op ("", (mem), (value), add); \ + else \ + atomic_add (mem, value); \ + }) -#define __arch_increment_cprefix \ - "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t" -#define catomic_increment(mem) \ - __arch_increment_body (__arch_increment_cprefix, __arch_c, mem) +#define atomic_increment(mem) \ + __single_op (LOCK_PREFIX, (mem), inc) +#define catomic_increment(mem) \ + ({ \ + if (SINGLE_THREAD_P) \ + __single_op ("", (mem), inc); \ + else \ + atomic_increment (mem); \ + }) #define atomic_increment_and_test(mem) \ ({ unsigned char __result; \ @@ -357,43 +210,20 @@ : "=m" (*mem), "=qm" (__result) \ : "m" (*mem)); \ else \ - __atomic_link_error (); \ + __atomic_link_error (); \ __result; }) -#define __arch_decrement_body(lock, pfx, mem) \ - do { \ - if (sizeof (*mem) == 1) \ - __asm __volatile (lock "decb %b0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "decw %w0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "decl %0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "decq %q0" \ - : "=m" (*mem) \ - : "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else \ - do_add_val_64_acq (pfx, mem, -1); \ - } while (0) - -#define atomic_decrement(mem) __arch_decrement_body (LOCK_PREFIX, __arch, mem) - -#define __arch_decrement_cprefix \ - "cmpl $0, %%" SEG_REG ":%P2\n\tje 0f\n\tlock\n0:\t" +#define atomic_decrement(mem) \ 
+ __single_op (LOCK_PREFIX, (mem), dec) -#define catomic_decrement(mem) \ - __arch_decrement_body (__arch_decrement_cprefix, __arch_c, mem) +#define catomic_decrement(mem) \ + ({ \ + if (SINGLE_THREAD_P) \ + __single_op ("", (mem), dec); \ + else \ + atomic_decrement (mem); \ + }) #define atomic_decrement_and_test(mem) \ @@ -463,73 +293,31 @@ : "=q" (__result), "=m" (*mem) \ : "m" (*mem), "ir" (bit)); \ else \ - __atomic_link_error (); \ + __atomic_link_error (); \ __result; }) -#define __arch_and_body(lock, mem, mask) \ - do { \ - if (sizeof (*mem) == 1) \ - __asm __volatile (lock "andb %b1, %0" \ - : "=m" (*mem) \ - : IBR_CONSTRAINT (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "andw %w1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "andl %1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "andq %q1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else \ - __atomic_link_error (); \ - } while (0) - -#define __arch_cprefix \ - "cmpl $0, %%" SEG_REG ":%P3\n\tje 0f\n\tlock\n0:\t" - -#define atomic_and(mem, mask) __arch_and_body (LOCK_PREFIX, mem, mask) - -#define catomic_and(mem, mask) __arch_and_body (__arch_cprefix, mem, mask) +#define atomic_and(mem, mask) \ + __xchg_op (LOCK_PREFIX, (mem), (mask), and) +#define catomic_and(mem, mask) \ + ({ \ + if (SINGLE_THREAD_P) \ + __xchg_op ("", (mem), (mask), and); \ + else \ + atomic_and (mem, mask); \ + }) -#define __arch_or_body(lock, mem, mask) \ - do { \ - if (sizeof (*mem) == 1) \ - __asm __volatile (lock "orb %b1, %0" \ - : "=m" (*mem) \ - : IBR_CONSTRAINT (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 2) \ - __asm __volatile (lock "orw %w1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (sizeof (*mem) == 4) \ - __asm __volatile (lock "orl %1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else if (__HAVE_64B_ATOMICS) \ - __asm __volatile (lock "orq %q1, %0" \ - : "=m" (*mem) \ - : "ir" (mask), "m" (*mem), \ - "i" (offsetof (tcbhead_t, multiple_threads))); \ - else \ - __atomic_link_error (); \ - } while (0) - -#define atomic_or(mem, mask) __arch_or_body (LOCK_PREFIX, mem, mask) +#define atomic_or(mem, mask) \ + __xchg_op (LOCK_PREFIX, (mem), (mask), or) -#define catomic_or(mem, mask) __arch_or_body (__arch_cprefix, mem, mask) +#define catomic_or(mem, mask) \ + ({ \ + if (SINGLE_THREAD_P) \ + __xchg_op ("", (mem), (mask), or); \ + else \ + atomic_or (mem, mask); \ + }) /* We don't use mfence because it is supposedly slower due to having to provide stronger guarantees (e.g., regarding self-modifying code). 
*/ diff --git a/sysdeps/x86_64/nptl/tcb-offsets.sym b/sysdeps/x86_64/nptl/tcb-offsets.sym index 2bbd563a6c..8ec55a7ea8 100644 --- a/sysdeps/x86_64/nptl/tcb-offsets.sym +++ b/sysdeps/x86_64/nptl/tcb-offsets.sym @@ -9,7 +9,6 @@ CLEANUP_JMP_BUF offsetof (struct pthread, cleanup_jmp_buf) CLEANUP offsetof (struct pthread, cleanup) CLEANUP_PREV offsetof (struct _pthread_cleanup_buffer, __prev) MUTEX_FUTEX offsetof (pthread_mutex_t, __data.__lock) -MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) POINTER_GUARD offsetof (tcbhead_t, pointer_guard) FEATURE_1_OFFSET offsetof (tcbhead_t, feature_1) SSP_BASE_OFFSET offsetof (tcbhead_t, ssp_base) diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h index 75f8020975..7967df571f 100644 --- a/sysdeps/x86_64/nptl/tls.h +++ b/sysdeps/x86_64/nptl/tls.h @@ -45,7 +45,7 @@ typedef struct thread descriptor used by libpthread. */ dtv_t *dtv; void *self; /* Pointer to the thread descriptor. */ - int multiple_threads; + int unused_multiple_threads; int gscope_flag; uintptr_t sysinfo; uintptr_t stack_guard;
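
As promised above, here is a minimal, self-contained sketch of the
SINGLE_THREAD_P gating idea.  It is illustrative only: it uses GCC's
__atomic builtins rather than glibc's inline assembly, and the names
single_threaded and catomic_increment_sketch are made up for the example
(the real flag is __libc_single_threaded_internal, tested through
SINGLE_THREAD_P from sysdeps/unix/sysv/linux/single-thread.h).

/* Stand-in for __libc_single_threaded_internal: non-zero until the
   process gains a second thread.  */
static int single_threaded = 1;

static inline void
catomic_increment_sketch (int *mem)
{
  if (single_threaded)
    /* No other thread can observe *mem, so a plain increment is enough
       and the lock-prefixed read-modify-write is skipped.  */
    ++*mem;
  else
    /* Full atomic increment, the equivalent of "lock incl" on x86.  */
    __atomic_fetch_add (mem, 1, __ATOMIC_SEQ_CST);
}

The catomic_* macros in the patch follow the same shape: the
single-threaded path issues the same instruction without the lock
prefix, and the multi-threaded path falls back to the ordinary
lock-prefixed atomic operation.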