From patchwork Thu Mar  7 23:50:12 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Baldwin <jhb@freebsd.org>
X-Patchwork-Id: 31782
Received: (qmail 103255 invoked by alias); 7 Mar 2019 23:50:39 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 103247 invoked by uid 89); 7 Mar 2019 23:50:39 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-14.4 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
	SPF_PASS autolearn=ham version=3.3.1 spammy=understands,
	inferiors
X-HELO: mx2.freebsd.org
Received: from mx2.freebsd.org (HELO mx2.freebsd.org) (8.8.178.116) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Thu, 07 Mar 2019 23:50:37 +0000
Received: from mx1.freebsd.org (mx1.freebsd.org
	[IPv6:2610:1c1:1:606c::19:1])	(using TLSv1.3 with cipher
	TLS_AES_256_GCM_SHA384 (256/256 bits))	(Client CN
	"mx1.freebsd.org", Issuer "Let's Encrypt Authority X3"
	(verified OK))	by mx2.freebsd.org (Postfix) with ESMTPS id
	E44406A868;
	Thu, 7 Mar 2019 23:50:35 +0000 (UTC)	(envelope-from jhb@FreeBSD.org)
Received: from smtp.freebsd.org (smtp.freebsd.org
	[IPv6:2610:1c1:1:606c::24b:4])	(using TLSv1.3 with cipher
	TLS_AES_256_GCM_SHA384 (256/256 bits)	 server-signature
	RSA-PSS (4096 bits)	 client-signature RSA-PSS (4096 bits)
	client-digest SHA256)	(Client CN "smtp.freebsd.org", Issuer
	"Let's Encrypt Authority X3" (verified OK))	by
	mx1.freebsd.org (Postfix) with ESMTPS id 88FBC80CBA;
	Thu, 7 Mar 2019 23:50:34 +0000 (UTC)	(envelope-from jhb@FreeBSD.org)
Received: from John-Baldwins-MacBook-Pro-3.local (ralph.baldwin.cx
	[66.234.199.215])	(using TLSv1.2 with cipher
	ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))	(Client did not
	present a certificate)	(Authenticated sender: jhb)	by
	smtp.freebsd.org (Postfix) with ESMTPSA id 1BFA311564;
	Thu, 7 Mar 2019 23:50:34 +0000 (UTC)	(envelope-from jhb@FreeBSD.org)
Subject: Re: [PATCH v2 04/11] Add a new gdbarch method to resolve the
	address of TLS variables.
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
References: <cover.1549672588.git.jhb@FreeBSD.org>
	<4db33aead3f31532b7d4e165d9786df792a4d925.1549672588.git.jhb@FreeBSD.org>
	<02c8a44b-b1d2-0f0f-9b6f-72a0fb673f83@simark.ca>
From: John Baldwin <jhb@FreeBSD.org>
Openpgp: preference=signencrypt
Message-ID: <2c282f52-0269-d6a8-8533-4c00b1a4ee8d@FreeBSD.org>
Date: Thu, 7 Mar 2019 15:50:12 -0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12;
	rv:60.0) Gecko/20100101 Thunderbird/60.5.3
MIME-Version: 1.0
In-Reply-To: <02c8a44b-b1d2-0f0f-9b6f-72a0fb673f83@simark.ca>
X-Rspamd-Queue-Id: 88FBC80CBA
X-Spamd-Bar: --
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-2.96 / 15.00];
	local_wl_from(0.00)[FreeBSD.org];
	NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
	NEURAL_HAM_LONG(-1.00)[-1.000,0];
	NEURAL_HAM_SHORT(-0.96)[-0.964,0];
	ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]
X-IsSubscribed: yes

On 3/7/19 8:08 AM, Simon Marchi wrote:
> On 2019-02-08 7:40 p.m., John Baldwin wrote:
>> Permit TLS variable addresses to be resolved purely by an ABI rather
>> than requiring a target method.  This doesn't try the target method if
>> the ABI function is present (even if the ABI function fails) to
>> simplify error handling.
> 
> I don't see anything wrong with the patch (and the comment you removed 
> in target_translate_tls_address hints it is right), but again I am not 
> very familiar with how TLS works, so I wouldn't spot if anything was 
> conceptually wrong with this approach.  I would appreciate if another 
> maintainer could take a look and give their opinion.

Ok.  FWIW, the reason for target vs gdbarch has to do with the different ways one can
resolve a TLS variable.  Some background:

In ELF relocations, a TLS variable is identified by an offset in a special TLS section
of an ELF file, similar to global symbols being an offset relative to .data or .bss.
However, TLS variables are duplicated for each thread.  To support this, the runtime
linker allocates an array of pointers for each thread called the DTV array.  The runtime
linker also assigns an array index to each ELF object, so the executable is assigned array
index 1, and other shared libraries that use TLS are assigned indices as they are mapped
by the runtime linker.  The pointers in each thread's array point to the per-thread blocks
of TLS variables for a given ELF object.  Thus, if index 1 is for my program and index 2
was assigned to libc, then DTV[1] contains a pointer to all of the TLS variables in my
main program and DTV[2] contains a pointer to all of the TLS variables in libc.

Continuing that example, if libc has two TLS integers 'foo' and 'bar', they might be
assigned offsets of 0 and 4.  Since libc was assigned index 2, to read the value of 'foo'
one uses the expression '*(int *)(DTV[2])'.  To read 'bar' one uses '*(int *)(DTV[2] + 4)'.
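
As a rough illustration of the indexing described above, here is a minimal sketch in C
that models one thread's DTV as an array of pointers to per-object TLS blocks.  The
struct, the function names, and the block sizes are all made up for the example; it
only assumes the index assignment from the text (1 for the executable, 2 for libc)
and the offsets 0 and 4 for 'foo' and 'bar'.  Real DTV layouts differ between runtime
linkers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one thread: a TLS block per ELF object plus a
   DTV whose entries point at those blocks.  Slot 0 is unused here.  */
struct thread_model {
    unsigned char exe_tls[8];   /* TLS block for the executable (index 1) */
    unsigned char libc_tls[8];  /* TLS block for libc (index 2): foo at 0, bar at 4 */
    unsigned char *dtv[3];
};

/* Zero the blocks and point the DTV entries at them.  */
static void thread_model_init(struct thread_model *t)
{
    memset(t, 0, sizeof *t);
    t->dtv[1] = t->exe_tls;
    t->dtv[2] = t->libc_tls;
}

/* Read the int stored at DTV[index] + offset for this thread.  */
static int tls_read_int(const struct thread_model *t, size_t index, size_t offset)
{
    int value;
    assert(index >= 1 && index <= 2);
    memcpy(&value, t->dtv[index] + offset, sizeof value);
    return value;
}

/* Store an int at DTV[index] + offset for this thread.  */
static void tls_write_int(struct thread_model *t, size_t index, size_t offset, int value)
{
    assert(index >= 1 && index <= 2);
    memcpy(t->dtv[index] + offset, &value, sizeof value);
}
```

Writing 'bar' in one thread_model with tls_write_int(&a, 2, 4, 42) leaves 'bar' in a
second thread_model untouched, which is the per-thread duplication the DTV provides.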

There are some extra optimizations in the compiler-generated code.  There's something
called static TLS that can be at a fixed offset from the per-thread TCB pointer IIRC,
but there are also valid DTV[] pointers that can get to the same variables, just via
more indirection.  Compiled code is also allowed to fetch the 'base' of a TLS block
for a given shared object, save that 'base', and use offsets from it to access
different variables.  Put another way, the compiler can assume that 'bar' always sits
4 bytes after 'foo' and just add the relative offset to '&foo' to compute '&bar' without
going through the DTV array every time.

In target_translate_tls_address() we are given the 'struct objfile' of the ELF object
and the offset of the TLS variable we are trying to find.  The
gdbarch_fetch_tls_load_module_address function fetches the pointer to the runtime
linker's data structure describing that ELF object.

The target version (target::get_thread_local_address) expects to use some target
specific method to turn a (thread, linker_module_addr, offset) tuple into the
address of a TLS variable.  On Linux and other systems using libthread_db for this,
it calls a libthread_db function.  Internally that libthread_db function looks at
the runtime linker's structure to extract the TLS index of the ELF object.  It
then looks in the thread library's per-thread data structure to find a pointer
to the DTV array.  It then uses 'DTV[index] + offset' to compute the final address.
Note that this is all done in libthread_db rather than in gdb itself.

The gdbarch method I'm using for FreeBSD doesn't use libthread_db.  Instead, it
more closely mimics what the compiler-generated code does.  Many architectures use
some sort of register to point to a per-thread Thread Control Block (TCB), and they
store a pointer to the DTV array either in the TCB or at a fixed offset relative to
the TCB.  For example, 64-bit x86 uses the %fs segment prefix to access the TCB,
and the %fs_base register is thus a pointer to the TCB.  32-bit x86 uses %gs and
%gs_base instead.  RISC-V has a 'tp' register for this purpose, etc.  The approach
I'm using for FreeBSD is to provide an architecture-specific function that uses the
relevant register to locate the pointer to the DTV array.  It then calls a shared
function (patch 7) that extracts the TLS index from the runtime linker's data
structure and computes the final address via 'DTV[index] + offset'.
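
The lookup described above can be sketched roughly as follows.  This is not the code
from the patch series, just an illustration in C.  The flat buffer standing in for
inferior memory, the function names, and the layout constants (where the TCB keeps the
DTV pointer, where the linker's structure keeps the TLS index, the pointer size) are
all assumptions made for the example; the real values depend on the architecture and
the runtime linker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout constants, chosen only for this sketch.  */
enum {
    TCB_DTV_OFFSET = 8,       /* offset of the DTV pointer within the TCB */
    LM_TLSINDEX_OFFSET = 16,  /* offset of the TLS index in the linker's structure */
    PTR_SIZE = 8              /* size of a pointer in the inferior */
};

/* Stand-in for inferior memory: a flat buffer addressed from 0.  */
struct fake_memory {
    unsigned char bytes[256];
};

/* Read a 64-bit value from the fake inferior memory.  */
static uint64_t read_u64(const struct fake_memory *m, uint64_t addr)
{
    uint64_t v;
    assert(addr + sizeof v <= sizeof m->bytes);
    memcpy(&v, m->bytes + addr, sizeof v);
    return v;
}

/* Write a 64-bit value into the fake inferior memory.  */
static void write_u64(struct fake_memory *m, uint64_t addr, uint64_t v)
{
    assert(addr + sizeof v <= sizeof m->bytes);
    memcpy(m->bytes + addr, &v, sizeof v);
}

/* TCB is the value of the thread pointer register (e.g. %fs_base on
   64-bit x86).  Follow it to the DTV, fetch the object's TLS index from
   the linker's structure at LM_ADDR, and return DTV[index] + OFFSET.  */
static uint64_t resolve_tls_address(const struct fake_memory *m, uint64_t tcb,
                                    uint64_t lm_addr, uint64_t offset)
{
    uint64_t dtv = read_u64(m, tcb + TCB_DTV_OFFSET);
    uint64_t index = read_u64(m, lm_addr + LM_TLSINDEX_OFFSET);
    uint64_t block = read_u64(m, dtv + index * PTR_SIZE);
    return block + offset;
}
```

Only the first step (finding the TCB from a register) is architecture-specific; the
rest is the shared 'DTV[index] + offset' computation.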

Mostly I did this because I don't like libthread_db, but using a gdbarch method
should also be a bit friendlier to cross debugging: you don't need a libthread_db
on a FreeBSD host that understands the Linux runtime linker or thread library, or
vice versa, and the same goes for 32-bit vs 64-bit, x86 vs ARM, etc.

>> diff --git a/gdb/gdbarch.sh b/gdb/gdbarch.sh
>> index afc4da7cdd..09097bcbaf 100755
>> --- a/gdb/gdbarch.sh
>> +++ b/gdb/gdbarch.sh
>> @@ -602,6 +602,7 @@ m;int;remote_register_number;int regno;regno;;default_remote_register_number;;0
>>   
>>   # Fetch the target specific address used to represent a load module.
>>   F;CORE_ADDR;fetch_tls_load_module_address;struct objfile *objfile;objfile
>> +M;CORE_ADDR;get_thread_local_address;ptid_t ptid, CORE_ADDR lm_addr, CORE_ADDR offset;ptid, lm_addr, offset
> 
> Could you document the method, especially the meaning of the parameters?

Sure.  I used a variant of the comment from the target method:

diff --git a/gdb/gdbarch.sh b/gdb/gdbarch.sh
index 48fcebd19a..d15b6aa794 100755
--- a/gdb/gdbarch.sh
+++ b/gdb/gdbarch.sh
@@ -602,6 +602,14 @@ m;int;remote_register_number;int regno;regno;;default_remote_register_number;;0
 
 # Fetch the target specific address used to represent a load module.
 F;CORE_ADDR;fetch_tls_load_module_address;struct objfile *objfile;objfile
+
+# Return the thread-local address at OFFSET in the thread-local
+# storage for the thread PTID and the shared library or executable
+# file given by LM_ADDR.  If that block of thread-local storage hasn't
+# been allocated yet, this function may return an error.  LM_ADDR may
+# be zero for statically linked multithreaded inferiors.
+
+M;CORE_ADDR;get_thread_local_address;ptid_t ptid, CORE_ADDR lm_addr, CORE_ADDR offset;ptid, lm_addr, offset
 #
 v;CORE_ADDR;frame_args_skip;;;0;;;0
 m;CORE_ADDR;unwind_pc;struct frame_info *next_frame;next_frame;;default_unwind_pc;;0