[ARM] Fix _dl_tlsdesc_resolve_hold to save r0

Message ID 561E1E63.3020404@arm.com
State New, archived
Headers

Commit Message

Szabolcs Nagy Oct. 14, 2015, 9:20 a.m. UTC
  _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
but it assumes the original argument is still in r0 after the call.
This can cause crash in case of concurrent TLS access when TLSDESC
is in use (-mtls-dialect=gnu2).

Run into this while fixing BZ 18572.

Both r0 and r1 are saved/restored so the stack remains 8 byte aligned.

ChangeLog:

2015-10-14  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_resolve_hold): Save and restore
	r0 and r1.
  

Comments

Joseph Myers Oct. 14, 2015, 12:17 p.m. UTC | #1
On Wed, 14 Oct 2015, Szabolcs Nagy wrote:

> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
> but it assumes the original argument is still in r0 after the call.
> This can cause crash in case of concurrent TLS access when TLSDESC
> is in use (-mtls-dialect=gnu2).

I presume this issue was user-visible in a release, in which case a bug 
should be filed in Bugzilla for it.  Should I take it that it's hard to 
write a testcase for this bug that fails reliably without the patch?
  
Szabolcs Nagy Oct. 14, 2015, 1:51 p.m. UTC | #2
On 14/10/15 13:17, Joseph Myers wrote:
> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>
>> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
>> but it assumes the original argument is still in r0 after the call.
>> This can cause crash in case of concurrent TLS access when TLSDESC
>> is in use (-mtls-dialect=gnu2).
>
> I presume this issue was user-visible in a release, in which case a bug
> should be filed in Bugzilla for it.  Should I take it that it's hard to
> write a testcase for this bug that fails reliably without the patch?
>

ok it's BZ 19129

i have a test that triggers this or the other tlsdesc race
i was about to fix, but it is a bit expensive.

(666 tls objects accessed from 66 threads, 2 versions to
get tlsdesc entries 8byte-unaligned, running 10x triggers
the failure, a smaller test triggers less frequently.)

i'm not sure what's the glibc policy for expensive tests.

and i think -mtls-dialect=gnu2 is not available in all
supported gcc versions.

(the test makes sense on all archs where lazy tls
initialization may be racy).
  
Ramana Radhakrishnan Oct. 14, 2015, 3:04 p.m. UTC | #3
On Wed, Oct 14, 2015 at 2:51 PM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 14/10/15 13:17, Joseph Myers wrote:
>>
>> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>>
>>> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
>>> but it assumes the original argument is still in r0 after the call.
>>> This can cause crash in case of concurrent TLS access when TLSDESC
>>> is in use (-mtls-dialect=gnu2).
>>
>>
>> I presume this issue was user-visible in a release, in which case a bug
>> should be filed in Bugzilla for it.  Should I take it that it's hard to
>> write a testcase for this bug that fails reliably without the patch?
>>
>
> ok it's BZ 19129
>
> i have a test that triggers this or the other tlsdesc race
> i was about to fix, but it is a bit expensive.
>
> (666 tls objects accessed from 66 threads, 2 versions to
> get tlsdesc entries 8byte-unaligned, running 10x triggers
> the failure, a smaller test triggers less frequently.)
>
> i'm not sure what's the glibc policy for expensive tests.
>
> and i think -mtls-dialect=gnu2 is not available in all
> supported gcc versions.

Support for this appeared in FSF GCC 4.7


Ramana

>
> (the test makes sense on all archs where lazy tls
> initialization may be racy).
>
  
Joseph Myers Oct. 14, 2015, 3:09 p.m. UTC | #4
On Wed, 14 Oct 2015, Szabolcs Nagy wrote:

> On 14/10/15 13:17, Joseph Myers wrote:
> > On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
> > 
> > > _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
> > > but it assumes the original argument is still in r0 after the call.
> > > This can cause crash in case of concurrent TLS access when TLSDESC
> > > is in use (-mtls-dialect=gnu2).
> > 
> > I presume this issue was user-visible in a release, in which case a bug
> > should be filed in Bugzilla for it.  Should I take it that it's hard to
> > write a testcase for this bug that fails reliably without the patch?
> > 
> 
> ok it's BZ 19129

Thanks.  The patch is OK with the usual [BZ #19129] notation in the 
ChangeLog, addition to the list of fixed bugs in NEWS and closing of the 
bug as FIXED with the right milestone.

> i have a test that triggers this or the other tlsdesc race
> i was about to fix, but it is a bit expensive.
> 
> (666 tls objects accessed from 66 threads, 2 versions to
> get tlsdesc entries 8byte-unaligned, running 10x triggers
> the failure, a smaller test triggers less frequently.)
> 
> i'm not sure what's the glibc policy for expensive tests.

Well, at least it should be attached to the bug in Bugzilla so it's 
available to other people.  What are the actual memory / CPU time 
requirements of this test?
  
Szabolcs Nagy Oct. 14, 2015, 3:29 p.m. UTC | #5
On 14/10/15 16:09, Joseph Myers wrote:
> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>> On 14/10/15 13:17, Joseph Myers wrote:
>>> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>> i have a test that triggers this or the other tlsdesc race
>> i was about to fix, but it is a bit expensive.
>>
>> (666 tls objects accessed from 66 threads, 2 versions to
>> get tlsdesc entries 8byte-unaligned, running 10x triggers
>> the failure, a smaller test triggers less frequently.)
>>
>> i'm not sure what's the glibc policy for expensive tests.
>
> Well, at least it should be attached to the bug in Bugzilla so it's
> available to other people.  What are the actual memory / CPU time
> requirements of this test?
>

i can run the test 10 times under a second,
the memory usage is small (but the address
space usage of the thread stacks may cause
problems with memory overcommit disabled).

native compilation of the 666 sections takes some
time (i put each tls object into different section
so each gets its own tlsdesc relocations).

(the code is partly generated i see if i can
cut it down a bit more for inclusion into glibc).
  

Patch

diff --git a/sysdeps/arm/dl-tlsdesc.S b/sysdeps/arm/dl-tlsdesc.S
index e42ca68..33a2695 100644
--- a/sysdeps/arm/dl-tlsdesc.S
+++ b/sysdeps/arm/dl-tlsdesc.S
@@ -196,21 +196,30 @@  _dl_tlsdesc_lazy_resolver:
 	eabi_fnstart
 	.align 2
 _dl_tlsdesc_resolve_hold:
-	eabi_save ({r2,r3,ip,lr})
-	push	{r2, r3, ip, lr}
-	cfi_adjust_cfa_offset (16)
-	cfi_rel_offset (r2, 0)
-	cfi_rel_offset (r3, 4)
-	cfi_rel_offset (ip, 8)
-	cfi_rel_offset (lr, 12)
+	/* r0 is saved so its original value can be used after the call and
+	   r1 is saved only to keep the stack aligned.  (r0 points to the tls
+	   descriptor, it is passed to _dl_tlsdesc_resolve_hold_fixup which
+	   is a void function that may clobber r0, later r0 is used to load
+	   the new resolver.)  */
+	eabi_save ({r0,r1,r2,r3,ip,lr})
+	push	{r0, r1, r2, r3, ip, lr}
+	cfi_adjust_cfa_offset (24)
+	cfi_rel_offset (r0, 0)
+	cfi_rel_offset (r1, 4)
+	cfi_rel_offset (r2, 8)
+	cfi_rel_offset (r3, 12)
+	cfi_rel_offset (ip, 16)
+	cfi_rel_offset (lr, 20)
 	adr	r1, _dl_tlsdesc_resolve_hold
 	bl	_dl_tlsdesc_resolve_hold_fixup
-	pop	{r2, r3, ip, lr}
-	cfi_adjust_cfa_offset (-16)
+	pop	{r0, r1, r2, r3, ip, lr}
+	cfi_adjust_cfa_offset (-24)
 	cfi_restore (lr)
 	cfi_restore (ip)
 	cfi_restore (r3)
 	cfi_restore (r2)
+	cfi_restore (r1)
+	cfi_restore (r0)
 	sfi_breg r0, \
 	ldr     r1, [\B, #4]
 	BX      (r1)