[ARM] Fix _dl_tlsdesc_resolve_hold to save r0
Commit Message
_dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
but it assumes the original argument is still in r0 after the call.
This can cause crash in case of concurrent TLS access when TLSDESC
is in use (-mtls-dialect=gnu2).
Run into this while fixing BZ 18572.
Both r0 and r1 are saved/restored so the stack remains 8 byte aligned.
ChangeLog:
2015-10-14 Szabolcs Nagy <szabolcs.nagy@arm.com>
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_resolve_hold): Save and restore
r0 and r1.
Comments
On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
> but it assumes the original argument is still in r0 after the call.
> This can cause crash in case of concurrent TLS access when TLSDESC
> is in use (-mtls-dialect=gnu2).
I presume this issue was user-visible in a release, in which case a bug
should be filed in Bugzilla for it. Should I take it that it's hard to
write a testcase for this bug that fails reliably without the patch?
On 14/10/15 13:17, Joseph Myers wrote:
> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>
>> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
>> but it assumes the original argument is still in r0 after the call.
>> This can cause crash in case of concurrent TLS access when TLSDESC
>> is in use (-mtls-dialect=gnu2).
>
> I presume this issue was user-visible in a release, in which case a bug
> should be filed in Bugzilla for it. Should I take it that it's hard to
> write a testcase for this bug that fails reliably without the patch?
>
ok it's BZ 19129
i have a test that triggers this or the other tlsdesc race
i was about to fix, but it is a bit expensive.
(666 tls objects accessed from 66 threads, 2 versions to
get tlsdesc entries 8byte-unaligned, running 10x triggers
the failure, a smaller test triggers less frequently.)
i'm not sure what's the glibc policy for expensive tests.
and i think -mtls-dialect=gnu2 is not available in all
supported gcc versions.
(the test makes sense on all archs where lazy tls
initialization may be racy).
On Wed, Oct 14, 2015 at 2:51 PM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 14/10/15 13:17, Joseph Myers wrote:
>>
>> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>>
>>> _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
>>> but it assumes the original argument is still in r0 after the call.
>>> This can cause crash in case of concurrent TLS access when TLSDESC
>>> is in use (-mtls-dialect=gnu2).
>>
>>
>> I presume this issue was user-visible in a release, in which case a bug
>> should be filed in Bugzilla for it. Should I take it that it's hard to
>> write a testcase for this bug that fails reliably without the patch?
>>
>
> ok it's BZ 19129
>
> i have a test that triggers this or the other tlsdesc race
> i was about to fix, but it is a bit expensive.
>
> (666 tls objects accessed from 66 threads, 2 versions to
> get tlsdesc entries 8byte-unaligned, running 10x triggers
> the failure, a smaller test triggers less frequently.)
>
> i'm not sure what's the glibc policy for expensive tests.
>
> and i think -mtls-dialect=gnu2 is not available in all
> supported gcc versions.
Support for this appeared in FSF GCC 4.7
Ramana
>
> (the test makes sense on all archs where lazy tls
> initialization may be racy).
>
On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
> On 14/10/15 13:17, Joseph Myers wrote:
> > On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
> >
> > > _dl_tlsdesc_resolve_hold calls into a C function that clobbers r0,
> > > but it assumes the original argument is still in r0 after the call.
> > > This can cause crash in case of concurrent TLS access when TLSDESC
> > > is in use (-mtls-dialect=gnu2).
> >
> > I presume this issue was user-visible in a release, in which case a bug
> > should be filed in Bugzilla for it. Should I take it that it's hard to
> > write a testcase for this bug that fails reliably without the patch?
> >
>
> ok it's BZ 19129
Thanks. The patch is OK with the usual [BZ #19129] notation in the
ChangeLog, addition to the list of fixed bugs in NEWS and closing of the
bug as FIXED with the right milestone.
> i have a test that triggers this or the other tlsdesc race
> i was about to fix, but it is a bit expensive.
>
> (666 tls objects accessed from 66 threads, 2 versions to
> get tlsdesc entries 8byte-unaligned, running 10x triggers
> the failure, a smaller test triggers less frequently.)
>
> i'm not sure what's the glibc policy for expensive tests.
Well, at least it should be attached to the bug in Bugzilla so it's
available to other people. What are the actual memory / CPU time
requirements of this test?
On 14/10/15 16:09, Joseph Myers wrote:
> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>> On 14/10/15 13:17, Joseph Myers wrote:
>>> On Wed, 14 Oct 2015, Szabolcs Nagy wrote:
>> i have a test that triggers this or the other tlsdesc race
>> i was about to fix, but it is a bit expensive.
>>
>> (666 tls objects accessed from 66 threads, 2 versions to
>> get tlsdesc entries 8byte-unaligned, running 10x triggers
>> the failure, a smaller test triggers less frequently.)
>>
>> i'm not sure what's the glibc policy for expensive tests.
>
> Well, at least it should be attached to the bug in Bugzilla so it's
> available to other people. What are the actual memory / CPU time
> requirements of this test?
>
i can run the test 10 times under a second,
the memory usage is small (but the address
space usage of the thread stacks may cause
problems with memory overcommit disabled).
native compilation of the 666 sections takes some
time (i put each tls object into different section
so each gets its own tlsdesc relocations).
(the code is partly generated i see if i can
cut it down a bit more for inclusion into glibc).
@@ -196,21 +196,30 @@ _dl_tlsdesc_lazy_resolver:
eabi_fnstart
.align 2
_dl_tlsdesc_resolve_hold:
- eabi_save ({r2,r3,ip,lr})
- push {r2, r3, ip, lr}
- cfi_adjust_cfa_offset (16)
- cfi_rel_offset (r2, 0)
- cfi_rel_offset (r3, 4)
- cfi_rel_offset (ip, 8)
- cfi_rel_offset (lr, 12)
+ /* r0 is saved so its original value can be used after the call and
+ r1 is saved only to keep the stack aligned. (r0 points to the tls
+ descriptor, it is passed to _dl_tlsdesc_resolve_hold_fixup which
+ is a void function that may clobber r0, later r0 is used to load
+ the new resolver.) */
+ eabi_save ({r0,r1,r2,r3,ip,lr})
+ push {r0, r1, r2, r3, ip, lr}
+ cfi_adjust_cfa_offset (24)
+ cfi_rel_offset (r0, 0)
+ cfi_rel_offset (r1, 4)
+ cfi_rel_offset (r2, 8)
+ cfi_rel_offset (r3, 12)
+ cfi_rel_offset (ip, 16)
+ cfi_rel_offset (lr, 20)
adr r1, _dl_tlsdesc_resolve_hold
bl _dl_tlsdesc_resolve_hold_fixup
- pop {r2, r3, ip, lr}
- cfi_adjust_cfa_offset (-16)
+ pop {r0, r1, r2, r3, ip, lr}
+ cfi_adjust_cfa_offset (-24)
cfi_restore (lr)
cfi_restore (ip)
cfi_restore (r3)
cfi_restore (r2)
+ cfi_restore (r1)
+ cfi_restore (r0)
sfi_breg r0, \
ldr r1, [\B, #4]
BX (r1)