Questions about failing testcase nptl/test-mutex-printers

Message ID a40c850a-7e99-9a67-10db-8ec2003f7f1e@linux.vnet.ibm.com
State Committed
Headers

Commit Message

Stefan Liebler March 28, 2018, 11:58 a.m. UTC
  Hi,

I'm running the testcase nptl/test-mutex-printers with all the needed 
prerequirements (build with debug-info, python-pexpect, gdb, ...).

The test passes if I start it without lock-elision.
If I run it with lock-elision (export 
GLIBC_TUNABLES=glibc.elision.enable=1) on s390x,
it fails with:
Error: Response does not match the expected pattern.
Command: print *mutex
Expected pattern: pthread_mutex_t
Response:  No symbol "mutex" in current context.
(gdb)


The program built with test-mutex-printers.c is started in gdb via 
test-mutex-printers.py.

But you don't need the script to reproduce the issue.
Just start the test-binary with gdb:
-gdb --args nptl/test-mutex-printers --direct
-(gdb) set environment GLIBC_TUNABLES glibc.elision.enable=1
-(gdb) b 79
  (gdb) r
Now we break at line 79:"      && pthread_mutex_lock (mutex) == 0 /* 
Test status (non-robust).  */"
just before the call of pthread_mutex_lock().
-"next" shows us the real issue:
(gdb) n

Program received signal SIGILL, Illegal instruction.
0x000003fffdf14f1c in __lll_lock_elision (futex=0x3ffffffe9b0,
     adapt_count=0x3ffffffe9c6, private=0)
     at ../sysdeps/unix/sysv/linux/s390/elision-lock.c:59

(gdb) disassemble
Dump of assembler code for function __lll_lock_elision:
...
    0x000003fffdf14f12 <+170>:	tbegin	0,65294
    0x000003fffdf14f18 <+176>:	jne	0x3fffdf14f24 <__lll_lock_elision+188>
=> 0x000003fffdf14f1c <+180>:	lhi	%r0,0
    0x000003fffdf14f20 <+184>:	j	0x3fffdf14f66 <__lll_lock_elision+254>
...

-Note: If we are now following the commands in test-mutex-printers.py 
script, we get the same response:
(gdb) print *mutex
No symbol "mutex" in current context.


I'm currently using GNU gdb (GDB) Fedora 8.0.1-30.fc27.
And yes, this is definitively a bug in gdb.

As I don't have access to other lock-elision enabled machines,
can somebody test this on power / intel?
What is the behaviour if you step over pthread_mutex_lock() with "next" 
if lock-elision is enabled?
Does the transaction abort and we are debugging the fallback-path 
(without a transaction) or does it stop just after pthread_mutex_unlock()?

If I step manually to the tbegin-instruction (which starts the 
transaction on s390x) and step over it, then gdb steps over the whole 
transaction and we are just after the tend-instruction.

Does it make sense to disable lock-elision for the pretty-printer-tests?
E.g. with the following patch:

Bye.
Stefan
  

Comments

Carlos O'Donell March 28, 2018, 1:15 p.m. UTC | #1
On 03/28/2018 06:58 AM, Stefan Liebler wrote:
> Does it make sense to disable lock-elision for the pretty-printer-tests?

Yes it does, and that is probably the correct answer in this case.

Many of the tests rely on looking at the internal details of the lock,
particularly if you're going to pretty-print it, and elision does not allow
that to happen because the lock and any modifications are elided.

> E.g. with the following patch:
> diff --git a/scripts/test_printers_common.py b/scripts/test_printers_common.py
> index 73ca525556..d74a8b4d4b 100644
> --- a/scripts/test_printers_common.py
> +++ b/scripts/test_printers_common.py
> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, printer_names):
>      # Finally, load the test binary.
>      test('file {0}'.format(test_bin))
> 
> +    # Disable lock elision.
> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
> +
>  def go_to_main():
>      """Executes a gdb 'start' command, which takes us to main."""

Someone would have to confirm on at least one other arch to make sure this works
as expected.
  
Stefan Liebler April 5, 2018, 6:43 a.m. UTC | #2
On 03/28/2018 03:15 PM, Carlos O'Donell wrote:
> On 03/28/2018 06:58 AM, Stefan Liebler wrote:
>> Does it make sense to disable lock-elision for the pretty-printer-tests?
> 
> Yes it does, and that is probably the correct answer in this case.
> 
> Many of the tests rely on looking at the internal details of the lock,
> particularly if you're going to pretty-print it, and elision does not allow
> that to happen because the lock and any modifications are elided.
> 
>> E.g. with the following patch:
>> diff --git a/scripts/test_printers_common.py b/scripts/test_printers_common.py
>> index 73ca525556..d74a8b4d4b 100644
>> --- a/scripts/test_printers_common.py
>> +++ b/scripts/test_printers_common.py
>> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, printer_names):
>>       # Finally, load the test binary.
>>       test('file {0}'.format(test_bin))
>>
>> +    # Disable lock elision.
>> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
>> +
>>   def go_to_main():
>>       """Executes a gdb 'start' command, which takes us to main."""
> 
> Someone would have to confirm on at least one other arch to make sure this works
> as expected.
> 

PING.
Is there anybody with an intel / power lock-elision machine who can test 
this?
  
Stefan Liebler April 12, 2018, 1:54 p.m. UTC | #3
On 04/05/2018 08:43 AM, Stefan Liebler wrote:
> On 03/28/2018 03:15 PM, Carlos O'Donell wrote:
>> On 03/28/2018 06:58 AM, Stefan Liebler wrote:
>>> Does it make sense to disable lock-elision for the pretty-printer-tests?
>>
>> Yes it does, and that is probably the correct answer in this case.
>>
>> Many of the tests rely on looking at the internal details of the lock,
>> particularly if you're going to pretty-print it, and elision does not 
>> allow
>> that to happen because the lock and any modifications are elided.
>>
>>> E.g. with the following patch:
>>> diff --git a/scripts/test_printers_common.py 
>>> b/scripts/test_printers_common.py
>>> index 73ca525556..d74a8b4d4b 100644
>>> --- a/scripts/test_printers_common.py
>>> +++ b/scripts/test_printers_common.py
>>> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, 
>>> printer_names):
>>>       # Finally, load the test binary.
>>>       test('file {0}'.format(test_bin))
>>>
>>> +    # Disable lock elision.
>>> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
>>> +
>>>   def go_to_main():
>>>       """Executes a gdb 'start' command, which takes us to main."""
>>
>> Someone would have to confirm on at least one other arch to make sure 
>> this works
>> as expected.
>>
> 
> PING.
> Is there anybody with an intel / power lock-elision machine who can test 
> this?
> 

PING
  
Stefan Liebler April 18, 2018, 6:26 a.m. UTC | #4
On 04/12/2018 03:54 PM, Stefan Liebler wrote:
> On 04/05/2018 08:43 AM, Stefan Liebler wrote:
>> On 03/28/2018 03:15 PM, Carlos O'Donell wrote:
>>> On 03/28/2018 06:58 AM, Stefan Liebler wrote:
>>>> Does it make sense to disable lock-elision for the 
>>>> pretty-printer-tests?
>>>
>>> Yes it does, and that is probably the correct answer in this case.
>>>
>>> Many of the tests rely on looking at the internal details of the lock,
>>> particularly if you're going to pretty-print it, and elision does not 
>>> allow
>>> that to happen because the lock and any modifications are elided.
>>>
>>>> E.g. with the following patch:
>>>> diff --git a/scripts/test_printers_common.py 
>>>> b/scripts/test_printers_common.py
>>>> index 73ca525556..d74a8b4d4b 100644
>>>> --- a/scripts/test_printers_common.py
>>>> +++ b/scripts/test_printers_common.py
>>>> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, 
>>>> printer_names):
>>>>       # Finally, load the test binary.
>>>>       test('file {0}'.format(test_bin))
>>>>
>>>> +    # Disable lock elision.
>>>> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
>>>> +
>>>>   def go_to_main():
>>>>       """Executes a gdb 'start' command, which takes us to main."""
>>>
>>> Someone would have to confirm on at least one other arch to make sure 
>>> this works
>>> as expected.
>>>
>>
>> PING.
>> Is there anybody with an intel / power lock-elision machine who can 
>> test this?
>>
> 
> PING
> 
PING
  
Tulio Magno Quites Machado Filho April 18, 2018, 2:40 p.m. UTC | #5
Stefan Liebler <stli@linux.vnet.ibm.com> writes:

> As I don't have access to other lock-elision enabled machines,
> can somebody test this on power / intel?

Tested on powerpc.  Same behavior.

> If I step manually to the tbegin-instruction (which starts the 
> transaction on s390x) and step over it, then gdb steps over the whole 
> transaction and we are just after the tend-instruction.

That behavior also happens on powerpc with all kinds of transactions.
But it's the GDB behavior with transactions on powerpc.
I don't know the details on s390x.

> Does it make sense to disable lock-elision for the pretty-printer-tests?
> E.g. with the following patch:
> diff --git a/scripts/test_printers_common.py 
> b/scripts/test_printers_common.py
> index 73ca525556..d74a8b4d4b 100644
> --- a/scripts/test_printers_common.py
> +++ b/scripts/test_printers_common.py
> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, printer_names):
>       # Finally, load the test binary.
>       test('file {0}'.format(test_bin))
>
> +    # Disable lock elision.
> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
> +
>   def go_to_main():
>       """Executes a gdb 'start' command, which takes us to main."""

LGTM.
  
Stefan Liebler April 19, 2018, 10:50 a.m. UTC | #6
On 04/18/2018 04:40 PM, Tulio Magno Quites Machado Filho wrote:
> Stefan Liebler <stli@linux.vnet.ibm.com> writes:
> 
>> As I don't have access to other lock-elision enabled machines,
>> can somebody test this on power / intel?
> 
> Tested on powerpc.  Same behavior.
> 
>> If I step manually to the tbegin-instruction (which starts the
>> transaction on s390x) and step over it, then gdb steps over the whole
>> transaction and we are just after the tend-instruction.
> 
> That behavior also happens on powerpc with all kinds of transactions.
> But it's the GDB behavior with transactions on powerpc.
> I don't know the details on s390x.
> 
>> Does it make sense to disable lock-elision for the pretty-printer-tests?
>> E.g. with the following patch:
>> diff --git a/scripts/test_printers_common.py
>> b/scripts/test_printers_common.py
>> index 73ca525556..d74a8b4d4b 100644
>> --- a/scripts/test_printers_common.py
>> +++ b/scripts/test_printers_common.py
>> @@ -171,6 +171,9 @@ def init_test(test_bin, printer_files, printer_names):
>>        # Finally, load the test binary.
>>        test('file {0}'.format(test_bin))
>>
>> +    # Disable lock elision.
>> +    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
>> +
>>    def go_to_main():
>>        """Executes a gdb 'start' command, which takes us to main."""
> 
> LGTM.
> 

Thanks.
Committed:
https://sourceware.org/git/?p=glibc.git;a=commit;h=0085be1415a38b40a5a1a12e49368498f1687380
  

Patch

diff --git a/scripts/test_printers_common.py 
b/scripts/test_printers_common.py
index 73ca525556..d74a8b4d4b 100644
--- a/scripts/test_printers_common.py
+++ b/scripts/test_printers_common.py
@@ -171,6 +171,9 @@  def init_test(test_bin, printer_files, printer_names):
      # Finally, load the test binary.
      test('file {0}'.format(test_bin))

+    # Disable lock elision.
+    test('set environment GLIBC_TUNABLES glibc.elision.enable=0')
+
  def go_to_main():
      """Executes a gdb 'start' command, which takes us to main."""