[0/3] Fix issues with writing Linux core PRSTATUS note on MIPS o32, n32 and n64 into core file

Message ID 64ad38a4-b8ae-912e-45d6-7048135ada2e@rt-rk.com
State Not applicable
Headers

Commit Message

Djordje Todorovic Oct. 26, 2017, 11:21 a.m. UTC
  Hi Maciej,

>  Thank you.  There's still a small nit remaining WRT 1/3, but as I have
> noted in a separate reply I'll handle it myself when committing.

Thanks a lot! I really appreciate it!

>> Regarding patch 3/3, as Pedro said, currently “thread” function has a 
>> piece of code that is never going to be executed since core file is 
>> going to be generated as soon as the first thread reaches the thread 
>> function.  So, since we are testing fetching value of TLS variable from 
>> core file, creating just a one thread in test case would be enough and 
>> regarding that, I have cut unnecessary code from the function.
>
>  Hmm, to understand how exactly the bug triggers (as you may recall long
> ago I observed GDB prefers LWPID over PID if available) I ran your test
> case with the code changes removed and the test has still passed across
> the three ABIs.  So it does not actually cover the case concerned here.  
> I still would like to keep it to verify the consistency between core files
> written and read (following the various issues discovered in the course of
> this review, including the n32 BFD fix I have cc-ed you on lately),
> however that brings the question again about how you originally observed
> the problem you've addressed.  Is there another test scenario that can be
> created?

Thanks a lot, I saw that n32 BFD fix, and it is great!

Actually, I have faced it when I tried to read value of TLS variable on MIPS platforms (native).  I agree with you that GDB prefers LWPID over PID if available, but with no PID from core file GDB, at least on my boards, can not read value of TLS variable.  I have tried to do some observation on x86_64 native platform and commented piece of code for fetching PID from core file like this:

 
and rebuilt GDB (with "make clean" and "make"), the test case was FAIL:

(gdb) file /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core
Reading symbols from /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core...done.
(gdb) core /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/gcore.test
[New LWP 12556]
[New LWP 12552]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 thread_proc (arg=0x0) at /home/rtrk/gdb/build_native/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/tls-core.c:25
25 return arg;
[Current thread is 1 (LWP 12556)]
(gdb) PASS: gdb.threads/tls-core.exp: load generated corefile
p/x foo
Cannot find user-level thread for LWP 12556: generic error

When I reverted this observation changes and rebuilt it again, GDB was able to read value of TLS variable:

(gdb) file /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core
Reading symbols from /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core...done.
(gdb) core /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/gcore.test
[New LWP 11099]
[New LWP 11095]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 thread_proc (arg=0x0) at /home/rtrk/gdb/build_native/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/tls-core.c:25
25 return arg;
[Current thread is 1 (Thread 0x7ffff77ef700 (LWP 11099))]
(gdb) PASS: gdb.threads/tls-core.exp: load generated corefile
p/x foo
$1 = 0xdeadbeef

Exactly the same situation I have reproduced on MIPS platforms. Have I missed something?


Regarding fetching value of TLS variable with cross or Multiarch GDB, we actually had proposal for resolving that issue in GDB, where we proposed that GDB can have its own "libthread_db" functions for getting TLS in case of cross debugging, where it is going to "simulate" "target" architecture, but that code would be hard to maintain, because every changes in "libthread_db" functions regarding particular architecture in GLIBC would take also changes in GDB, so it was not accepted by community, reasonably (please briefly see that thread: http://sourceware-org.1504.n7.nabble.com/PATCH-TLS-access-support-in-cross-Linux-GDB-td452155.html). But yes, I also think in those cases, test should be marked as known failure.

Thanks,
Djordje
  

Comments

Maciej W. Rozycki Oct. 30, 2017, 1:36 p.m. UTC | #1
Hi Djordje,

> >  Hmm, to understand how exactly the bug triggers (as you may recall long
> > ago I observed GDB prefers LWPID over PID if available) I ran your test
> > case with the code changes removed and the test has still passed across
> > the three ABIs.  So it does not actually cover the case concerned here.  
> > I still would like to keep it to verify the consistency between core files
> > written and read (following the various issues discovered in the course of
> > this review, including the n32 BFD fix I have cc-ed you on lately),
> > however that brings the question again about how you originally observed
> > the problem you've addressed.  Is there another test scenario that can be
> > created?
> 
> Thanks a lot, I saw that n32 BFD fix, and it is great!
> 
> Actually, I have faced it when I tried to read value of TLS variable on MIPS platforms (native).  I agree with you that GDB prefers LWPID over PID if available, but with no PID from core file GDB, at least on my boards, can not read value of TLS variable.  I have tried to do some observation on x86_64 native platform and commented piece of code for fetching PID from core file like this:
> 
> diff --git a/bfd/elf64-x86-64.c b/bfd/elf64-x86-64.c
> index 6a5159d..9ee9231 100644
> --- a/bfd/elf64-x86-64.c
> +++ b/bfd/elf64-x86-64.c
> @@ -415,8 +415,8 @@ elf_x86_64_grok_psinfo (bfd *abfd, Elf_Internal_Note *note)
>         break;
> 
>       case 136:                /* sizeof(struct elf_prpsinfo) on Linux/x86_64 */
> -       elf_tdata (abfd)->core->pid
> -         = bfd_get_32 (abfd, note->descdata + 24);
> +       //elf_tdata (abfd)->core->pid
> +       //  = bfd_get_32 (abfd, note->descdata + 24);
>         elf_tdata (abfd)->core->program
>          = _bfd_elfcore_strndup (abfd, note->descdata + 40, 16);
>         elf_tdata (abfd)->core->command
>  
> and rebuilt GDB (with "make clean" and "make"), the test case was FAIL:
> 
> (gdb) file /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core
> Reading symbols from /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core...done.
> (gdb) core /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/gcore.test
> [New LWP 12556]
> [New LWP 12552]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core'.
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> #0 thread_proc (arg=0x0) at /home/rtrk/gdb/build_native/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/tls-core.c:25
> 25 return arg;
> [Current thread is 1 (LWP 12556)]
> (gdb) PASS: gdb.threads/tls-core.exp: load generated corefile
> p/x foo
> Cannot find user-level thread for LWP 12556: generic error
> 
> When I reverted this observation changes and rebuilt it again, GDB was 
> able to read value of TLS variable:
> 
> (gdb) file /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core
> Reading symbols from /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core...done.
> (gdb) core /home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/gcore.test
> [New LWP 11099]
> [New LWP 11095]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/home/rtrk/gdb/build_native/gdb/testsuite/outputs/gdb.threads/tls-core/tls-core'.
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> #0 thread_proc (arg=0x0) at /home/rtrk/gdb/build_native/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/tls-core.c:25
> 25 return arg;
> [Current thread is 1 (Thread 0x7ffff77ef700 (LWP 11099))]
> (gdb) PASS: gdb.threads/tls-core.exp: load generated corefile
> p/x foo
> $1 = 0xdeadbeef
> 
> Exactly the same situation I have reproduced on MIPS platforms. Have I 
> missed something?

 I'll see if I can reproduce it with the x86-64 target and that may help 
me understand why I don't see it with the MIPS one.

> Regarding fetching value of TLS variable with cross or Multiarch GDB, we 
> actually had proposal for resolving that issue in GDB, where we proposed 
> that GDB can have its own "libthread_db" functions for getting TLS in 
> case of cross debugging, where it is going to "simulate" "target" 
> architecture, but that code would be hard to maintain, because every 
> changes in "libthread_db" functions regarding particular architecture in 
> GLIBC would take also changes in GDB, so it was not accepted by 
> community, reasonably (please briefly see that thread: 
> http://sourceware-org.1504.n7.nabble.com/PATCH-TLS-access-support-in-cross-Linux-GDB-td452155.html). 
> But yes, I also think in those cases, test should be marked as known 
> failure.

 Thank you for the reference, and indeed in this case, with the problem 
being dealt with already, I'm even more confident that having it recorded 
as a known failure is the right way.  But that needs an existing issue 
report in our Bugzilla to refer to.  As you have been handling this 
already can you file one please?

 Once we have an issue number, we can then:

    if [is_remote target] {
	setup_kfail "gdb/..." "*-*-*"
    }

ahead of the offending case.  Overall we must never gratuitously cause new 
FAILs to appear in testing with new test cases -- either the causing issue 
should be fixed, or the problem KFAILed or XFAILed as appropriate, 
depending on the circumstances.

 I'm off throughout this week and this is my last reply in this thread 
until I am back.  Can you please use the time to enter the bug report, and 
can you also tell me if your copyright assignment has been sorted out with 
FSF?

  Maciej
  
Maciej W. Rozycki Oct. 30, 2017, 2:08 p.m. UTC | #2
On Mon, 30 Oct 2017, Maciej W. Rozycki wrote:

>  Once we have an issue number, we can then:
> 
>     if [is_remote target] {
> 	setup_kfail "gdb/..." "*-*-*"
>     }
> 
> ahead of the offending case.

 Or:

    if {![string match $host_triplet $target_triplet]} {
	setup_kfail "gdb/..." "*-*-*"
    }

really (`isnative' doesn't help here as it's too restrictive: we could 
have a native GDB built on a different system; this is actually what I use 
as I don't currently have an adequate version of native MIPS GCC set up).

  Maciej
  
Djordje Todorovic Nov. 3, 2017, 1:05 p.m. UTC | #3
Hi Maciej,

>Can you please use the time to enter the bug report, and 
>can you also tell me if your copyright assignment has been sorted out with 
>FSF?

The bug report is created and it can be found on the following link:
https://sourceware.org/bugzilla/show_bug.cgi?id=22381

Also, the copyright assignment with FSF has been sorted out.

>    if {![string match $host_triplet $target_triplet]} {
>	setup_kfail "gdb/..." "*-*-*"
>    }
>
>really (`isnative' doesn't help here as it's too restrictive: we could 
>have a native GDB built on a different system; this is actually what I use 
>as I don't currently have an adequate version of native MIPS GCC set up).

I agree with you that this is more appropriate way to do it.

So, could you please, tell me do you want from me to resubmit the whole patch series
or just the last one?

Thanks,
Djordje
  
Maciej W. Rozycki Nov. 7, 2017, 9:29 p.m. UTC | #4
On Mon, 30 Oct 2017, Maciej W. Rozycki wrote:

> > Exactly the same situation I have reproduced on MIPS platforms. Have I 
> > missed something?
> 
>  I'll see if I can reproduce it with the x86-64 target and that may help 
> me understand why I don't see it with the MIPS one.

 So I have looked into it now and tracked down `libthread_db' rather 
than GDB to be the component requiring PID retrieval from a core file 
for TLS access to work, and then only before glibc commit c579f48edba8 
("Remove cached PID/TID in clone"), which was first included in 2.25 
glibc release.

 Given I have been using recent glibc checkouts for MIPS verification I 
avoided the requirement, but I was indeed able to reproduce it natively 
with x86-64 and the system-supplied `libthread_db', and then with my 
MIPS environment as well, once I went with my glibc build back to commit 
c579f48edba8^.

 I will be adding a reference to said glibc commit when pushing your 2/3 
change.

  Maciej
  
Maciej W. Rozycki Nov. 7, 2017, 9:58 p.m. UTC | #5
Hi Djordje,

> >Can you please use the time to enter the bug report, and 
> >can you also tell me if your copyright assignment has been sorted out with 
> >FSF?
> 
> The bug report is created and it can be found on the following link:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22381

 Thank you, I have updated it with some pointers, based on your earlier 
reference.

> Also, the copyright assignment with FSF has been sorted out.

 Great!

> >    if {![string match $host_triplet $target_triplet]} {
> >	setup_kfail "gdb/..." "*-*-*"
> >    }
> >
> >really (`isnative' doesn't help here as it's too restrictive: we could 
> >have a native GDB built on a different system; this is actually what I use 
> >as I don't currently have an adequate version of native MIPS GCC set up).
> 
> I agree with you that this is more appropriate way to do it.
> 
> So, could you please, tell me do you want from me to resubmit the whole patch series
> or just the last one?

 I think there is no point in resending, as long as Pedro is fine with 
and approves your most recent version of the test case patch posted (NB 
it helps if on resubmission you number patch revisions in the subject 
line like [PATCH v3 2/3]).  Pedro?

 Once we've got the final ack from Pedro, I'll proceed with the right 
steps to push your changes, with the tweaks noted in the final review 
round applied.  I'll post the actual patches committed, for posterity.

  Maciej
  
Pedro Alves Nov. 8, 2017, 9:56 a.m. UTC | #6
On 11/07/2017 09:29 PM, Maciej W. Rozycki wrote:
> On Mon, 30 Oct 2017, Maciej W. Rozycki wrote:
> 
>>> Exactly the same situation I have reproduced on MIPS platforms. Have I 
>>> missed something?
>>
>>  I'll see if I can reproduce it with the x86-64 target and that may help 
>> me understand why I don't see it with the MIPS one.
> 
>  So I have looked into it now and tracked down `libthread_db' rather 
> than GDB to be the component requiring PID retrieval from a core file 
> for TLS access to work, and then only before glibc commit c579f48edba8 
> ("Remove cached PID/TID in clone"), which was first included in 2.25 
> glibc release.
> 
>  Given I have been using recent glibc checkouts for MIPS verification I 
> avoided the requirement, but I was indeed able to reproduce it natively 
> with x86-64 and the system-supplied `libthread_db', and then with my 
> MIPS environment as well, once I went with my glibc build back to commit 
> c579f48edba8^.
> 
>  I will be adding a reference to said glibc commit when pushing your 2/3 
> change.

Interesting, hadn't realized libthread_db stopped requiring this.
I wonder whether it'd be a good idea to mention it in the code itself 
too, say, in proc-service.:ps_getpid.

Thanks,
Pedro Alves
  
Pedro Alves Nov. 8, 2017, 10:10 a.m. UTC | #7
On 11/07/2017 09:58 PM, Maciej W. Rozycki wrote:

>> So, could you please, tell me do you want from me to resubmit the whole patch series
>> or just the last one?
> 
>  I think there is no point in resending, as long as Pedro is fine with 
> and approves your most recent version of the test case patch posted 

...

(NB it helps if on resubmission you number patch revisions in the subject
> line like [PATCH v3 2/3]).  Pedro?

Agreed.  And threading the patch series emails under the cover letter
also helps a lot.  In fact, I'm a bit lost and am not sure which is the
last version of the test patch that I should be looking at. :-P  :-)

I think... this one:
  https://sourceware.org/ml/gdb-patches/2017-10/msg00770.html
?

That looks fine to me.

> 
>  Once we've got the final ack from Pedro, I'll proceed with the right 
> steps to push your changes, with the tweaks noted in the final review 
> round applied.  I'll post the actual patches committed, for posterity.

Thanks,
Pedro Alves
  
Maciej W. Rozycki Nov. 8, 2017, 6:39 p.m. UTC | #8
On Wed, 8 Nov 2017, Pedro Alves wrote:

> >  So I have looked into it now and tracked down `libthread_db' rather 
> > than GDB to be the component requiring PID retrieval from a core file 
> > for TLS access to work, and then only before glibc commit c579f48edba8 
> > ("Remove cached PID/TID in clone"), which was first included in 2.25 
> > glibc release.
> > 
> >  Given I have been using recent glibc checkouts for MIPS verification I 
> > avoided the requirement, but I was indeed able to reproduce it natively 
> > with x86-64 and the system-supplied `libthread_db', and then with my 
> > MIPS environment as well, once I went with my glibc build back to commit 
> > c579f48edba8^.
> > 
> >  I will be adding a reference to said glibc commit when pushing your 2/3 
> > change.
> 
> Interesting, hadn't realized libthread_db stopped requiring this.
> I wonder whether it'd be a good idea to mention it in the code itself 
> too, say, in proc-service.:ps_getpid.

 This might not be the best place.  Calls to `ps_getpid' are still made 
from places throughout `libthread_db', and it's only one from 
`td_ta_thr_iter' (which matters here) whose result is discarded, by only 
passing it down to `iterate_thread_list' to become its ignored 
`match_pid' argument.  This looks like an oversight of some kind to me, 
which Adhemerval might be able to shed some light on.

 Adhemerval, in your commit referred above, corresponding to the change 
held at: <https://sourceware.org/ml/libc-alpha/2016-11/msg00282.html>, 
you made the `match_pid' argument of `iterate_thread_list' unused.  The 
function is static, called from `td_ta_thr_iter' only (and I actually 
wonder why the build does not trip on an unused argument here).

 Is it OK then to remove the argument then along with the `ps_getpid' 
call in `td_ta_thr_iter' used to set the corresponding parameter, or is 
there something missing from there you intended to do and `match_pid' 
was meant to remain used?

 NB I would post a clean-up proposal, but regrettably my company FSF 
copyright assignment status is in a limbo state right now, after the 
transition from Imagination to the new MIPS company, and I'd rather not 
block the cleanup by posting a patch that cannot move forward.  Or would 
it be small enough to qualify as not needing an assignment?  Hmm...

 As to choosing the right place to comment on this `libthread_db' 
semantics change, I think either the call to `bfd_core_file_pid' made 
from `add_to_thread_list' in corelow.c or the heading of the latter 
function might be better, as this is where the consumer of BFD's core 
file data PID is.  Adding a comment like this might be small/trivial 
enough for me to sneak in while the FSF paperwork is still pending.

  Maciej
  
Maciej W. Rozycki Nov. 8, 2017, 9:22 p.m. UTC | #9
On Wed, 8 Nov 2017, Pedro Alves wrote:

> >  I think there is no point in resending, as long as Pedro is fine with 
> > and approves your most recent version of the test case patch posted 
> 
> ...
> 
> (NB it helps if on resubmission you number patch revisions in the subject
> > line like [PATCH v3 2/3]).  Pedro?
> 
> Agreed.  And threading the patch series emails under the cover letter
> also helps a lot.  In fact, I'm a bit lost and am not sure which is the
> last version of the test patch that I should be looking at. :-P  :-)
> 
> I think... this one:
>   https://sourceware.org/ml/gdb-patches/2017-10/msg00770.html
> ?

 Yes.  Apologies for not pointing you explicitly at it.

> That looks fine to me.

 Great.  In reply to this message I'm following up with the final 
versions committed.  These are v7 according to my notes.

  Maciej
  
Djordje Todorovic Nov. 9, 2017, 10:32 a.m. UTC | #10
Hi Maciej,

Thanks for committing this, I appreciate it.

Best regards,
Djordje
  
Maciej W. Rozycki Nov. 9, 2017, 10:39 p.m. UTC | #11
Hi Djordje,

> Thanks for committing this, I appreciate it.

 You are welcome, it is my duty after all.

  Maciej
  

Patch

diff --git a/bfd/elf64-x86-64.c b/bfd/elf64-x86-64.c
index 6a5159d..9ee9231 100644
--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -415,8 +415,8 @@  elf_x86_64_grok_psinfo (bfd *abfd, Elf_Internal_Note *note)
        break;

      case 136:                /* sizeof(struct elf_prpsinfo) on Linux/x86_64 */
-       elf_tdata (abfd)->core->pid
-         = bfd_get_32 (abfd, note->descdata + 24);
+       //elf_tdata (abfd)->core->pid
+       //  = bfd_get_32 (abfd, note->descdata + 24);
        elf_tdata (abfd)->core->program
         = _bfd_elfcore_strndup (abfd, note->descdata + 40, 16);
        elf_tdata (abfd)->core->command