Fix stop-on-solib event failures

Message ID 20190814140638.15319-1-alan.hayward@arm.com
State New, archived
Headers

Commit Message

Alan Hayward Aug. 14, 2019, 2:06 p.m. UTC
  [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm
  buildbot". Instead of working around the issue, this fixes the issues. ]

On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints
on just the solib dynamic probes will cause the target process to not stop.
For stop-on-solib to work, a breakpoint also needs placing at the original
stop address.  This is due to bad placing of the probe addresses in the
linker.

This prevents over 100 timeouts when running gdb.base/break-interp.exp.

In addition, the gdb.base/break-interp.exp test has a extra step that is only
required for ppc64 targets.  However on Ubuntu X86 and AArch64, this causes
the program to now be stopped in an unknown location.  The fix here is to
ensure the ppc64 step is only run on ppc64 targets.  This fixes a long
standing issue.

gdb/ChangeLog:

2019-08-14  Alan Hayward  <alan.hayward@arm.com>

	* solib-svr4.c (svr4_create_solib_event_breakpoints): Always create
	original breakpoint.

gdb/testsuite/ChangeLog:

2019-08-14  Alan Hayward  <alan.hayward@arm.com>

	* gdb.base/break-interp.exp: Only run ppc64 step on ppc64.
---
 gdb/solib-svr4.c                        | 18 ++++++++----------
 gdb/testsuite/gdb.base/break-interp.exp |  9 ++++-----
 2 files changed, 12 insertions(+), 15 deletions(-)
  

Comments

Tom Tromey Aug. 15, 2019, 2:55 p.m. UTC | #1
>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

Alan> [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm
Alan>   buildbot". Instead of working around the issue, this fixes the issues. ]

Alan> On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints
Alan> on just the solib dynamic probes will cause the target process to not stop.
Alan> For stop-on-solib to work, a breakpoint also needs placing at the original
Alan> stop address.  This is due to bad placing of the probe addresses in the
Alan> linker.

My understanding is that the probe code is intended to be more efficient
than the old code.  Won't always installing the old-style breakpoint
eliminate the efficiency gain?

If the probes don't work properly on ARM, is there some way to detect that?
Why don't they work?

Alan> In addition, the gdb.base/break-interp.exp test has a extra step that is only
Alan> required for ppc64 targets.  However on Ubuntu X86 and AArch64, this causes
Alan> the program to now be stopped in an unknown location.  The fix here is to
Alan> ensure the ppc64 step is only run on ppc64 targets.  This fixes a long
Alan> standing issue.

This part seems fine.

Tom
  
Alan Hayward Aug. 16, 2019, 12:15 p.m. UTC | #2
> On 15 Aug 2019, at 15:55, Tom Tromey <tom@tromey.com> wrote:

> 

>>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

> 

> Alan> [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm

> Alan>   buildbot". Instead of working around the issue, this fixes the issues. ]

> 

> Alan> On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints

> Alan> on just the solib dynamic probes will cause the target process to not stop.

> Alan> For stop-on-solib to work, a breakpoint also needs placing at the original

> Alan> stop address.  This is due to bad placing of the probe addresses in the

> Alan> linker.

> 

> My understanding is that the probe code is intended to be more efficient

> than the old code.  Won't always installing the old-style breakpoint

> eliminate the efficiency gain?

> 


Right, ok, if that’s the case then yes I agree this patch isn’t right - we shouldn’t
be avoiding the optimisation to support broken probes.


> If the probes don't work properly on ARM, is there some way to detect that?

> Why don't they work?


I’ll investigate a little more to see if I can figure out why it’s wrong, and if we
can detect it.
I could add a UI command to toggle off the optimisation? Not sure I like that.

Failing all that I could go back to the previous "disable break-interp.exp for Arm”
patch.

> 

> Alan> In addition, the gdb.base/break-interp.exp test has a extra step that is only

> Alan> required for ppc64 targets.  However on Ubuntu X86 and AArch64, this causes

> Alan> the program to now be stopped in an unknown location.  The fix here is to

> Alan> ensure the ppc64 step is only run on ppc64 targets.  This fixes a long

> Alan> standing issue.

> 

> This part seems fine.


I’ll push this part.


> 

> Tom
  
Alan Hayward Aug. 16, 2019, 1:18 p.m. UTC | #3
> On 16 Aug 2019, at 13:15, Alan Hayward <Alan.Hayward@arm.com> wrote:

> 

> 

> 

>> On 15 Aug 2019, at 15:55, Tom Tromey <tom@tromey.com> wrote:

>> 

>>>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

>> 

>> Alan> [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm

>> Alan>   buildbot". Instead of working around the issue, this fixes the issues. ]

>> 

>> Alan> On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints

>> Alan> on just the solib dynamic probes will cause the target process to not stop.

>> Alan> For stop-on-solib to work, a breakpoint also needs placing at the original

>> Alan> stop address.  This is due to bad placing of the probe addresses in the

>> Alan> linker.

>> 

>> My understanding is that the probe code is intended to be more efficient

>> than the old code.  Won't always installing the old-style breakpoint

>> eliminate the efficiency gain?

>> 

> 

> Right, ok, if that’s the case then yes I agree this patch isn’t right - we shouldn’t

> be avoiding the optimisation to support broken probes.

> 

> 

>> If the probes don't work properly on ARM, is there some way to detect that?

>> Why don't they work?

> 

> I’ll investigate a little more to see if I can figure out why it’s wrong, and if we

> can detect it.

> I could add a UI command to toggle off the optimisation? Not sure I like that.

> 

> Failing all that I could go back to the previous "disable break-interp.exp for Arm”

> patch.

> 


Looking into this a little more, if I crtl-c during the timeouts, then continue again
I get the error "Probes-based dynamic linker interface failed.”

Doing a quick google I ended up here:
https://bugzilla.redhat.com/show_bug.cgi?id=1196181

Looks like this might be the same issue Sergio was looking into - the probes are
incorrect due to some glibc issue and labels getting removed.

It sounds to me that the way to stop stop-on-solib hanging is to process the locations
of the probe (eg - do all the stuff that will later cause the error message) before
the breakpoints are placed. If it fails, then just insert the old style breakpoint.

Adding Sergio onto the chain.

Sergio - are you still looking into these issues? Have you specifically seen my issue
about where break-inter.exp has lots of hangs? And are you still stuck for an Arm
machine - if you have an AArch64 box I can send you the docker instructions to get
you up and running with Arm on that.


Alan.
  
Sergio Durigan Junior Aug. 16, 2019, 5:02 p.m. UTC | #4
On Friday, August 16 2019, Alan Hayward wrote:

>> On 16 Aug 2019, at 13:15, Alan Hayward <Alan.Hayward@arm.com> wrote:
>> 
>> 
>> 
>>> On 15 Aug 2019, at 15:55, Tom Tromey <tom@tromey.com> wrote:
>>> 
>>>>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:
>>> 
>>> Alan> [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm
>>> Alan>   buildbot". Instead of working around the issue, this fixes the issues. ]
>>> 
>>> Alan> On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints
>>> Alan> on just the solib dynamic probes will cause the target process to not stop.
>>> Alan> For stop-on-solib to work, a breakpoint also needs placing at the original
>>> Alan> stop address.  This is due to bad placing of the probe addresses in the
>>> Alan> linker.
>>> 
>>> My understanding is that the probe code is intended to be more efficient
>>> than the old code.  Won't always installing the old-style breakpoint
>>> eliminate the efficiency gain?
>>> 
>> 
>> Right, ok, if that’s the case then yes I agree this patch isn’t right - we shouldn’t
>> be avoiding the optimisation to support broken probes.
>> 
>> 
>>> If the probes don't work properly on ARM, is there some way to detect that?
>>> Why don't they work?
>> 
>> I’ll investigate a little more to see if I can figure out why it’s wrong, and if we
>> can detect it.
>> I could add a UI command to toggle off the optimisation? Not sure I like that.
>> 
>> Failing all that I could go back to the previous "disable break-interp.exp for Arm”
>> patch.
>> 

[ Adding Gary Benson, who wrote the Probes-based interface.  ]

> Looking into this a little more, if I crtl-c during the timeouts, then continue again
> I get the error "Probes-based dynamic linker interface failed.”
>
> Doing a quick google I ended up here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1196181

Thanks for investigating this.

Have you double checked to see if the probes have the same problem
describe in this bug?  I.e., the problematic probes contain arguments in
the form of:

  Provider: rtld, Name: init_complete, Args: '-4@.L1212 4@r4'
                                                 ^^^^^^

IOW, the argument contains a label where it should actually contain an
expression.

> Looks like this might be the same issue Sergio was looking into - the probes are
> incorrect due to some glibc issue and labels getting removed.
>
> It sounds to me that the way to stop stop-on-solib hanging is to process the locations
> of the probe (eg - do all the stuff that will later cause the error message) before
> the breakpoints are placed. If it fails, then just insert the old style breakpoint.

I have to take a look at the code, but it may be a good idea to try and
force the evaluation of the arguments before anything else, and to
revert to the old style breakpoints if anything fails.  ISTR that this
was already being done, but I may be wrong.

> Adding Sergio onto the chain.
>
> Sergio - are you still looking into these issues? Have you specifically seen my issue
> about where break-inter.exp has lots of hangs? And are you still stuck for an Arm
> machine - if you have an AArch64 box I can send you the docker instructions to get
> you up and running with Arm on that.

Well, the bug has apparently been fixed by Florian's commit to glibc,
and since we didn't get any other reports since then, I just assume(d)
it is/was fixed.

I agree it's a good idea to make GDB more resilient against these bugs.
As I said, my initial impression is that GDB can *already* cope with
problems when using the stap probes, but I will have to check more
thoroughly.

Thanks,
  
Alan Hayward Aug. 19, 2019, 9:48 a.m. UTC | #5
> On 16 Aug 2019, at 18:02, Sergio Durigan Junior <sergiodj@sergiodj.net> wrote:

> 

> On Friday, August 16 2019, Alan Hayward wrote:

> 

>>> On 16 Aug 2019, at 13:15, Alan Hayward <Alan.Hayward@arm.com> wrote:

>>> 

>>> 

>>> 

>>>> On 15 Aug 2019, at 15:55, Tom Tromey <tom@tromey.com> wrote:

>>>> 

>>>>>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

>>>> 

>>>> Alan> [ This patch replaces "[PATCH] testsuite: disable break-interp.exp for Arm

>>>> Alan>   buildbot". Instead of working around the issue, this fixes the issues. ]

>>>> 

>>>> Alan> On some Arm targets (namely the buildbot Arm Docker setup) placing breakpoints

>>>> Alan> on just the solib dynamic probes will cause the target process to not stop.

>>>> Alan> For stop-on-solib to work, a breakpoint also needs placing at the original

>>>> Alan> stop address.  This is due to bad placing of the probe addresses in the

>>>> Alan> linker.

>>>> 

>>>> My understanding is that the probe code is intended to be more efficient

>>>> than the old code.  Won't always installing the old-style breakpoint

>>>> eliminate the efficiency gain?

>>>> 

>>> 

>>> Right, ok, if that’s the case then yes I agree this patch isn’t right - we shouldn’t

>>> be avoiding the optimisation to support broken probes.

>>> 

>>> 

>>>> If the probes don't work properly on ARM, is there some way to detect that?

>>>> Why don't they work?

>>> 

>>> I’ll investigate a little more to see if I can figure out why it’s wrong, and if we

>>> can detect it.

>>> I could add a UI command to toggle off the optimisation? Not sure I like that.

>>> 

>>> Failing all that I could go back to the previous "disable break-interp.exp for Arm”

>>> patch.

>>> 

> 

> [ Adding Gary Benson, who wrote the Probes-based interface.  ]

> 

>> Looking into this a little more, if I crtl-c during the timeouts, then continue again

>> I get the error "Probes-based dynamic linker interface failed.”

>> 

>> Doing a quick google I ended up here:

>> https://bugzilla.redhat.com/show_bug.cgi?id=1196181

> 

> Thanks for investigating this.

> 

> Have you double checked to see if the probes have the same problem

> describe in this bug?  I.e., the problematic probes contain arguments in

> the form of:

> 

>  Provider: rtld, Name: init_complete, Args: '-4@.L1212 4@r4'

>                                                 ^^^^^^

> 



Yes, it’s looks like it’s the same issue:

    Provider: rtld
    Name: init_start
    Location: 0x00002a44, Base: 0x00017b9c, Semaphore: 0x00000000
    Arguments: -4@.L1204 4@[r7, #52]


> IOW, the argument contains a label where it should actually contain an

> expression.

> 

>> Looks like this might be the same issue Sergio was looking into - the probes are

>> incorrect due to some glibc issue and labels getting removed.

>> 

>> It sounds to me that the way to stop stop-on-solib hanging is to process the locations

>> of the probe (eg - do all the stuff that will later cause the error message) before

>> the breakpoints are placed. If it fails, then just insert the old style breakpoint.

> 

> I have to take a look at the code, but it may be a good idea to try and

> force the evaluation of the arguments before anything else, and to

> revert to the old style breakpoints if anything fails.  ISTR that this

> was already being done, but I may be wrong.

> 

>> Adding Sergio onto the chain.

>> 

>> Sergio - are you still looking into these issues? Have you specifically seen my issue

>> about where break-inter.exp has lots of hangs? And are you still stuck for an Arm

>> machine - if you have an AArch64 box I can send you the docker instructions to get

>> you up and running with Arm on that.

> 

> Well, the bug has apparently been fixed by Florian's commit to glibc,

> and since we didn't get any other reports since then, I just assume(d)

> it is/was fixed.


Looks like I’ve got an older version. The bug report says fixed in glibc-2.29-3.

apt-cache says I’m using 2.27-3ubuntu1.


Oddly, my other Arm setup is Fedora27.
On here, it looks like I have the same issue:

    Provider: rtld
    Name: init_start
    Location: 0x00003c58, Base: 0x00022bbc, Semaphore: 0x00000000
    Arguments: -4@.L1196 4@r4

And glibc 2.26
...But on this setup, the probes work fine.


> 

> I agree it's a good idea to make GDB more resilient against these bugs.

> As I said, my initial impression is that GDB can *already* cope with

> problems when using the stap probes, but I will have to check more

> thoroughly.

> 


I’ll debug this a little further - see if I can get gdb to detect the issue
earlier.



> Thanks,

> 

> -- 

> Sergio

> GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36

> Please send encrypted e-mail if possible

> http://sergiodj.net/
  

Patch

diff --git a/gdb/solib-svr4.c b/gdb/solib-svr4.c
index c0c505acaa..d755832bbf 100644
--- a/gdb/solib-svr4.c
+++ b/gdb/solib-svr4.c
@@ -2066,20 +2066,20 @@  svr4_create_probe_breakpoints (svr4_info *info, struct gdbarch *gdbarch,
    purpose of this method is to allow debuggers to set a breakpoint so
    they can track these changes.
 
-   Some versions of the glibc dynamic linker contain named probes
-   to allow more fine grained stopping.  Given the address of the
-   original marker function, this function attempts to find these
-   probes, and if found, sets breakpoints on those instead.  If the
-   probes aren't found, a single breakpoint is set on the original
-   marker function.  */
+   A single breakpoint will be placed on the given ADDRESS.
+
+   In addition, some versions of the glibc dynamic linker contain named probes
+   to allow more fine grained stopping.  Given the address of the original
+   marker function, this function attempts to find these probes, and if found,
+   sets breakpoints on those as well.  */
 
 static void
 svr4_create_solib_event_breakpoints (svr4_info *info, struct gdbarch *gdbarch,
 				     CORE_ADDR address)
 {
-  struct obj_section *os;
+  create_solib_event_breakpoint (gdbarch, address);
 
-  os = find_pc_section (address);
+  struct obj_section *os = find_pc_section (address);
   if (os != NULL)
     {
       int with_prefix;
@@ -2143,8 +2143,6 @@  svr4_create_solib_event_breakpoints (svr4_info *info, struct gdbarch *gdbarch,
 	    return;
 	}
     }
-
-  create_solib_event_breakpoint (gdbarch, address);
 }
 
 /* Helper function for gdb_bfd_lookup_symbol.  */
diff --git a/gdb/testsuite/gdb.base/break-interp.exp b/gdb/testsuite/gdb.base/break-interp.exp
index d6da653529..5caa5ecdf3 100644
--- a/gdb/testsuite/gdb.base/break-interp.exp
+++ b/gdb/testsuite/gdb.base/break-interp.exp
@@ -131,7 +131,8 @@  proc reach_1 {func command displacement} {
 
     set test "reach"
     set test_displacement "seen displacement message as $displacement"
-    set debug_state_count 0
+    # First stop does not yet relocate the _start descriptor on ppc64.
+    set debug_state_count [istarget "powerpc64-*-linux*"]
     gdb_test_multiple $command $test {
 	-re "Using PIE \\(Position Independent Executable\\) displacement (0x\[0-9a-f\]+) " {
 	    # Missing "$gdb_prompt $" is intentional.
@@ -164,10 +165,8 @@  proc reach_1 {func command displacement} {
 	}
 	-re "Stopped due to (spurious )?shared library event.*\r\n$gdb_prompt $" {
 	    if {$func == $solib_bp} {
-		if {$debug_state_count == 0} {
-		    # First stop does not yet relocate the _start function
-		    # descriptor on ppc64.
-		    set debug_state_count 1
+		if {$debug_state_count == 1} {
+		    set debug_state_count 0
 		    send_gdb "continue\n"
 		    exp_continue
 		} else {