Catch exception on solib_svr4_r_ldsomap (was: Re: [RFC/PATCH] Extend gdb_core_cmd to allow "Cannot access memory..." messages)

  On Friday, March 27 2015, Pedro Alves wrote:

> On 03/25/2015 12:06 AM, Sergio Durigan Junior wrote:
>> Hi,
>>
>> While hacking the coredump_filter patch, I noticed that, when you load a
>> corefile on GDB and receive a "Cannot access memory at address..."
>> message, gdb_core_cmd will fail and return -1, which means that some
>> fatal error happened.
>>
>> Unfortunately, this kind of message does not mean that the user cannot
>> continue debugging with the corefile; it means that some memory region
>> (sometimes not important) was inaccessible.  Given that
>> gcore_create_callback, nowadays, will dump memory regions if they don't
>> have the 'read' permission set (but have any other permission set), this
>> kind of error can be expected sometimes.
>
> So, gdb itself errors and stops processing the core?

No, GDB does not "error and stop", but some testcases do that.  For
example, the proposed testcase for the coredump_filter feature has:

        ...
        set core_loaded [gdb_core_cmd "$core" "load core"]
        if { $core_loaded == -1 } {
            fail "loading $core"
            return
        }
        ...

On some corefiles generated by the test, GDB emits the "Cannot
access..." warning, which makes gdb_core_cmd return -1, and then this
specific test aborts.

>> I would like to propose this patch to be applied, which will allow the
>> test to continue even if this message is triggered.  I was not sure
>> whether I should use a "pass" or "xfail" there, so I chose the second
>> (which makes more sense to me).
>>
>
> Seems like you chose the first, not second.

You are right; I made a mistake when choosing which patch to include in
the message, sorry about that.  Please consider it as having an "xfail"
instead of a "pass".

>> WDYT?
>
> I think I don't understand.  :-)  Can you please show an
> example session?  Did GDB stop processing the core when
> it printed that error, or was it just a warning and it continued?

Sure, sorry for not sending the example session before!  Here is the
pertinent part:

  (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
  [New LWP 28468]
  Cannot access memory at address 0x355fc21148
  Cannot access memory at address 0x355fc21140
  (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core
  FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: loading /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
  spawn /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory
  ...

GDB correctly loaded the corefile (despite the warnings), and the
debugging session could continue, but, because gdb_core_cmd returned
-1, the test was interrupted (and another one began).

> From your message and patch, I assume the former, otherwise
>
>  "Core was generated by .*\r\n$gdb_prompt $"
>
> would still match, right?  If it didn't match, why not?

It did not match because GDB did not generate this message.  I also
found this to be very strange, but I did not investigate further.  But
see below for more details.

> You're arguing that error "does not mean that the user cannot
> continue debugging with the corefile".  So why paper over
> this in the testsuite?  Saying "pass" in the testsuite when
> it was "error" for gdb seems like a conflict.

I totally agree.  As I explained above, it was a thinko: the patch
should really contain an "xfail" instead of a "pass".

> It seems like what
> we really should be discussing is whether that should be a
> fatal error for the "core" command.

And that's why I marked the patch as RFC :-).  I knew that maybe someone
would like to raise this question.

> But as said, we'll need to see a gdb log and understand better
> what gdb is doing to discuss this further.

Right, so I took some time and found the right fix, I think.  As we
agreed above, the fact that GDB is not printing the "Core was generated
by..." message is really strange, so I decided to investigate why it is
doing that.

The answer is that we are forgetting to check for an exception in
solib_svr4_r_ldsomap.  When loading the corefile, GDB calls this
function, which then calls read_memory_unsigned_integer, which throws an
error.  This error is not being caught by the function, so it propagates
until the main loop catches it.  The fix is obvious: we should catch
this exception and continue in the function.  With the fix, GDB now
correctly prints the "Core was generated by..." message, and the patch
to adjust gdb_core_cmd is no longer needed.
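
To illustrate the control flow, here is a small standalone sketch.  It
is not the actual patch and it does not use GDB's exception machinery:
it simulates "throw" and "catch" with setjmp/longjmp, and every name
ending in _sketch is made up for this example.  The only point is to
show why an error that escapes the function skips the final message,
while an error caught inside the function does not.

```c
#include <setjmp.h>
#include <stdio.h>

/* Innermost active handler (hypothetical stand-in for GDB's
   exception handling, which works differently).  */
static jmp_buf *handler_sketch;

/* "Throw": print the message and unwind to the innermost handler.  */
void
throw_error_sketch (const char *msg)
{
  fprintf (stderr, "%s\n", msg);
  longjmp (*handler_sketch, 1);
}

/* Pretend memory read: address 0 is treated as unreadable.  */
unsigned long
read_memory_sketch (unsigned long addr)
{
  if (addr == 0)
    throw_error_sketch ("Cannot access memory at address 0x0");
  return addr + 8;
}

/* Old behaviour: the error escapes to whoever installed a handler.  */
unsigned long
ldsomap_unguarded_sketch (unsigned long addr)
{
  return read_memory_sketch (addr);
}

/* New behaviour: catch the error locally and return 0 instead.  */
unsigned long
ldsomap_guarded_sketch (unsigned long addr)
{
  jmp_buf local;
  jmp_buf *outer = handler_sketch;
  volatile unsigned long result = 0;

  handler_sketch = &local;
  if (setjmp (local) == 0)
    result = read_memory_sketch (addr);
  handler_sketch = outer;	/* Restore the caller's handler.  */
  return result;
}

/* Pretend "core" command: returns 1 if the final message was
   printed, 0 if an error unwound to the top level first.  */
int
load_core_sketch (unsigned long (*ldsomap) (unsigned long))
{
  jmp_buf top;

  handler_sketch = &top;
  if (setjmp (top) != 0)
    return 0;			/* Error reached the "main loop".  */
  ldsomap (0);
  printf ("Core was generated by ...\n");
  return 1;
}
```

In this sketch, load_core_sketch (ldsomap_unguarded_sketch) returns 0
without printing the "Core was generated by" line, while
load_core_sketch (ldsomap_guarded_sketch) prints it and returns 1.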

Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver.

Does that make more sense now?

Thanks,
