Catch exception on solib_svr4_r_ldsomap (was: Re: [RFC/PATCH] Extend gdb_core_cmd to allow "Cannot access memory..." messages)

Message ID 87k2y2qdbu.fsf_-_@redhat.com
State New, archived
Headers

Commit Message

Sergio Durigan Junior March 27, 2015, 10:34 p.m. UTC
  On Friday, March 27 2015, Pedro Alves wrote:

> On 03/25/2015 12:06 AM, Sergio Durigan Junior wrote:> Hi,
>>
>> While hacking the coredump_filter patch, I noticed that, when you load a
>> corefile on GDB and receive a "Cannot access memory at address..."
>> message, gdb_core_cmd will fail and return -1, which means that some
>> fatal error happened.
>>
>> Unfortunately, this kind of message does not mean that the user cannot
>> continue debugging with the corefile; it meant that some memory region
>> (sometimes not important) was inaccessible.  Given that
>> gcore_create_callback, nowadays, will dump memory regions if they don't
>> have the 'read' permission set (but have any other permission set), this
>> kind of error can be expected sometimes.
>
> So, gdb itself errors and stops processing the core?

No, GDB does not "error and stop", but some testcases do that.  For
example, the proposed testcase for the coredump_filter feature has:

        ...
	set core_loaded [gdb_core_cmd "$core" "load core"]
	if { $core_loaded == -1 } {
	    fail "loading $core"
	    return
	}
        ...

On some corefiles generated by the test, GDB emits the "Cannot
access..." warning, which makes gdb_core_cmd return -1, and then this
specific test aborts.

>> I would like to propose this patch to be applied, which will allow the
>> test to continue even if this message is triggered.  I was not sure
>> wether I should use a "pass" or "xfail" there, so I chose the second
>> (which makes more sense to me).
>>
>
> Seems like you chose the first, not second.

You are right, I made a mistake when choosing the right patch to include
in the message, sorry about that.  Please, consider it as having a
"xfail" instead of a "pass".

>> WDYT?
>
> I think I don't understand.  :-)  Can you please show an
> example session?  Did GDB continue processing the core when
> it printed that error, or was it just a warning and it continued?

Sure, sorry for not sending the example session before!  Here is the
pertinent part:

  (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
  [New LWP 28468]
  Cannot access memory at address 0x355fc21148
  Cannot access memory at address 0x355fc21140
  (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core
  FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: loading /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
  spawn /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory
  ...

GDB correctly loaded the corefile (despite the warnings), and the
debugging sessions could continue, but, because gdb_core_cmd returned
-1, the test was interrupted (and another one began).

> From your message and patch, I assume the former, otherwise
>
>  "Core was generated by .*\r\n$gdb_prompt $"
>
> would still match, right?  If it didn't match, why not?

It did not match because GDB did not generate this message.  I also
found this to be very strange, but I did not investigate further.  But
see below for more details.

> Your arguing that error "does not mean that the user cannot
> continue debugging with the corefile".  So why paper over
> this in the testsuite?  Saying "pass" in the testsuite when
> it was "error" for gdb seems like a conflict.

I totally agree.  As I explained above, it was a thinko: the patch
should really contain a "xfail" instead of a "pass".

> It seems like what
> we really should be discussing is whether that should be a
> fatal error for the "core" command.

And that's why I marked the patch as RFC :-).  I knew that maybe someone
would like to raise this question.

> But as said, we'll need to see a gdb log and understand better
> what gdb is doing to discuss this further.

Right, so I took some time and found the right fix, I think.  As we
agreed above, the fact that GDB is not printing the "Core was generated
by..." message is really strange, so I decided to investigate why it is
doing that.

The answer is that we are forgetting to check for an exception on
solib_svr4_r_ldsomap.  When loading the corefile, GDB calls this
function, which then calls read_memory_unsigned_integer, which throws an
error.  This error is not being caught by the function, so it propagates
until the main loop catches it.  The fix is obvious: we should catch
this regression and continue in the function.  With it, GDB now
correctly prints the "Core was generated by..." message, and the patch
to adjust gdb_core_cmd is no longer needed.

Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver.

Does that make more sense now?

Thanks,
  

Comments

Pedro Alves March 31, 2015, 11:30 a.m. UTC | #1
On 03/27/2015 10:34 PM, Sergio Durigan Junior wrote:
> On Friday, March 27 2015, Pedro Alves wrote:
> 
>> On 03/25/2015 12:06 AM, Sergio Durigan Junior wrote:> Hi,
>>>
>>> While hacking the coredump_filter patch, I noticed that, when you load a
>>> corefile on GDB and receive a "Cannot access memory at address..."
>>> message, gdb_core_cmd will fail and return -1, which means that some
>>> fatal error happened.
>>>
>>> Unfortunately, this kind of message does not mean that the user cannot
>>> continue debugging with the corefile; it meant that some memory region
>>> (sometimes not important) was inaccessible.  Given that
>>> gcore_create_callback, nowadays, will dump memory regions if they don't
>>> have the 'read' permission set (but have any other permission set), this
>>> kind of error can be expected sometimes.
>>
>> So, gdb itself errors and stops processing the core?
> 
> No, GDB does not "error and stop", but some testcases do that.  

Well, it clearly does do that.   Hence your new patch.  :-)

>> I think I don't understand.  :-)  Can you please show an
>> example session?  Did GDB continue processing the core when
>> it printed that error, or was it just a warning and it continued?
> 

I meant "Did GDB stop processing the core", of course.

> Sure, sorry for not sending the example session before!  Here is the
> pertinent part:
> 
>   (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
>   [New LWP 28468]
>   Cannot access memory at address 0x355fc21148
>   Cannot access memory at address 0x355fc21140
>   (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core
>   FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: loading /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
>   spawn /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory
>   ...
> 
> GDB correctly loaded the corefile (despite the warnings), and the

The error made the rest of core_open be skipped: if the core is still
loaded, gdb is potentially in an inconsistent state at this point.
I'd think we should completely discard the core/target if something
errors out.  And then if we can be tolerant to specific parts of
loading a core failing, we should handle those before the error escapes
out of core_open.  We do something like that
already (note core_close_cleanup and the TRY/CATCH'S), but it's clearly
not complete.  After:

  push_target (&core_ops);
  discard_cleanups (old_chain);

... several things can throw and let an exception escape.

> Right, so I took some time and found the right fix, I think.  As we
> agreed above, the fact that GDB is not printing the "Core was generated
> by..." message is really strange, so I decided to investigate why it is
> doing that.
> 
> The answer is that we are forgetting to check for an exception on
> solib_svr4_r_ldsomap.  When loading the corefile, GDB calls this
> function, which then calls read_memory_unsigned_integer, which throws an
> error.  This error is not being caught by the function, so it propagates
> until the main loop catches it.  The fix is obvious: we should catch
> this regression and continue in the function.  With it, GDB now
> correctly prints the "Core was generated by..." message, and the patch
> to adjust gdb_core_cmd is no longer needed.
> 
> Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver.
> 
> Does that make more sense now?
> 

Yes, this is OK.

Thanks,
Pedro Alves
  
Sergio Durigan Junior March 31, 2015, 11:39 p.m. UTC | #2
On Tuesday, March 31 2015, Pedro Alves wrote:

>>> So, gdb itself errors and stops processing the core?
>> 
>> No, GDB does not "error and stop", but some testcases do that.  
>
> Well, it clearly does do that.   Hence your new patch.  :-)
>
>>> I think I don't understand.  :-)  Can you please show an
>>> example session?  Did GDB continue processing the core when
>>> it printed that error, or was it just a warning and it continued?
>> 
>
> I meant "Did GDB stop processing the core", of course.

Sorry I misunderstood your question.

>> Sure, sorry for not sending the example session before!  Here is the
>> pertinent part:
>> 
>>   (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
>>   [New LWP 28468]
>>   Cannot access memory at address 0x355fc21148
>>   Cannot access memory at address 0x355fc21140
>>   (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core
>>   FAIL: gdb.base/coredump-filter.exp: loading and testing corefile
>> for non-Private-Anonymous: loading
>> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore
>>   spawn
>> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb
>> -nw -nx -data-directory
>> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory
>>   ...
>> 
>> GDB correctly loaded the corefile (despite the warnings), and the
>
> The error made the rest of core_open be skipped: if the core is still
> loaded, gdb is potentially in an inconsistent state at this point.
> I'd think we should completely discard the core/target if something
> errors out.  And then if we can be tolerant to specific parts of
> loading a core failing, we should handle those before the error escapes
> out of core_open.  We do something like that
> already (note core_close_cleanup and the TRY/CATCH'S), but it's clearly
> not complete.  After:
>
>   push_target (&core_ops);
>   discard_cleanups (old_chain);
>
> ... several things can throw and let an exception escape.

Yeah, certainly we should be catching those before they escape.  I will
see if I can take a look at more places to improve/fix later.

>> Right, so I took some time and found the right fix, I think.  As we
>> agreed above, the fact that GDB is not printing the "Core was generated
>> by..." message is really strange, so I decided to investigate why it is
>> doing that.
>> 
>> The answer is that we are forgetting to check for an exception on
>> solib_svr4_r_ldsomap.  When loading the corefile, GDB calls this
>> function, which then calls read_memory_unsigned_integer, which throws an
>> error.  This error is not being caught by the function, so it propagates
>> until the main loop catches it.  The fix is obvious: we should catch
>> this regression and continue in the function.  With it, GDB now
>> correctly prints the "Core was generated by..." message, and the patch
>> to adjust gdb_core_cmd is no longer needed.
>> 
>> Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver.
>> 
>> Does that make more sense now?
>> 
>
> Yes, this is OK.

Thanks, pushed:

  <https://sourceware.org/ml/gdb-cvs/2015-03/msg00274.html>
  416f679e68468ea6dd7384213994ce74201f82e7
  

Patch

diff --git a/gdb/solib-svr4.c b/gdb/solib-svr4.c
index 0cecc2a..dd93847 100644
--- a/gdb/solib-svr4.c
+++ b/gdb/solib-svr4.c
@@ -910,13 +910,22 @@  solib_svr4_r_ldsomap (struct svr4_info *info)
   struct link_map_offsets *lmo = svr4_fetch_link_map_offsets ();
   struct type *ptr_type = builtin_type (target_gdbarch ())->builtin_data_ptr;
   enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ());
-  ULONGEST version;
+  ULONGEST version = 0;
+
+  TRY
+    {
+      /* Check version, and return zero if `struct r_debug' doesn't have
+	 the r_ldsomap member.  */
+      version
+	= read_memory_unsigned_integer (info->debug_base + lmo->r_version_offset,
+					lmo->r_version_size, byte_order);
+    }
+  CATCH (ex, RETURN_MASK_ERROR)
+    {
+      exception_print (gdb_stderr, ex);
+    }
+  END_CATCH
 
-  /* Check version, and return zero if `struct r_debug' doesn't have
-     the r_ldsomap member.  */
-  version
-    = read_memory_unsigned_integer (info->debug_base + lmo->r_version_offset,
-				    lmo->r_version_size, byte_order);
   if (version < 2 || lmo->r_ldsomap_offset == -1)
     return 0;