Catch exception on solib_svr4_r_ldsomap (was: Re: [RFC/PATCH] Extend gdb_core_cmd to allow "Cannot access memory..." messages)
Message ID | 87k2y2qdbu.fsf_-_@redhat.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 24785 invoked by alias); 27 Mar 2015 22:34:50 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 24752 invoked by uid 89); 27 Mar 2015 22:34:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, SPF_HELO_PASS, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 27 Mar 2015 22:34:48 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id F025391C06 for <gdb-patches@sourceware.org>; Fri, 27 Mar 2015 22:34:46 +0000 (UTC) Received: from localhost (unused-10-15-17-126.yyz.redhat.com [10.15.17.126]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t2RMYkra026932 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA256 bits=256 verify=NO); Fri, 27 Mar 2015 18:34:46 -0400 From: Sergio Durigan Junior <sergiodj@redhat.com> To: Pedro Alves <palves@redhat.com> Cc: GDB Patches <gdb-patches@sourceware.org> Subject: [PATCH] Catch exception on solib_svr4_r_ldsomap (was: Re: [RFC/PATCH] Extend gdb_core_cmd to allow "Cannot access memory..." messages) References: <871tke2b6a.fsf@redhat.com> <55152B9F.4090606@redhat.com> X-URL: http://blog.sergiodj.net Date: Fri, 27 Mar 2015 18:34:45 -0400 In-Reply-To: <55152B9F.4090606@redhat.com> (Pedro Alves's message of "Fri, 27 Mar 2015 10:06:23 +0000") Message-ID: <87k2y2qdbu.fsf_-_@redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-IsSubscribed: yes |
Commit Message
Sergio Durigan Junior
March 27, 2015, 10:34 p.m. UTC
On Friday, March 27 2015, Pedro Alves wrote: > On 03/25/2015 12:06 AM, Sergio Durigan Junior wrote:> Hi, >> >> While hacking the coredump_filter patch, I noticed that, when you load a >> corefile on GDB and receive a "Cannot access memory at address..." >> message, gdb_core_cmd will fail and return -1, which means that some >> fatal error happened. >> >> Unfortunately, this kind of message does not mean that the user cannot >> continue debugging with the corefile; it meant that some memory region >> (sometimes not important) was inaccessible. Given that >> gcore_create_callback, nowadays, will dump memory regions if they don't >> have the 'read' permission set (but have any other permission set), this >> kind of error can be expected sometimes. > > So, gdb itself errors and stops processing the core? No, GDB does not "error and stop", but some testcases do that. For example, the proposed testcase for the coredump_filter feature has: ... set core_loaded [gdb_core_cmd "$core" "load core"] if { $core_loaded == -1 } { fail "loading $core" return } ... On some corefiles generated by the test, GDB emits the "Cannot access..." warning, which makes gdb_core_cmd return -1, and then this specific test aborts. >> I would like to propose this patch to be applied, which will allow the >> test to continue even if this message is triggered. I was not sure >> wether I should use a "pass" or "xfail" there, so I chose the second >> (which makes more sense to me). >> > > Seems like you chose the first, not second. You are right, I made a mistake when choosing the right patch to include in the message, sorry about that. Please, consider it as having a "xfail" instead of a "pass". >> WDYT? > > I think I don't understand. :-) Can you please show an > example session? Did GDB continue processing the core when > it printed that error, or was it just a warning and it continued? Sure, sorry for not sending the example session before! Here is the pertinent part: (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore [New LWP 28468] Cannot access memory at address 0x355fc21148 Cannot access memory at address 0x355fc21140 (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: loading /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore spawn /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory ... GDB correctly loaded the corefile (despite the warnings), and the debugging sessions could continue, but, because gdb_core_cmd returned -1, the test was interrupted (and another one began). > From your message and patch, I assume the former, otherwise > > "Core was generated by .*\r\n$gdb_prompt $" > > would still match, right? If it didn't match, why not? It did not match because GDB did not generate this message. I also found this to be very strange, but I did not investigate further. But see below for more details. > Your arguing that error "does not mean that the user cannot > continue debugging with the corefile". So why paper over > this in the testsuite? Saying "pass" in the testsuite when > it was "error" for gdb seems like a conflict. I totally agree. As I explained above, it was a thinko: the patch should really contain a "xfail" instead of a "pass". > It seems like what > we really should be discussing is whether that should be a > fatal error for the "core" command. And that's why I marked the patch as RFC :-). I knew that maybe someone would like to raise this question. > But as said, we'll need to see a gdb log and understand better > what gdb is doing to discuss this further. Right, so I took some time and found the right fix, I think. As we agreed above, the fact that GDB is not printing the "Core was generated by..." message is really strange, so I decided to investigate why it is doing that. The answer is that we are forgetting to check for an exception on solib_svr4_r_ldsomap. When loading the corefile, GDB calls this function, which then calls read_memory_unsigned_integer, which throws an error. This error is not being caught by the function, so it propagates until the main loop catches it. The fix is obvious: we should catch this regression and continue in the function. With it, GDB now correctly prints the "Core was generated by..." message, and the patch to adjust gdb_core_cmd is no longer needed. Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver. Does that make more sense now? Thanks,
Comments
On 03/27/2015 10:34 PM, Sergio Durigan Junior wrote: > On Friday, March 27 2015, Pedro Alves wrote: > >> On 03/25/2015 12:06 AM, Sergio Durigan Junior wrote:> Hi, >>> >>> While hacking the coredump_filter patch, I noticed that, when you load a >>> corefile on GDB and receive a "Cannot access memory at address..." >>> message, gdb_core_cmd will fail and return -1, which means that some >>> fatal error happened. >>> >>> Unfortunately, this kind of message does not mean that the user cannot >>> continue debugging with the corefile; it meant that some memory region >>> (sometimes not important) was inaccessible. Given that >>> gcore_create_callback, nowadays, will dump memory regions if they don't >>> have the 'read' permission set (but have any other permission set), this >>> kind of error can be expected sometimes. >> >> So, gdb itself errors and stops processing the core? > > No, GDB does not "error and stop", but some testcases do that. Well, it clearly does do that. Hence your new patch. :-) >> I think I don't understand. :-) Can you please show an >> example session? Did GDB continue processing the core when >> it printed that error, or was it just a warning and it continued? > I meant "Did GDB stop processing the core", of course. > Sure, sorry for not sending the example session before! Here is the > pertinent part: > > (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore > [New LWP 28468] > Cannot access memory at address 0x355fc21148 > Cannot access memory at address 0x355fc21140 > (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core > FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: loading /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore > spawn /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory > ... > > GDB correctly loaded the corefile (despite the warnings), and the The error made the rest of core_open be skipped: if the core is still loaded, gdb is potentially in an inconsistent state at this point. I'd think we should completely discard the core/target if something errors out. And then if we can be tolerant to specific parts of loading a core failing, we should handle those before the error escapes out of core_open. We do something like that already (note core_close_cleanup and the TRY/CATCH'S), but it's clearly not complete. After: push_target (&core_ops); discard_cleanups (old_chain); ... several things can throw and let an exception escape. > Right, so I took some time and found the right fix, I think. As we > agreed above, the fact that GDB is not printing the "Core was generated > by..." message is really strange, so I decided to investigate why it is > doing that. > > The answer is that we are forgetting to check for an exception on > solib_svr4_r_ldsomap. When loading the corefile, GDB calls this > function, which then calls read_memory_unsigned_integer, which throws an > error. This error is not being caught by the function, so it propagates > until the main loop catches it. The fix is obvious: we should catch > this regression and continue in the function. With it, GDB now > correctly prints the "Core was generated by..." message, and the patch > to adjust gdb_core_cmd is no longer needed. > > Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver. > > Does that make more sense now? > Yes, this is OK. Thanks, Pedro Alves
On Tuesday, March 31 2015, Pedro Alves wrote: >>> So, gdb itself errors and stops processing the core? >> >> No, GDB does not "error and stop", but some testcases do that. > > Well, it clearly does do that. Hence your new patch. :-) > >>> I think I don't understand. :-) Can you please show an >>> example session? Did GDB continue processing the core when >>> it printed that error, or was it just a warning and it continued? >> > > I meant "Did GDB stop processing the core", of course. Sorry I misunderstood your question. >> Sure, sorry for not sending the example session before! Here is the >> pertinent part: >> >> (gdb) core /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore >> [New LWP 28468] >> Cannot access memory at address 0x355fc21148 >> Cannot access memory at address 0x355fc21140 >> (gdb) FAIL: gdb.base/coredump-filter.exp: loading and testing corefile for non-Private-Anonymous: load core >> FAIL: gdb.base/coredump-filter.exp: loading and testing corefile >> for non-Private-Anonymous: loading >> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/gdb.base/non-private-anon.gcore >> spawn >> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../../gdb/gdb >> -nw -nx -data-directory >> /home/sergio/work/src/git/binutils-gdb/rhbz1085906-coredump-filter/build-64-3/gdb/testsuite/../data-directory >> ... >> >> GDB correctly loaded the corefile (despite the warnings), and the > > The error made the rest of core_open be skipped: if the core is still > loaded, gdb is potentially in an inconsistent state at this point. > I'd think we should completely discard the core/target if something > errors out. And then if we can be tolerant to specific parts of > loading a core failing, we should handle those before the error escapes > out of core_open. We do something like that > already (note core_close_cleanup and the TRY/CATCH'S), but it's clearly > not complete. After: > > push_target (&core_ops); > discard_cleanups (old_chain); > > ... several things can throw and let an exception escape. Yeah, certainly we should be catching those before they escape. I will see if I can take a look at more places to improve/fix later. >> Right, so I took some time and found the right fix, I think. As we >> agreed above, the fact that GDB is not printing the "Core was generated >> by..." message is really strange, so I decided to investigate why it is >> doing that. >> >> The answer is that we are forgetting to check for an exception on >> solib_svr4_r_ldsomap. When loading the corefile, GDB calls this >> function, which then calls read_memory_unsigned_integer, which throws an >> error. This error is not being caught by the function, so it propagates >> until the main loop catches it. The fix is obvious: we should catch >> this regression and continue in the function. With it, GDB now >> correctly prints the "Core was generated by..." message, and the patch >> to adjust gdb_core_cmd is no longer needed. >> >> Regression-tested on Fedora 20 for x86_64, i686 and native-gdbserver. >> >> Does that make more sense now? >> > > Yes, this is OK. Thanks, pushed: <https://sourceware.org/ml/gdb-cvs/2015-03/msg00274.html> 416f679e68468ea6dd7384213994ce74201f82e7
diff --git a/gdb/solib-svr4.c b/gdb/solib-svr4.c index 0cecc2a..dd93847 100644 --- a/gdb/solib-svr4.c +++ b/gdb/solib-svr4.c @@ -910,13 +910,22 @@ solib_svr4_r_ldsomap (struct svr4_info *info) struct link_map_offsets *lmo = svr4_fetch_link_map_offsets (); struct type *ptr_type = builtin_type (target_gdbarch ())->builtin_data_ptr; enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ()); - ULONGEST version; + ULONGEST version = 0; + + TRY + { + /* Check version, and return zero if `struct r_debug' doesn't have + the r_ldsomap member. */ + version + = read_memory_unsigned_integer (info->debug_base + lmo->r_version_offset, + lmo->r_version_size, byte_order); + } + CATCH (ex, RETURN_MASK_ERROR) + { + exception_print (gdb_stderr, ex); + } + END_CATCH - /* Check version, and return zero if `struct r_debug' doesn't have - the r_ldsomap member. */ - version - = read_memory_unsigned_integer (info->debug_base + lmo->r_version_offset, - lmo->r_version_size, byte_order); if (version < 2 || lmo->r_ldsomap_offset == -1) return 0;