Message ID | 20220921131200.3983844-1-aburgess@redhat.com |
---|---|
State | Superseded |
Delegated to: | Simon Marchi |
Headers |
Return-Path: <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6701C385828E for <patchwork@sourceware.org>; Wed, 21 Sep 2022 13:13:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6701C385828E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1663766019; bh=3WKnRGeiUhYtYgtsBosuzuynrtYtUd2Vnene6ptKmkA=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=yLEkPQ/qvQfoyCONhEAGPr4TkVElQSU4n0lemNDNCN3xsc6Dx068YGJq7whNL3GHz sqYEu0pFgGc4lxsFAu/1pOTr5e4x/Clw3rU12NQl0GfzYQfE1OADD5viPp7slnvZhu gN5aDNjFivdw7YdK+HMMscpNdrz1ER9hBA6qwRoM= X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 30E943858C55 for <gdb-patches@sourceware.org>; Wed, 21 Sep 2022 13:12:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 30E943858C55 Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-379-Vl1YoH5sNlKh7ltUr6msjw-1; Wed, 21 Sep 2022 09:12:05 -0400 X-MC-Unique: Vl1YoH5sNlKh7ltUr6msjw-1 Received: by mail-wm1-f70.google.com with SMTP id 84-20020a1c0257000000b003b4be28d7e3so1283319wmc.0 for <gdb-patches@sourceware.org>; Wed, 21 Sep 2022 06:12:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date; bh=3WKnRGeiUhYtYgtsBosuzuynrtYtUd2Vnene6ptKmkA=; b=Cq2Kgh077AWMFFijM5tZQwXp2UBm9bj3lrYPANciLIQxgqEf/kxRaPxIWrV1aRBxFv 7tSNhF5ftMuY94YUlLOem1vlwektEaTF/h897K45EMIbI2StT7pVprdwakui9R3RJJAJ Q0r+bquddZsPeZIMEJTKGPIZGg4mCPfPwKlqTibic2ADAssa7OlkZ3JJLXh6e1liVkrG 2Ongt98GLI154A58Pku7rtZs0KkGBqBKJuwmtRQ24hoWetBe52SiZNZXmnWwyZ4fnQlB zT2qnznrVpCASvnADg54wSy5FVJhbYqnP9cB+fpZrT+SLVKEwB+TPEzdRWNSogZUkHQW 3fVA== X-Gm-Message-State: ACrzQf0csroEteHEtaMR/RT5mq0DS0ylOiv2xVTS28AiFXOO04Ov0A2G caXH4uyzRRD6Mp0BeAMf94r0d4TWUN1ldrm5Xe1Qu39lOQ4VXUA1Evr/FY8dGb962S0d383NexA p8NxJOp7e80vqd75u92ePvLLBs3A9CvUJHY4Q0PyMWzYc4gra+43rEWr0s7VW/lDruTK2cRGHGQ == X-Received: by 2002:a05:6000:797:b0:22a:c13a:a971 with SMTP id bu23-20020a056000079700b0022ac13aa971mr17261397wrb.320.1663765923788; Wed, 21 Sep 2022 06:12:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7FI+H5XH1ynOcOv+YODDIYpmm1181LEb17B1Tyr7KvtLnC8S10ll7KD1pU61Sju3uljmWQFg== X-Received: by 2002:a05:6000:797:b0:22a:c13a:a971 with SMTP id bu23-20020a056000079700b0022ac13aa971mr17261364wrb.320.1663765923200; Wed, 21 Sep 2022 06:12:03 -0700 (PDT) Received: from localhost (52.72.115.87.dyn.plus.net. [87.115.72.52]) by smtp.gmail.com with ESMTPSA id f8-20020adfb608000000b00228692033dcsm2369950wre.91.2022.09.21.06.12.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 21 Sep 2022 06:12:02 -0700 (PDT) To: gdb-patches@sourceware.org Subject: [PATCH] gdb: fix target_ops reference count for some cases Date: Wed, 21 Sep 2022 14:12:00 +0100 Message-Id: <20220921131200.3983844-1-aburgess@redhat.com> X-Mailer: git-send-email 2.25.4 MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>, <mailto:gdb-patches-request@sourceware.org?subject=subscribe> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> Reply-To: Andrew Burgess <aburgess@redhat.com> Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Sender: "Gdb-patches" <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
gdb: fix target_ops reference count for some cases
|
|
Commit Message
Andrew Burgess
Sept. 21, 2022, 1:12 p.m. UTC
This commit started as an investigation into why the test gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file behind. The crash occurs in connpy_connection_dealloc, and is actually triggered by this assert: gdb_assert (conn_obj->target == nullptr); Now a little aside... ... the assert is never actually printed, instead GDB crashes due to calling a pure virtual function. The backtrace at the point of crash looks like this: #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6 #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6 #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3606 #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/inferior.h:377 #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047 #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target-delegates.c:1223 #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112 #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 <internal_error_problem>, file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379 Notice in frame #12 we called target_ops::supports_terminal_ours, however, this is the_dummy_target, which is of type dummy_target, and so we should have called dummy_target::supports_terminal_ours. I believe the reason we ended up in the wrong implementation of supports_terminal_ours (which is a virtual function) is because we made the call during GDB's shut-down, and, I suspect, the vtables were in a weird state. Anyway, the point of this patch is not to fix GDB's ability to print an assert during exit, but to address the root cause of the assert. With that aside out of the way, we can return to the main story... Connections are represented in Python with gdb.TargtetConnection objects (or its sub-classes). The assert in question confirms that when a gdb.TargtetConnection is deallocated, the underlying GDB connection has itself been removed from GDB. If this is not true then we risk creating multiple different gdb.TargtetConnection objects for the same connection, which would be bad. When a connection removed in GDB the connection_removed observer fires, which we catch with connpy_connection_removed, this function then sets conn_obj->target to nullptr. The first issue here is that connpy_connection_dealloc is being called as part of GDB's exit code, which is run after the Python interpreter has been shut down. The connpy_connection_dealloc function is used to deallocate the gdb.TargtetConnection Python object. Surely it is wrong for us to be deallocating Python objects after the interpreter has been shut down. The reason why connpy_connection_dealloc is called during GDB's exit is that the global all_connection_objects map is holding a reference to the gdb.TargtetConnection object. When the map is destroyed during GDB's exit, the gdb.TargtetConnection objects within the map can finally be deallocated. Another job of connpy_connection_removed (the function we mentioned earlier) is to remove connections from the all_connection_objects map when the connection is removed from GDB. And so, the reason why all_connection_objects has contents when GDB exits, and the reason the assert fires, is that, when GDB exits, there are still some connections that have not yet been removed from GDB, that is, they have a non-zero reference count. If we take a look at quit_force (top.c) you can see that, for each inferior, we call pop_all_targets before we (later in the function) call do_final_cleanups. It is the do_final_cleanups call that is responsible for shutting down the Python interpreter. So, in theory, we should have popped all targets be the time GDB exits, this should have reduced their reference counts to zero, which in turn should have triggered the connection_removed observer, and resulted in the connection being removed from all_connection_objects, and the gdb.TargtetConnection object being deallocated. That this is not happening indicates that earlier, somewhere else in GDB, we are leaking references to GDB's connections. I tracked the problem down to the 'remove-inferiors' command, implemented with the remove_inferior_command function (in inferior.c). This function calls delete_inferior for each inferior the user specifies. In delete_inferior we do some house keeping, and then delete the inferior object, which calls inferior::~inferior. In neither delete_inferior or inferior::~inferior do we call pop_all_targets, and it is this missing call that means we leak some references to the target_ops objects on the inferior's target_stack. To fix this we need to add a pop_all_targets call either in delete_inferior or in inferior::~inferior. Currently, I think that we should place the call in delete_inferior. Before calling pop_all_targets the inferior for which we are popping needs to be made current, along with the program_space associated with the inferior. At the moment the inferior's program_space is deleted in delete_inferior before we call inferior::~inferior, so, I think, to place the pop_all_targets call into inferior::~inferior would require additional adjustment to GDB. As delete_inferior already exists, and includes various house keeping tasks, it doesn't seem unreasonable to place the pop_all_targets call there. Now when I run py-inferior.exp, by the time GDB exits, the reference counts are correct. The final pop_all_targets calls in quit_force reduce the reference counts to zero, which means the connections are removed before the Python interpreter is shut down. When GDB actually exits the all_connection_objects map is empty, and no further Python objects are deallocated at that point. The test now exits cleanly without creating a core file. I've made some additional, related, changes in this commit. In inferior::~inferior I've added a new assert that ensures, by the time the inferior is destructed, the inferior's target stack is empty (with the exception of the dummy_target). If this is not true then we will be loosing a reference to a target_ops object. It is worth noting that we are loosing references to the dummy_target object, however, I've not tried to fix that problem in this patch, as I don't think it is as important. The dummy target is a global singleton, there's no observer for when the dummy target is deleted, so no other parts of GDB care when the object is deleted. As a global it is always just deleted as part of the exit code, and we never really care what its reference count is. So, though it is a little annoying that its reference count is wrong, it doesn't really matter. Maybe I'll come back in a later patch and try to clean that up... but that's for another day. When I tested the changes above I ran into a failure from 'maint selftest infrun_thread_ptid_changed'. The problem is with scoped_mock_context. This object creates a new inferior (called mock_inferior), with a thread, and some other associated state, and then select this new inferior. We also push a process_stratum_target sub-class onto the new inferior's target stack. In ~scoped_mock_context we call: pop_all_targets_at_and_above (process_stratum); this will remove all target_ops objects from the mock_inferior's target stack, but leaves anything at the dummy_stratum and the file_stratum (which I find a little weird, but more on this later). The problem though is that pop_all_targets_at_and_above, just like pop_all_targets, removes things from the target stack of the current inferior. In ~scoped_mock_context we don't ensure that the mock_inferior associated with the current scoped_mock_context is actually selected. In most tests we create a single scoped_mock_context, which automatically selects its contained mock_inferior. However, in the test infrun_thread_ptid_changed, we create multiple scoped_mock_context, and then change which inferior is currently selected. As a result, in one case, we end up in ~scoped_mock_context with the wrong inferior selected. The pop_all_targets_at_and_above call then removes the target_ops from the wrong inferior's target stack. This leaves the target_ops on the scoped_mock_context::mock_inferior's target stack, and, when the mock_inferior is destructed, we loose some references, this triggers the assert I placed in inferior::~inferior. To fix this I added a switch_to_inferior_no_thread call within the ~scoped_mock_context function. As I mention above, it seems weird that we call pop_all_targets_at_and_above instead of pop_all_targets, so I've changed that. I didn't see any test regressions after this, so I'm assuming this is fine. Finally, I've made use of the Python connection_removed event listener API to add a test for this issue, but in addition the py-inferior.exp test now runs without crashing (and creating a core file) on exit. --- gdb/inferior.c | 30 ++++++ gdb/scoped-mock-context.h | 5 +- .../gdb.python/py-connection-removed.exp | 92 +++++++++++++++++++ 3 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 gdb/testsuite/gdb.python/py-connection-removed.exp
Comments
On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote: > This commit started as an investigation into why the test > gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file > behind. > > The crash occurs in connpy_connection_dealloc, and is actually > triggered by this assert: > > gdb_assert (conn_obj->target == nullptr); > > Now a little aside... > > ... the assert is never actually printed, instead GDB crashes due to > calling a pure virtual function. The backtrace at the point of crash > looks like this: > > #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6 > #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6 > #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3606 > #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/inferior.h:377 > #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047 > #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target-delegates.c:1223 > #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112 > #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 <internal_error_problem>, file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379 > > Notice in frame #12 we called target_ops::supports_terminal_ours, > however, this is the_dummy_target, which is of type dummy_target, and > so we should have called dummy_target::supports_terminal_ours. I > believe the reason we ended up in the wrong implementation of > supports_terminal_ours (which is a virtual function) is because we > made the call during GDB's shut-down, and, I suspect, the vtables were > in a weird state. > > Anyway, the point of this patch is not to fix GDB's ability to print > an assert during exit, but to address the root cause of the assert. > With that aside out of the way, we can return to the main story... > > Connections are represented in Python with gdb.TargtetConnection > objects (or its sub-classes). The assert in question confirms that > when a gdb.TargtetConnection is deallocated, the underlying GDB > connection has itself been removed from GDB. If this is not true then > we risk creating multiple different gdb.TargtetConnection objects for > the same connection, which would be bad. > > When a connection removed in GDB the connection_removed observer Missing "is". > fires, which we catch with connpy_connection_removed, this function > then sets conn_obj->target to nullptr. > > The first issue here is that connpy_connection_dealloc is being called > as part of GDB's exit code, which is run after the Python interpreter > has been shut down. The connpy_connection_dealloc function is used to > deallocate the gdb.TargtetConnection Python object. Surely it is > wrong for us to be deallocating Python objects after the interpreter > has been shut down. > > The reason why connpy_connection_dealloc is called during GDB's exit > is that the global all_connection_objects map is holding a reference > to the gdb.TargtetConnection object. When the map is destroyed during Typo in "TargtetConnection". > GDB's exit, the gdb.TargtetConnection objects within the map can > finally be deallocated. > > Another job of connpy_connection_removed (the function we mentioned > earlier) is to remove connections from the all_connection_objects map > when the connection is removed from GDB. > > And so, the reason why all_connection_objects has contents when GDB > exits, and the reason the assert fires, is that, when GDB exits, there > are still some connections that have not yet been removed from GDB, > that is, they have a non-zero reference count. > > If we take a look at quit_force (top.c) you can see that, for each > inferior, we call pop_all_targets before we (later in the function) > call do_final_cleanups. It is the do_final_cleanups call that is > responsible for shutting down the Python interpreter. > > So, in theory, we should have popped all targets be the time GDB be -> before? > exits, this should have reduced their reference counts to zero, which > in turn should have triggered the connection_removed observer, and > resulted in the connection being removed from all_connection_objects, > and the gdb.TargtetConnection object being deallocated. "TargtetConnection" > That this is not happening indicates that earlier, somewhere else in > GDB, we are leaking references to GDB's connections. > > I tracked the problem down to the 'remove-inferiors' command, > implemented with the remove_inferior_command function (in inferior.c). > This function calls delete_inferior for each inferior the user > specifies. > > In delete_inferior we do some house keeping, and then delete the > inferior object, which calls inferior::~inferior. > > In neither delete_inferior or inferior::~inferior do we call > pop_all_targets, and it is this missing call that means we leak some > references to the target_ops objects on the inferior's target_stack. > > To fix this we need to add a pop_all_targets call either in > delete_inferior or in inferior::~inferior. Currently, I think that we > should place the call in delete_inferior. > > Before calling pop_all_targets the inferior for which we are popping > needs to be made current, along with the program_space associated with > the inferior. Why does the inferior and program_space need to be made current in order to pop the targets? I understand that pop_all_targets_above and other functions use `current_inferior`, but could we convert them (or add new versions) so they don't? Off-hand I don't see why they couldn't receive the inferior as a parameter (or be made methods of inferior and/or target_stack). It shouldn't be important which inferior is the current one when calling target_close on a target. If we are closing a target, it means it is no longer controlling any inferior. > At the moment the inferior's program_space is deleted in > delete_inferior before we call inferior::~inferior, so, I think, to > place the pop_all_targets call into inferior::~inferior would require > additional adjustment to GDB. As delete_inferior already exists, and > includes various house keeping tasks, it doesn't seem unreasonable to > place the pop_all_targets call there. I don't object to fixing it like this. I'm just wondering, did you consider changing target_stack::m_stack to make it hold string references, something like std::vector<target_ops_ref>? I haven't tried so maybe this doesn't make sense / is too difficult. But if it does, I guess the problem would take care of itself. When deleting an inferior that still has some targets pushed, they would be automatically decref'd and closed if needed. > Now when I run py-inferior.exp, by the time GDB exits, the reference > counts are correct. The final pop_all_targets calls in quit_force > reduce the reference counts to zero, which means the connections are > removed before the Python interpreter is shut down. When GDB actually > exits the all_connection_objects map is empty, and no further Python > objects are deallocated at that point. The test now exits cleanly > without creating a core file. > > I've made some additional, related, changes in this commit. > > In inferior::~inferior I've added a new assert that ensures, by the > time the inferior is destructed, the inferior's target stack is > empty (with the exception of the dummy_target). If this is not true > then we will be loosing a reference to a target_ops object. > > It is worth noting that we are loosing references to the dummy_target > object, however, I've not tried to fix that problem in this patch, as > I don't think it is as important. The dummy target is a global > singleton, there's no observer for when the dummy target is deleted, > so no other parts of GDB care when the object is deleted. As a global > it is always just deleted as part of the exit code, and we never > really care what its reference count is. So, though it is a little > annoying that its reference count is wrong, it doesn't really matter. > Maybe I'll come back in a later patch and try to clean that up... but > that's for another day. > > When I tested the changes above I ran into a failure from 'maint > selftest infrun_thread_ptid_changed'. > > The problem is with scoped_mock_context. This object creates a new > inferior (called mock_inferior), with a thread, and some other > associated state, and then select this new inferior. We also push a > process_stratum_target sub-class onto the new inferior's target stack. > > In ~scoped_mock_context we call: > > pop_all_targets_at_and_above (process_stratum); > > this will remove all target_ops objects from the mock_inferior's > target stack, but leaves anything at the dummy_stratum and the > file_stratum (which I find a little weird, but more on this later). > > The problem though is that pop_all_targets_at_and_above, just like > pop_all_targets, removes things from the target stack of the current > inferior. In ~scoped_mock_context we don't ensure that the > mock_inferior associated with the current scoped_mock_context is > actually selected. > > In most tests we create a single scoped_mock_context, which > automatically selects its contained mock_inferior. However, in the > test infrun_thread_ptid_changed, we create multiple > scoped_mock_context, and then change which inferior is currently > selected. > > As a result, in one case, we end up in ~scoped_mock_context with the > wrong inferior selected. The pop_all_targets_at_and_above call then > removes the target_ops from the wrong inferior's target stack. This > leaves the target_ops on the scoped_mock_context::mock_inferior's > target stack, and, when the mock_inferior is destructed, we loose > some references, this triggers the assert I placed in > inferior::~inferior. > > To fix this I added a switch_to_inferior_no_thread call within the > ~scoped_mock_context function. Good catch. Although, if that could be fixed by making pop_all_targets_at_and_above not use the current_inferior, I think it would be nicer. And if the target stack could take care of managing the refcount, as mentioned above, even nicer. > As I mention above, it seems weird that we call > pop_all_targets_at_and_above instead of pop_all_targets, so I've > changed that. I didn't see any test regressions after this, so I'm > assuming this is fine. Seems fine to me (this is essentially what a target stack holding target_ops_refs would do). Simon
Simon Marchi <simark@simark.ca> writes: > On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote: >> This commit started as an investigation into why the test >> gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file >> behind. >> >> The crash occurs in connpy_connection_dealloc, and is actually >> triggered by this assert: >> >> gdb_assert (conn_obj->target == nullptr); >> >> Now a little aside... >> >> ... the assert is never actually printed, instead GDB crashes due to >> calling a pure virtual function. The backtrace at the point of crash >> looks like this: >> >> #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6 >> #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6 >> #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3606 >> #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/inferior.h:377 >> #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047 >> #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target-delegates.c:1223 >> #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112 >> #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 <internal_error_problem>, file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379 >> >> Notice in frame #12 we called target_ops::supports_terminal_ours, >> however, this is the_dummy_target, which is of type dummy_target, and >> so we should have called dummy_target::supports_terminal_ours. I >> believe the reason we ended up in the wrong implementation of >> supports_terminal_ours (which is a virtual function) is because we >> made the call during GDB's shut-down, and, I suspect, the vtables were >> in a weird state. >> >> Anyway, the point of this patch is not to fix GDB's ability to print >> an assert during exit, but to address the root cause of the assert. >> With that aside out of the way, we can return to the main story... >> >> Connections are represented in Python with gdb.TargtetConnection >> objects (or its sub-classes). The assert in question confirms that >> when a gdb.TargtetConnection is deallocated, the underlying GDB >> connection has itself been removed from GDB. If this is not true then >> we risk creating multiple different gdb.TargtetConnection objects for >> the same connection, which would be bad. >> >> When a connection removed in GDB the connection_removed observer > > Missing "is". > >> fires, which we catch with connpy_connection_removed, this function >> then sets conn_obj->target to nullptr. >> >> The first issue here is that connpy_connection_dealloc is being called >> as part of GDB's exit code, which is run after the Python interpreter >> has been shut down. The connpy_connection_dealloc function is used to >> deallocate the gdb.TargtetConnection Python object. Surely it is >> wrong for us to be deallocating Python objects after the interpreter >> has been shut down. >> >> The reason why connpy_connection_dealloc is called during GDB's exit >> is that the global all_connection_objects map is holding a reference >> to the gdb.TargtetConnection object. When the map is destroyed during > > Typo in "TargtetConnection". > >> GDB's exit, the gdb.TargtetConnection objects within the map can >> finally be deallocated. >> >> Another job of connpy_connection_removed (the function we mentioned >> earlier) is to remove connections from the all_connection_objects map >> when the connection is removed from GDB. >> >> And so, the reason why all_connection_objects has contents when GDB >> exits, and the reason the assert fires, is that, when GDB exits, there >> are still some connections that have not yet been removed from GDB, >> that is, they have a non-zero reference count. >> >> If we take a look at quit_force (top.c) you can see that, for each >> inferior, we call pop_all_targets before we (later in the function) >> call do_final_cleanups. It is the do_final_cleanups call that is >> responsible for shutting down the Python interpreter. >> >> So, in theory, we should have popped all targets be the time GDB > > be -> before? > >> exits, this should have reduced their reference counts to zero, which >> in turn should have triggered the connection_removed observer, and >> resulted in the connection being removed from all_connection_objects, >> and the gdb.TargtetConnection object being deallocated. > > "TargtetConnection" > >> That this is not happening indicates that earlier, somewhere else in >> GDB, we are leaking references to GDB's connections. >> >> I tracked the problem down to the 'remove-inferiors' command, >> implemented with the remove_inferior_command function (in inferior.c). >> This function calls delete_inferior for each inferior the user >> specifies. >> >> In delete_inferior we do some house keeping, and then delete the >> inferior object, which calls inferior::~inferior. >> >> In neither delete_inferior or inferior::~inferior do we call >> pop_all_targets, and it is this missing call that means we leak some >> references to the target_ops objects on the inferior's target_stack. >> >> To fix this we need to add a pop_all_targets call either in >> delete_inferior or in inferior::~inferior. Currently, I think that we >> should place the call in delete_inferior. >> >> Before calling pop_all_targets the inferior for which we are popping >> needs to be made current, along with the program_space associated with >> the inferior. > > Why does the inferior and program_space need to be made current in order > to pop the targets? I understand that pop_all_targets_above and other > functions use `current_inferior`, but could we convert them (or add new > versions) so they don't? Off-hand I don't see why they couldn't receive > the inferior as a parameter (or be made methods of inferior and/or > target_stack). > > It shouldn't be important which inferior is the current one when calling > target_close on a target. If we are closing a target, it means it is no > longer controlling any inferior. I agree with you 100%. Unfortunately, the following targets all seem to depend on current_inferior being set (in their ::close method): bsd_kvm_target core_target darwin_nat_target record_btrace_target ctf_target tfile_target windows_nat_target (though this is only for debug output) I suspect that this means these targets only really work when GDB has a single inferior maybe? In most cases GDB seems to be clearing out some per-inferior state relating to the target... I need to investigate more, but I guess I wanted to raise this in case you (or anyone) had thoughts. > >> At the moment the inferior's program_space is deleted in >> delete_inferior before we call inferior::~inferior, so, I think, to >> place the pop_all_targets call into inferior::~inferior would require >> additional adjustment to GDB. As delete_inferior already exists, and >> includes various house keeping tasks, it doesn't seem unreasonable to >> place the pop_all_targets call there. > > I don't object to fixing it like this. I'm just wondering, did you > consider changing target_stack::m_stack to make it hold string > references, something like std::vector<target_ops_ref>? I haven't tried > so maybe this doesn't make sense / is too difficult. But if it does, I > guess the problem would take care of itself. When deleting an inferior > that still has some targets pushed, they would be automatically decref'd > and closed if needed. I did think about this. I think in the end the fix I proposed here was just less churn. I've revisited the idea of holding target_ops_ref objects, and I have some patches that move GDB in that direction, though I haven't yet figured out if we can get rid of the whole pop_all_targets API, which I think is what you're hinting at. > >> Now when I run py-inferior.exp, by the time GDB exits, the reference >> counts are correct. The final pop_all_targets calls in quit_force >> reduce the reference counts to zero, which means the connections are >> removed before the Python interpreter is shut down. When GDB actually >> exits the all_connection_objects map is empty, and no further Python >> objects are deallocated at that point. The test now exits cleanly >> without creating a core file. >> >> I've made some additional, related, changes in this commit. >> >> In inferior::~inferior I've added a new assert that ensures, by the >> time the inferior is destructed, the inferior's target stack is >> empty (with the exception of the dummy_target). If this is not true >> then we will be loosing a reference to a target_ops object. >> >> It is worth noting that we are loosing references to the dummy_target >> object, however, I've not tried to fix that problem in this patch, as >> I don't think it is as important. The dummy target is a global >> singleton, there's no observer for when the dummy target is deleted, >> so no other parts of GDB care when the object is deleted. As a global >> it is always just deleted as part of the exit code, and we never >> really care what its reference count is. So, though it is a little >> annoying that its reference count is wrong, it doesn't really matter. >> Maybe I'll come back in a later patch and try to clean that up... but >> that's for another day. >> >> When I tested the changes above I ran into a failure from 'maint >> selftest infrun_thread_ptid_changed'. >> >> The problem is with scoped_mock_context. This object creates a new >> inferior (called mock_inferior), with a thread, and some other >> associated state, and then select this new inferior. We also push a >> process_stratum_target sub-class onto the new inferior's target stack. >> >> In ~scoped_mock_context we call: >> >> pop_all_targets_at_and_above (process_stratum); >> >> this will remove all target_ops objects from the mock_inferior's >> target stack, but leaves anything at the dummy_stratum and the >> file_stratum (which I find a little weird, but more on this later). >> >> The problem though is that pop_all_targets_at_and_above, just like >> pop_all_targets, removes things from the target stack of the current >> inferior. In ~scoped_mock_context we don't ensure that the >> mock_inferior associated with the current scoped_mock_context is >> actually selected. >> >> In most tests we create a single scoped_mock_context, which >> automatically selects its contained mock_inferior. However, in the >> test infrun_thread_ptid_changed, we create multiple >> scoped_mock_context, and then change which inferior is currently >> selected. >> >> As a result, in one case, we end up in ~scoped_mock_context with the >> wrong inferior selected. The pop_all_targets_at_and_above call then >> removes the target_ops from the wrong inferior's target stack. This >> leaves the target_ops on the scoped_mock_context::mock_inferior's >> target stack, and, when the mock_inferior is destructed, we loose >> some references, this triggers the assert I placed in >> inferior::~inferior. >> >> To fix this I added a switch_to_inferior_no_thread call within the >> ~scoped_mock_context function. > > Good catch. Although, if that could be fixed by making > pop_all_targets_at_and_above not use the current_inferior, I think it > would be nicer. And if the target stack could take care of managing the > refcount, as mentioned above, even nicer. As I mention above, right now it seems we do need th correct inferior selected, so we might need something like this, I'll see how my new patches work out. Thanks, Andrew > >> As I mention above, it seems weird that we call >> pop_all_targets_at_and_above instead of pop_all_targets, so I've >> changed that. I didn't see any test regressions after this, so I'm >> assuming this is fine. > > Seems fine to me (this is essentially what a target stack holding > target_ops_refs would do). > > Simon
> I agree with you 100%. Unfortunately, the following targets all seem to > depend on current_inferior being set (in their ::close method): > > bsd_kvm_target > core_target > darwin_nat_target > record_btrace_target > ctf_target > tfile_target > windows_nat_target (though this is only for debug output) > > I suspect that this means these targets only really work when GDB has a > single inferior maybe? In most cases GDB seems to be clearing out some > per-inferior state relating to the target... I need to investigate more, > but I guess I wanted to raise this in case you (or anyone) had thoughts. Ah, ok. Yeah I see some targets calling exit_inferior_silent on the current inferior, among other things. I guess I misunderstand what target_ops::close is meant to do, I thought this would have been done earlier. Different targets seem to use the close method for different things. So, in any case, my proposition falls into "no doable right now". >>> At the moment the inferior's program_space is deleted in >>> delete_inferior before we call inferior::~inferior, so, I think, to >>> place the pop_all_targets call into inferior::~inferior would require >>> additional adjustment to GDB. As delete_inferior already exists, and >>> includes various house keeping tasks, it doesn't seem unreasonable to >>> place the pop_all_targets call there. >> >> I don't object to fixing it like this. I'm just wondering, did you >> consider changing target_stack::m_stack to make it hold string >> references, something like std::vector<target_ops_ref>? I haven't tried >> so maybe this doesn't make sense / is too difficult. But if it does, I >> guess the problem would take care of itself. When deleting an inferior >> that still has some targets pushed, they would be automatically decref'd >> and closed if needed. > > I did think about this. I think in the end the fix I proposed here > was just less churn. Agreed. > As I mention above, right now it seems we do need th correct inferior > selected, so we might need something like this, I'll see how my new > patches work out. I think your approach is ok, it fixes a real bug, so big refactorings should not get in the way of that. I'll take a look at the code now. Simon
> diff --git a/gdb/inferior.c b/gdb/inferior.c > index 7eb2bd97907..0c0b4b78ff5 100644 > --- a/gdb/inferior.c > +++ b/gdb/inferior.c > @@ -70,6 +70,16 @@ inferior::~inferior () > { > inferior *inf = this; > > + /* Before the inferior is deleted, all target_ops should be popped from > + the target stack, this leaves just the dummy_target behind. If this > + is not done, then any target left in the target stack will be left > + with an artificially high reference count. As the dummy_target is > + still on the target stack then we are about to loose a reference to loose -> lose? > @@ -191,6 +201,26 @@ delete_inferior (struct inferior *inf) > > gdb::observers::inferior_removed.notify (inf); > > + { > + /* Limit the change of inferior to an inner scope so that the original > + inferior and program space will have been restored by the time that > + we delete the inferior INF and (possibly) its related program two spaces between related and program. > + space. */ > + scoped_restore_current_pspace_and_thread restore_pspace_and_thread; Just wondering, why do we need to restore explicitly the current pspace, instead of using just scoped_restore_current_thread? scoped_restore_current_pspace_and_thread's doc says: /* Save/restore the current program space, thread, inferior and frame. Use this when you need to call switch_to_program_space_and_thread. */ ... but you are not using switch_to_program_space_and_thread here. Maybe it's ok and I just don't understand. Same in ~scoped_mock_context. > diff --git a/gdb/testsuite/gdb.python/py-connection-removed.exp b/gdb/testsuite/gdb.python/py-connection-removed.exp > new file mode 100644 > index 00000000000..1b139cedc0d > --- /dev/null > +++ b/gdb/testsuite/gdb.python/py-connection-removed.exp > @@ -0,0 +1,92 @@ > +# Copyright (C) 2022 Free Software Foundation, Inc. > + > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see <http://www.gnu.org/licenses/>. > + > +# Check that the gdb.connect_removed event triggers when we expect it > +# too. too -> to > +# > +# Checking this event has wider implications that simply some corner simply -> imply? Or I don't get what you mean. > +# of the Python API working or not. The connection_removed event > +# triggers when the reference count of a process_stratum_target > +# reaches zero. If these events stop triggering when expected then > +# GDB might be getting the reference counting on target_ops objects > +# wrong. > + > +load_lib gdb-python.exp > + > +standard_testfile py-connection.c > + > +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } { > + return -1 > +} > + > +# Skip all tests if Python scripting is not enabled. > +if { [skip_python_tests] } { continue } > + > +if ![runto_main] then { > + return 0 > +} > + > +# Register a callback that will trigger when a connection is removed > +# (deleted) within GDB. > +gdb_test_multiline "Add connection_removed event" \ > + "python" "" \ > + "def connection_removed_handler(event):" "" \ > + " num = event.connection.num" "" \ > + " type = event.connection.type" "" \ > + " print(f'Connection {num} ({type}) removed')" "" \ I think unfortunately need to support Python versions that don't know about f-strings. Simon
Simon Marchi <simark@simark.ca> writes: >> diff --git a/gdb/inferior.c b/gdb/inferior.c >> index 7eb2bd97907..0c0b4b78ff5 100644 >> --- a/gdb/inferior.c >> +++ b/gdb/inferior.c >> @@ -70,6 +70,16 @@ inferior::~inferior () >> { >> inferior *inf = this; >> >> + /* Before the inferior is deleted, all target_ops should be popped from >> + the target stack, this leaves just the dummy_target behind. If this >> + is not done, then any target left in the target stack will be left >> + with an artificially high reference count. As the dummy_target is >> + still on the target stack then we are about to loose a reference to > loose -> lose? > >> @@ -191,6 +201,26 @@ delete_inferior (struct inferior *inf) >> >> gdb::observers::inferior_removed.notify (inf); >> >> + { >> + /* Limit the change of inferior to an inner scope so that the original >> + inferior and program space will have been restored by the time that >> + we delete the inferior INF and (possibly) its related program > > two spaces between related and program. > >> + space. */ >> + scoped_restore_current_pspace_and_thread restore_pspace_and_thread; > > Just wondering, why do we need to restore explicitly the current > pspace, instead of using just scoped_restore_current_thread? > > scoped_restore_current_pspace_and_thread's doc says: > > /* Save/restore the current program space, thread, inferior and frame. > Use this when you need to call > switch_to_program_space_and_thread. */ > > ... but you are not using switch_to_program_space_and_thread here. > Maybe it's ok and I just don't understand. Same in > ~scoped_mock_context. I suspect the comment you quote is just out of date. switch_to_program_space_and_thread can end up calling switch_to_inferior_no_thread if there are no running threads in the program space being switched too. But, even if switch_to_thread does end up being called we: - set the program space, - set the inferior, - set the current thread, - reinit the frame cache, By comparison, switch_to_inferior_no_thread does: - sets the program space, - sets the inferior, - sets the current thread (to nullptr this time though), - reinits the frame cache, As you can see they do the same set of things, all of which I think should be reverted once we leave the scope, hence scoped_restore_current_pspace_and_thread seems like the way to go. > >> diff --git a/gdb/testsuite/gdb.python/py-connection-removed.exp b/gdb/testsuite/gdb.python/py-connection-removed.exp >> new file mode 100644 >> index 00000000000..1b139cedc0d >> --- /dev/null >> +++ b/gdb/testsuite/gdb.python/py-connection-removed.exp >> @@ -0,0 +1,92 @@ >> +# Copyright (C) 2022 Free Software Foundation, Inc. >> + >> +# This program is free software; you can redistribute it and/or modify >> +# it under the terms of the GNU General Public License as published by >> +# the Free Software Foundation; either version 3 of the License, or >> +# (at your option) any later version. >> +# >> +# This program is distributed in the hope that it will be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program. If not, see <http://www.gnu.org/licenses/>. >> + >> +# Check that the gdb.connect_removed event triggers when we expect it >> +# too. > > too -> to > >> +# >> +# Checking this event has wider implications that simply some corner > > simply -> imply? Or I don't get what you mean. > >> +# of the Python API working or not. The connection_removed event >> +# triggers when the reference count of a process_stratum_target >> +# reaches zero. If these events stop triggering when expected then >> +# GDB might be getting the reference counting on target_ops objects >> +# wrong. >> + >> +load_lib gdb-python.exp >> + >> +standard_testfile py-connection.c >> + >> +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } { >> + return -1 >> +} >> + >> +# Skip all tests if Python scripting is not enabled. >> +if { [skip_python_tests] } { continue } >> + />> +if ![runto_main] then { >> + return 0 >> +} >> + >> +# Register a callback that will trigger when a connection is removed >> +# (deleted) within GDB. >> +gdb_test_multiline "Add connection_removed event" \ >> + "python" "" \ >> + "def connection_removed_handler(event):" "" \ >> + " num = event.connection.num" "" \ >> + " type = event.connection.type" "" \ >> + " print(f'Connection {num} ({type}) removed')" "" \ > > I think unfortunately need to support Python versions that don't know > about f-strings. OK. Thanks, Andrew > > Simon
>> Just wondering, why do we need to restore explicitly the current >> pspace, instead of using just scoped_restore_current_thread? >> >> scoped_restore_current_pspace_and_thread's doc says: >> >> /* Save/restore the current program space, thread, inferior and frame. >> Use this when you need to call >> switch_to_program_space_and_thread. */ >> >> ... but you are not using switch_to_program_space_and_thread here. >> Maybe it's ok and I just don't understand. Same in >> ~scoped_mock_context. > > I suspect the comment you quote is just out of date. > > switch_to_program_space_and_thread can end up calling > switch_to_inferior_no_thread if there are no running threads in the > program space being switched too. But, even if switch_to_thread does > end up being called we: > > - set the program space, > - set the inferior, > - set the current thread, > - reinit the frame cache, > > By comparison, switch_to_inferior_no_thread does: > > - sets the program space, > - sets the inferior, > - sets the current thread (to nullptr this time though), > - reinits the frame cache, > > As you can see they do the same set of things, all of which I think > should be reverted once we leave the scope, hence > scoped_restore_current_pspace_and_thread seems like the way to go. Hmm okay, but won't scoped_restore_current_thread always restore the pspace one way or another? scoped_restore_current_thread::restore will either call switch_to_thread or switch_to_inferior_no_thread, which both end up setting the pspace. I just don't understand why scoped_restore_current_pspace_and_thread exists, it seems like scoped_restore_current_thread would always work where scoped_restore_current_pspace_and_thread is used. I must be missing something, scoped_restore_current_pspace_and_thread must have been added for a reason. Simon
Simon Marchi <simark@simark.ca> writes: >>> Just wondering, why do we need to restore explicitly the current >>> pspace, instead of using just scoped_restore_current_thread? >>> >>> scoped_restore_current_pspace_and_thread's doc says: >>> >>> /* Save/restore the current program space, thread, inferior and frame. >>> Use this when you need to call >>> switch_to_program_space_and_thread. */ >>> >>> ... but you are not using switch_to_program_space_and_thread here. >>> Maybe it's ok and I just don't understand. Same in >>> ~scoped_mock_context. >> >> I suspect the comment you quote is just out of date. >> >> switch_to_program_space_and_thread can end up calling >> switch_to_inferior_no_thread if there are no running threads in the >> program space being switched too. But, even if switch_to_thread does >> end up being called we: >> >> - set the program space, >> - set the inferior, >> - set the current thread, >> - reinit the frame cache, >> >> By comparison, switch_to_inferior_no_thread does: >> >> - sets the program space, >> - sets the inferior, >> - sets the current thread (to nullptr this time though), >> - reinits the frame cache, >> >> As you can see they do the same set of things, all of which I think >> should be reverted once we leave the scope, hence >> scoped_restore_current_pspace_and_thread seems like the way to go. > > Hmm okay, but won't scoped_restore_current_thread always restore the > pspace one way or another? scoped_restore_current_thread::restore will > either call switch_to_thread or switch_to_inferior_no_thread, which both > end up setting the pspace. That sounds completely correct. So I did an experiment. I added ~scoped_restore_current_pspace_and_thread, and with a little hacking, had it check: is the program_space in m_restore_pspace the same program_space as would be restored by m_restore_thread? Turns out that it _isn't_ always the case! Just to be clear on what this means: sometimes GDB is in a state where the currently selected program_space is NOT the program_space for the currently selected inferior! One example of when this occurs is create_longjmp_master_breakpoint in breakpoint.c, in here we do this: scoped_restore_current_program_space restore_pspace; for (struct program_space *pspace : program_spaces) { set_current_program_space (pspace); ... } Obviously we enter this code with some inferior selected, but, within the body of this loop we select each program_space in turn. All but one of these will be the wrong program_space for the current inferior. This feels (to me) like a problem waiting to happen. I wonder if we should have some better approach here. Some ideas would be: 1. Change to always require switch_to_program_space_and_thread and remove set_current_program_space. This would then require that we use scoped_restore_current_thread and remove scoped_restore_current_program_space. This would mean GDB always had a sane thread/inferior/program_space combo selected, 2. Allow GDB to have no inferior selected, then, in the above code, where (I guess) we are claiming that everything in the for loop only cares about the program_space, and not the inferior, we could temporarily switch to no inferior. This would suggest adding a set_current_program_space_no_inferior function, and again, we would use scoped_restore_current_thread and remove scoped_restore_current_program_space from GDB, 3. Embrace the mismatch and have scoped_restore_current_thread capture the current_program_space as well as the current thread and inferior. It would then restore the program space after restoring the thread/inferior, thus ensuring that we always restore the program_space correctly. This would allow for scoped_restore_current_pspace_and_thread to be removed and just use scoped_restore_current_thread in all cases. I rather like #2, but I worry that too much of GDB would just assume there's always a current inferior, so I'm tempted to think #1 is probably the best approach. My concern is mostly the performance impact of searching for a suitable inferior for a given program_space (this currently involves a linear search through all inferiors). With a low number of inferiors this shouldn't be an issue, but maybe if we started to deal with large numbers of inferiors we'd need to be smarter? Ideally I think we shouldn't just live with the mismatch, though as a short term stop-gap, maybe we could implement #3? Thoughts? Thanks, Andrew > I just don't understand why > scoped_restore_current_pspace_and_thread exists, it seems like > scoped_restore_current_thread would always work where > scoped_restore_current_pspace_and_thread is used. I must be missing > something, scoped_restore_current_pspace_and_thread must have been added > for a reason. > > Simon
A complete rewrite (and expansion) of this patch after Simon's feedback. The original idea (from v1) is really covered by patches #3 and #5. There's still (maybe) an open question about the use of scoped_restore_pspace_and_thread vs scoped_restore_thread, but I hope I've addressed this question in the v1 thread. Do let me know if there's still any questions on this though. Patches #1 and #2 are target_ops ref-count related cleanups that I spotted while working on this updated series. Patch #4 is also a cleanup, moving some global functions to become member functions on the inferior class. Patch #6 is a completely random addition to add some extra maintenance output, I ended up using this while debugging patch #7, but this change isn't actually required at all. Patch #7 does relate to target_ops, and target_ops sharing, which is kind-of related to reference counting... maybe? I can easily drop this patch from the series if it turns out the idea in here is not taking GDB in the right direction. Thoughts, Thanks, Andrew --- Andrew Burgess (7): gdb/remote: remove some manual reference count handling gdb: remove decref_target gdb: have target_stack automate reference count handling gdb: remove the pop_all_targets (and friends) global functions gdb: ensure all targets are popped before an inferior is destructed gdb/maint: add core file name to 'maint info program-spaces' output gdb: some process_stratum_target should not be shared gdb/NEWS | 11 ++ gdb/corelow.c | 5 + gdb/doc/gdb.texinfo | 45 ++++- gdb/event-top.c | 3 +- gdb/inferior.c | 71 ++++++++ gdb/inferior.h | 26 ++- gdb/progspace.c | 18 +- gdb/remote.c | 40 +++-- gdb/scoped-mock-context.h | 2 +- gdb/target.c | 100 +++++------ gdb/target.h | 33 ++-- gdb/testsuite/gdb.multi/multi-core-files-1.c | 37 +++++ gdb/testsuite/gdb.multi/multi-core-files-2.c | 31 ++++ gdb/testsuite/gdb.multi/multi-core-files.exp | 156 ++++++++++++++++++ .../gdb.python/py-connection-removed.exp | 92 +++++++++++ gdb/top.c | 3 +- 16 files changed, 563 insertions(+), 110 deletions(-) create mode 100644 gdb/testsuite/gdb.multi/multi-core-files-1.c create mode 100644 gdb/testsuite/gdb.multi/multi-core-files-2.c create mode 100644 gdb/testsuite/gdb.multi/multi-core-files.exp create mode 100644 gdb/testsuite/gdb.python/py-connection-removed.exp
diff --git a/gdb/inferior.c b/gdb/inferior.c index 7eb2bd97907..0c0b4b78ff5 100644 --- a/gdb/inferior.c +++ b/gdb/inferior.c @@ -70,6 +70,16 @@ inferior::~inferior () { inferior *inf = this; + /* Before the inferior is deleted, all target_ops should be popped from + the target stack, this leaves just the dummy_target behind. If this + is not done, then any target left in the target stack will be left + with an artificially high reference count. As the dummy_target is + still on the target stack then we are about to loose a reference to + that target, leaving its reference count artificially high. However, + this is not critical as the dummy_target is a singleton. */ + target_ops *ops = m_target_stack.top (); + gdb_assert (ops->stratum () == dummy_stratum); + m_continuations.clear (); target_desc_info_free (inf->tdesc_info); } @@ -191,6 +201,26 @@ delete_inferior (struct inferior *inf) gdb::observers::inferior_removed.notify (inf); + { + /* Limit the change of inferior to an inner scope so that the original + inferior and program space will have been restored by the time that + we delete the inferior INF and (possibly) its related program + space. */ + scoped_restore_current_pspace_and_thread restore_pspace_and_thread; + switch_to_inferior_no_thread (inf); + try + { + pop_all_targets (); + } + catch (const gdb_exception &ex) + { + /* If we get an exception then it's probably too late by this point + to abandon deleting the inferior. Instead, just print the + exception and keep going. */ + exception_print (gdb_stderr, ex); + } + } + /* If this program space is rendered useless, remove it. */ if (inf->pspace->empty ()) delete inf->pspace; diff --git a/gdb/scoped-mock-context.h b/gdb/scoped-mock-context.h index a9895303015..7240da9ef57 100644 --- a/gdb/scoped-mock-context.h +++ b/gdb/scoped-mock-context.h @@ -70,8 +70,11 @@ struct scoped_mock_context ~scoped_mock_context () { + scoped_restore_current_pspace_and_thread restore_pspace_and_thread; + inferior_list.erase (inferior_list.iterator_to (mock_inferior)); - pop_all_targets_at_and_above (process_stratum); + switch_to_inferior_no_thread (&mock_inferior); + pop_all_targets (); } }; diff --git a/gdb/testsuite/gdb.python/py-connection-removed.exp b/gdb/testsuite/gdb.python/py-connection-removed.exp new file mode 100644 index 00000000000..1b139cedc0d --- /dev/null +++ b/gdb/testsuite/gdb.python/py-connection-removed.exp @@ -0,0 +1,92 @@ +# Copyright (C) 2022 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# Check that the gdb.connect_removed event triggers when we expect it +# too. +# +# Checking this event has wider implications that simply some corner +# of the Python API working or not. The connection_removed event +# triggers when the reference count of a process_stratum_target +# reaches zero. If these events stop triggering when expected then +# GDB might be getting the reference counting on target_ops objects +# wrong. + +load_lib gdb-python.exp + +standard_testfile py-connection.c + +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } { + return -1 +} + +# Skip all tests if Python scripting is not enabled. +if { [skip_python_tests] } { continue } + +if ![runto_main] then { + return 0 +} + +# Register a callback that will trigger when a connection is removed +# (deleted) within GDB. +gdb_test_multiline "Add connection_removed event" \ + "python" "" \ + "def connection_removed_handler(event):" "" \ + " num = event.connection.num" "" \ + " type = event.connection.type" "" \ + " print(f'Connection {num} ({type}) removed')" "" \ + "gdb.events.connection_removed.connect (connection_removed_handler)" "" \ + "end" "" + +if { [target_info exists gdb_protocol] } { + if { [target_info gdb_protocol] == "extended-remote" } { + set connection_type "extended-remote" + } else { + set connection_type "remote" + } +} else { + set connection_type "native" +} + +# Add an inferior that shares a connection with inferior 1. +gdb_test "add-inferior" "Added inferior 2 on connection 1 \[^\r\n\]+" + +# Add an inferior with no connection. +gdb_test "add-inferior -no-connection" "Added inferior 3" + +# Kill the first inferior. If we are using the plain 'remote' protocol then +# it as this point that the remote target connection is removed. For the +# 'extended-remote' and 'native' targets the connection is removed later. +if { $connection_type == "remote" } { + gdb_test "with confirm off -- kill" \ + "Connection 1 \\(remote\\) removed\r\n.*" "kill inferior" +} else { + gdb_test "with confirm off -- kill" "" "kill inferior" +} + +# Switch to inferior 3 (the one with no connection). +gdb_test "inferior 3" + +# Remove inferior 1, not expecting anything interesting at this point. +gdb_test_no_output "remove-inferiors 1" + +# Now removed inferior 2. For the 'remote' target the connection has +# already been removed (see above), but for 'extended-remote' and 'native' +# targets, it is at this point that the connection is removed. +if { $connection_type == "remote" } { + gdb_test_no_output "remove-inferiors 2" +} else { + gdb_test "remove-inferiors 2" \ + "Connection 1 \\(${connection_type}\\) removed" +}