diff mbox series

gdb/py-inferior: Keep inferior threads in a map

Message ID	20221107184727.2228056-1-lancelot.six@amd.com
State	Superseded
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4E8A13858C2F Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) To: <gdb-patches@sourceware.org> CC: <lsix@lancelotsix.com>, Lancelot SIX <lancelot.six@amd.com> Subject: [PATCH] gdb/py-inferior: Keep inferior threads in a map Date: Mon, 7 Nov 2022 18:47:27 +0000 Message-ID: <20221107184727.2228056-1-lancelot.six@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: list From: Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org> Reply-To: Lancelot SIX <lancelot.six@amd.com> Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Sender: "Gdb-patches" <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org>
Series	gdb/py-inferior: Keep inferior threads in a map \| gdb/py-inferior: Keep inferior threads in a map

Commit Message

Lancelot SIX Nov. 7, 2022, 6:47 p.m. UTC

  The python code maintains a list of threads for each inferior.  This
list is implemented as a linked list.  When the number of threads grows
high, this implementation can begin to be a performance bottleneck as
finding a particular thread_object in the list has a complexity of O(N).

We see this in ROCgdb[1], a downstream port of GDB for AMDGUP.  On AMDGPU
devices, the number of threads can get significantly higher than on
usual GDB workloads.

In some situations, we can reach the end of the inferior process with
GDB still having a substantial list of known threads.  While running
target_mourn_inferior, we end up in inferior::clear_thread_list which
iterates over all remaining threads and marks each thread exited.  This
fires the gdb::observers::thread_exit observer and eventually
py-inferior.c:set_thread_exited gets called.  This function searches in
the linked list with poor performances.

This patch proposes to change the linked list that keeps the per
inferior_object list of thread_objects into a std::map.  This allows
to have the search operation complexity be O(log(N)) instead of O(N).

With this patch, we can complete clear_thread_list in about 2.5 seconds
compared to 10 minutes without it, while not seeing a significant
negative impact on the insertion performance (which goes from O(1) to
O(log(N))).

Except for the performance change, no user visible change is expected.

Regression tested on Ubuntu-22.04 x86_64.

[1] https://github.com/ROCm-Developer-Tools/ROCgdb
---
 gdb/python/py-inferior.c | 95 ++++++++++++----------------------------
 1 file changed, 29 insertions(+), 66 deletions(-)


base-commit: 91836f41e209a60a8a836faef2e7889e144df297

Comments

Tom Tromey Nov. 8, 2022, 6:12 p.m. UTC | #1

>>>>> "Lancelot" == Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org> writes:

Lancelot> The python code maintains a list of threads for each inferior.  This
Lancelot> list is implemented as a linked list.  When the number of threads grows
Lancelot> high, this implementation can begin to be a performance bottleneck as
Lancelot> finding a particular thread_object in the list has a complexity of O(N).

Thanks for the patch.

Lancelot> -  delete tmp;
Lancelot> +  auto it = inf_obj->threads->find (tp);
Lancelot> +  if (it != inf_obj->threads->end ())
Lancelot> +    {
Lancelot> +      it->second->thread = nullptr;
Lancelot> +      inf_obj->threads->erase (it);

Doesn't the erase implicitly also drop the 'thread' reference?
That is, I don't think the nullptr assignment is needed here.

Lancelot> -      PyTuple_SET_ITEM (tuple, i, thr);
Lancelot> +      PyTuple_SET_ITEM (tuple, i++, thr);

It's better to do the increment separately, in case PyTuple_SET_ITEM is
a macro that evaluates arguments more than once.

Tom

Lancelot SIX Nov. 8, 2022, 7:47 p.m. UTC | #2

Hi,

Thanks for the review!

On 08/11/2022 18:12, Tom Tromey wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
>>>>>> "Lancelot" == Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org> writes:
> 
> Lancelot> The python code maintains a list of threads for each inferior.  This
> Lancelot> list is implemented as a linked list.  When the number of threads grows
> Lancelot> high, this implementation can begin to be a performance bottleneck as
> Lancelot> finding a particular thread_object in the list has a complexity of O(N).
> 
> Thanks for the patch.
> 
> Lancelot> -  delete tmp;
> Lancelot> +  auto it = inf_obj->threads->find (tp);
> Lancelot> +  if (it != inf_obj->threads->end ())
> Lancelot> +    {
> Lancelot> +      it->second->thread = nullptr;
> Lancelot> +      inf_obj->threads->erase (it);
> 
> Doesn't the erase implicitly also drop the 'thread' reference?
> That is, I don't think the nullptr assignment is needed here.

Yes, the erase does decrease the refcount associated with the object. 
But here "it->second->thread = nullptr" achieves another goal.

*it->second is the thread_object (seen as a gdb.InferiorThread from 
python).  it->second->thread is the link from the python object to the 
corresponding struct thread_info in gdb core.

It is possible for the python code to hold a reference to the 
thread_object when this is called.  As the thread_info object will soon 
be freed, we need to make sure that the link is cleared (otherwise we 
would risk use after free).  This link can be later checked by 
thpy_is_valid (or gdb.InferiorThread.is_valid in python)

I did initially miss this, and this caused a regression in 
gdb.python/py-inferior.exp.

> 
> Lancelot> -      PyTuple_SET_ITEM (tuple, i, thr);
> Lancelot> +      PyTuple_SET_ITEM (tuple, i++, thr);
> 
> It's better to do the increment separately, in case PyTuple_SET_ITEM is
> a macro that evaluates arguments more than once.

Indeed, this is a good point.

I'll change this before for the next revision.

Best,
Lancelot.

> 
> Tom

Tom Tromey Nov. 8, 2022, 9:23 p.m. UTC | #3

Lancelot> Yes, the erase does decrease the refcount associated with the
Lancelot> object. But here "it->second->thread = nullptr" achieves another goal.
[...]

Ok, I see.  Would you mind adding a little comment to that effect?

I think it's fine with these minor changes, you don't need to re-submit it.

Tom

Simon Marchi Nov. 8, 2022, 9:44 p.m. UTC | #4

On 11/7/22 13:47, Lancelot SIX via Gdb-patches wrote:
> diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
> index 8847a6d9308..9fd5f30fcdb 100644
> --- a/gdb/python/py-inferior.c
> +++ b/gdb/python/py-inferior.c
> @@ -30,17 +30,9 @@
>  #include "gdbsupport/gdb_signals.h"
>  #include "py-event.h"
>  #include "py-stopevent.h"
> +#include <map>
>  
> -struct threadlist_entry
> -{
> -  threadlist_entry (gdbpy_ref<thread_object> &&ref)
> -    : thread_obj (std::move (ref))
> -  {
> -  }
> -
> -  gdbpy_ref<thread_object> thread_obj;
> -  struct threadlist_entry *next;
> -};
> +using thread_map_t = std::map<thread_info *, gdbpy_ref<thread_object>>;

You probably want to use unordered_map, which is preferable when we
don't care about key order.

Every time someone introduces a use of a map, there's the topic of "but
hashtab is faster than std::unordered_map" that comes but, and the fact
that gcc has some C++ bindings for it, and that we should import and use
it, but that it was never done because its "empty" method confusingly
empties the map, rather than returning whether the map is empty.  But
still, I think that std::unordered_map is preferable over std::map.

I haven't checked the rest of the patch, since Tom already has.

Simon

Lancelot SIX Nov. 9, 2022, 2:18 p.m. UTC | #5

On 08/11/2022 21:44, Simon Marchi wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On 11/7/22 13:47, Lancelot SIX via Gdb-patches wrote:
>> diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
>> index 8847a6d9308..9fd5f30fcdb 100644
>> --- a/gdb/python/py-inferior.c
>> +++ b/gdb/python/py-inferior.c
>> @@ -30,17 +30,9 @@
>>   #include "gdbsupport/gdb_signals.h"
>>   #include "py-event.h"
>>   #include "py-stopevent.h"
>> +#include <map>
>>
>> -struct threadlist_entry
>> -{
>> -  threadlist_entry (gdbpy_ref<thread_object> &&ref)
>> -    : thread_obj (std::move (ref))
>> -  {
>> -  }
>> -
>> -  gdbpy_ref<thread_object> thread_obj;
>> -  struct threadlist_entry *next;
>> -};
>> +using thread_map_t = std::map<thread_info *, gdbpy_ref<thread_object>>;
> 
> You probably want to use unordered_map, which is preferable when we
> don't care about key order.

Will do in the V2!

Thanks,
Lancelot.

> 
> Every time someone introduces a use of a map, there's the topic of "but
> hashtab is faster than std::unordered_map" that comes but, and the fact
> that gcc has some C++ bindings for it, and that we should import and use
> it, but that it was never done because its "empty" method confusingly
> empties the map, rather than returning whether the map is empty.  But
> still, I think that std::unordered_map is preferable over std::map.
> 
> I haven't checked the rest of the patch, since Tom already has.
> 
> Simon

diff mbox series

Patch

diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 8847a6d9308..9fd5f30fcdb 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -30,17 +30,9 @@ 
 #include "gdbsupport/gdb_signals.h"
 #include "py-event.h"
 #include "py-stopevent.h"
+#include <map>
 
-struct threadlist_entry
-{
-  threadlist_entry (gdbpy_ref<thread_object> &&ref)
-    : thread_obj (std::move (ref))
-  {
-  }
-
-  gdbpy_ref<thread_object> thread_obj;
-  struct threadlist_entry *next;
-};
+using thread_map_t = std::map<thread_info *, gdbpy_ref<thread_object>>;
 
 struct inferior_object
 {
@@ -49,12 +41,9 @@  struct inferior_object
   /* The inferior we represent.  */
   struct inferior *inferior;
 
-  /* thread_object instances under this inferior.  This list owns a
+  /* thread_object instances under this inferior.  This owns a
      reference to each object it contains.  */
-  struct threadlist_entry *threads;
-
-  /* Number of threads in the list.  */
-  int nthreads;
+  thread_map_t *threads;
 };
 
 extern PyTypeObject inferior_object_type
@@ -65,8 +54,6 @@  struct infpy_deleter
 {
   void operator() (inferior_object *obj)
   {
-    struct threadlist_entry *th_entry, *th_tmp;
-
     if (!gdb_python_initialized)
       return;
 
@@ -75,15 +62,7 @@  struct infpy_deleter
 
     inf_obj->inferior = NULL;
 
-    /* Deallocate threads list.  */
-    for (th_entry = inf_obj->threads; th_entry != NULL;)
-      {
-	th_tmp = th_entry;
-	th_entry = th_entry->next;
-	delete th_tmp;
-      }
-
-    inf_obj->nthreads = 0;
+    delete inf_obj->threads;
   }
 };
 
@@ -257,8 +236,7 @@  inferior_to_inferior_object (struct inferior *inferior)
 	return NULL;
 
       inf_obj->inferior = inferior;
-      inf_obj->threads = NULL;
-      inf_obj->nthreads = 0;
+      inf_obj->threads = new thread_map_t ();
 
       /* PyObject_New initializes the new object with a refcount of 1.  This
 	 counts for the reference we are keeping in the inferior data.  */
@@ -333,11 +311,10 @@  thread_to_thread_object (thread_info *thr)
   if (inf_obj == NULL)
     return NULL;
 
-  for (threadlist_entry *thread = inf_obj->threads;
-       thread != NULL;
-       thread = thread->next)
-    if (thread->thread_obj->thread == thr)
-      return gdbpy_ref<>::new_reference ((PyObject *) thread->thread_obj.get ());
+  auto thread_it = inf_obj->threads->find (thr);
+  if (thread_it != inf_obj->threads->end ())
+    return gdbpy_ref<>::new_reference
+      ((PyObject *) (thread_it->second.get ()));
 
   PyErr_SetString (PyExc_SystemError,
 		   _("could not find gdb thread object"));
@@ -348,7 +325,6 @@  static void
 add_thread_object (struct thread_info *tp)
 {
   inferior_object *inf_obj;
-  struct threadlist_entry *entry;
 
   if (!gdb_python_initialized)
     return;
@@ -364,18 +340,19 @@  add_thread_object (struct thread_info *tp)
 
   inf_obj = (inferior_object *) thread_obj->inf_obj;
 
-  entry = new threadlist_entry (std::move (thread_obj));
-  entry->next = inf_obj->threads;
+  auto ins_result = inf_obj->threads->emplace
+    (thread_map_t::value_type (tp, std::move (thread_obj)));
 
-  inf_obj->threads = entry;
-  inf_obj->nthreads++;
+  if (!ins_result.second)
+    return;
 
   if (evregpy_no_listeners_p (gdb_py_events.new_thread))
     return;
 
-  gdbpy_ref<> event = create_thread_event_object (&new_thread_event_object_type,
-						  (PyObject *)
-						  entry->thread_obj.get ());
+  gdbpy_ref<> event = create_thread_event_object
+    (&new_thread_event_object_type,
+     (PyObject *) ins_result.first->second.get ());
+
   if (event == NULL
       || evpy_emit_event (event.get (), gdb_py_events.new_thread) < 0)
     gdbpy_print_stack ();
@@ -384,8 +361,6 @@  add_thread_object (struct thread_info *tp)
 static void
 delete_thread_object (struct thread_info *tp, int ignore)
 {
-  struct threadlist_entry **entry, *tmp;
-
   if (!gdb_python_initialized)
     return;
 
@@ -395,29 +370,18 @@  delete_thread_object (struct thread_info *tp, int ignore)
   if (inf_obj == NULL)
     return;
 
-  /* Find thread entry in its inferior's thread_list.  */
-  for (entry = &inf_obj->threads; *entry != NULL; entry =
-	 &(*entry)->next)
-    if ((*entry)->thread_obj->thread == tp)
-      break;
-
-  if (!*entry)
-    return;
-
-  tmp = *entry;
-  tmp->thread_obj->thread = NULL;
-
-  *entry = (*entry)->next;
-  inf_obj->nthreads--;
-
-  delete tmp;
+  auto it = inf_obj->threads->find (tp);
+  if (it != inf_obj->threads->end ())
+    {
+      it->second->thread = nullptr;
+      inf_obj->threads->erase (it);
+    }
 }
 
 static PyObject *
 infpy_threads (PyObject *self, PyObject *args)
 {
-  int i;
-  struct threadlist_entry *entry;
+  int i = 0;
   inferior_object *inf_obj = (inferior_object *) self;
   PyObject *tuple;
 
@@ -432,16 +396,15 @@  infpy_threads (PyObject *self, PyObject *args)
       GDB_PY_HANDLE_EXCEPTION (except);
     }
 
-  tuple = PyTuple_New (inf_obj->nthreads);
+  tuple = PyTuple_New (inf_obj->threads->size ());
   if (!tuple)
     return NULL;
 
-  for (i = 0, entry = inf_obj->threads; i < inf_obj->nthreads;
-       i++, entry = entry->next)
+  for (const thread_map_t::value_type &entry : *inf_obj->threads)
     {
-      PyObject *thr = (PyObject *) entry->thread_obj.get ();
+      PyObject *thr = (PyObject *) entry.second.get ();
       Py_INCREF (thr);
-      PyTuple_SET_ITEM (tuple, i, thr);
+      PyTuple_SET_ITEM (tuple, i++, thr);
     }
 
   return tuple;