[1/2] gdb: Convert completion tracker to use std types

Message ID c9cf5478c7c3d4fc0af6f484d8acd007fd7bb8b2.1577481993.git.andrew.burgess@embecosm.com
State New, archived
Headers

Commit Message

Andrew Burgess Dec. 27, 2019, 9:32 p.m. UTC
  This commit converts the completion_tracker class to use std types,
while maintaining the existing functionality.

As part of this commit I have restructured how the completion tracker
works a little, the lowest common denominator is now calculated lazily
when we need it.  The reason for this is not performance, but instead
to prepare the way for the next commit which will require the
completion track to support removing possible completions.  If we
calculate the lowest common denominator as we add completions then
removing completions becomes much harder.

With the move to std types there is some ugliness relating to how
object ownership is managed that needs to be worked with.  The
completions are delivered to the completion_tracker as managed strings
within a gdb::unique_xmalloc_ptr<char>, and in an ideal world we would
use these objects as the keys and values within the hash.

The problem is that when we try to remove the items from the hash in
order to pass them to readline; I don't believe there is any easy way
to get the gdb::unique_xmalloc_ptr<char> out of the hash without
having it delete the string it's managing without using C++17 language
features, specifically the std::unordered_map::extract method.

For now then I'm holding the raw 'char *' within the unordered_map,
and rely on the completion_tracker object to delete the data if
needed, or to ensure that the data is passed over to readline, which
will do the deletion for us.

One further ugliness of the new code is that the new unordered_map
actually holds 'const char *', this will make the next commit slightly
easier, but does mean some unfortunate casting in this commit.

There should be no user visible changes after this commit.

gdb/ChangeLog:

	* completer.c (completion_tracker::completes_to_completion_word):
	Update for changes in how lowest common denominator is managed.
	(completion_tracker::completion_tracker): Now does nothing.
	(completion_tracker::discard_completions): Update for changes to
	lowest common denominator, and to m_entries_hash.
	(completion_tracker::maybe_add_completion): Likewise.
	(completion_tracker::~completion_tracker): Call
	discard_completions.
	(completion_tracker::recompute_lowest_common_denominator): Update
	for changes to lowest common denominator, and to m_entries_hash.
	(completion_tracker::build_completion_result): Likewise.
	* completer.h: Add 'unordered_map' include.
	(completion_tracker::have_completions): Use m_entries_hash.
	(completion_tracker::completion_set): New typedef.
	(completion_tracker::recompute_lowest_common_denominator): Update
	signature.
	(completion_tracker::m_entries_vec): Delete.
	(completion_tracker::m_entries_hash): Change type.
	(completion_tracker::m_lowest_common_denominator): Change type.
	(completion_tracker::m_lowest_common_denominator_valid): New
	member variable.
	(completion_tracker::m_lowest_common_denominator_max_length): New
	member variable.

Change-Id: Iac5166189103d84bac0a4181324f38ca14227c47
---
 gdb/ChangeLog   |  26 ++++++++++++
 gdb/completer.c | 129 ++++++++++++++++++++++++++++++++------------------------
 gdb/completer.h |  58 ++++++++++++++++---------
 3 files changed, 138 insertions(+), 75 deletions(-)
  

Comments

Tom Tromey Jan. 24, 2020, 6:54 p.m. UTC | #1
>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> For now then I'm holding the raw 'char *' within the unordered_map,
Andrew> and rely on the completion_tracker object to delete the data if
Andrew> needed, or to ensure that the data is passed over to readline, which
Andrew> will do the deletion for us.

Seems reasonable enough.

Andrew> +  for (const auto &p : m_entries_hash)
Andrew> +    {
Andrew> +      xfree ((char *) p.first);
Andrew> +      xfree ((char *) p.second);

Maybe const_cast would be better?

Andrew> +  /* A map of completions.  The key is the completion, and the value is the
Andrew> +     string to be used to compute the lowest common denominator.  Both the
Andrew> +     key and the value are owned by the completion_tracker class while
Andrew> +     being held in this map, as such completion_tracker must free these
Andrew> +     strings when finished with them, or pass ownership to someone else who
Andrew> +     will free them.  */
Andrew> +  typedef std::unordered_map<const char *, const char *, std::hash<std::string>,
Andrew> +			     std::equal_to<std::string>> completion_set;

Does using std::string here mean that a temporary std::string will be
made for each insertion in the map?  And also for comparisons?
That seems expensive if so.

Tom
  
Terekhov, Mikhail via Gdb-patches Jan. 24, 2020, 7:17 p.m. UTC | #2
On Fri, Jan 24, 2020, 20:14 Tom Tromey <tom@tromey.com> wrote:

> >>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:
>
> Andrew> For now then I'm holding the raw 'char *' within the unordered_map,
> Andrew> and rely on the completion_tracker object to delete the data if
> Andrew> needed, or to ensure that the data is passed over to readline,
> which
> Andrew> will do the deletion for us.
>
> Seems reasonable enough.
>
> Andrew> +  for (const auto &p : m_entries_hash)
> Andrew> +    {
> Andrew> +      xfree ((char *) p.first);
> Andrew> +      xfree ((char *) p.second);
>
> Maybe const_cast would be better?
>
> Andrew> +  /* A map of completions.  The key is the completion, and the
> value is the
> Andrew> +     string to be used to compute the lowest common denominator.
> Both the
> Andrew> +     key and the value are owned by the completion_tracker class
> while
> Andrew> +     being held in this map, as such completion_tracker must free
> these
> Andrew> +     strings when finished with them, or pass ownership to
> someone else who
> Andrew> +     will free them.  */
> Andrew> +  typedef std::unordered_map<const char *, const char *,
> std::hash<std::string>,
> Andrew> +                            std::equal_to<std::string>>
> completion_set;
>
> Does using std::string here mean that a temporary std::string will be
> made for each insertion in the map?  And also for comparisons?
> That seems expensive if so.
>

Perhaps that's finally a use case to add hash<gdb::string_view>!


> Tom
>
  
Tom Tromey Jan. 26, 2020, 4 p.m. UTC | #3
>>>>> "Christian" == Christian Biesinger <cbiesinger@google.com> writes:

Christian> Perhaps that's finally a use case to add hash<gdb::string_view>!

I also wonder what will be the thing that will make us finally bring in
gcc's hash table.

Tom
  

Patch

diff --git a/gdb/completer.c b/gdb/completer.c
index 6658da6d7fb..93df1018f66 100644
--- a/gdb/completer.c
+++ b/gdb/completer.c
@@ -407,9 +407,10 @@  advance_to_filename_complete_word_point (completion_tracker &tracker,
 bool
 completion_tracker::completes_to_completion_word (const char *word)
 {
+  recompute_lowest_common_denominator ();
   if (m_lowest_common_denominator_unique)
     {
-      const char *lcd = m_lowest_common_denominator;
+      const char *lcd = m_lowest_common_denominator.get ();
 
       if (strncmp_iw (word, lcd, strlen (lcd)) == 0)
 	{
@@ -1512,9 +1513,7 @@  int max_completions = 200;
 
 completion_tracker::completion_tracker ()
 {
-  m_entries_hash = htab_create_alloc (INITIAL_COMPLETION_HTAB_SIZE,
-				      htab_hash_string, streq_hash,
-				      NULL, xcalloc, xfree);
+  /* Nothing.  */
 }
 
 /* See completer.h.  */
@@ -1522,25 +1521,23 @@  completion_tracker::completion_tracker ()
 void
 completion_tracker::discard_completions ()
 {
-  xfree (m_lowest_common_denominator);
   m_lowest_common_denominator = NULL;
-
   m_lowest_common_denominator_unique = false;
+  m_lowest_common_denominator_valid = false;
 
-  m_entries_vec.clear ();
-
-  htab_delete (m_entries_hash);
-  m_entries_hash = htab_create_alloc (INITIAL_COMPLETION_HTAB_SIZE,
-				      htab_hash_string, streq_hash,
-				      NULL, xcalloc, xfree);
+  for (const auto &p : m_entries_hash)
+    {
+      xfree ((char *) p.first);
+      xfree ((char *) p.second);
+    }
+  m_entries_hash.clear ();
 }
 
 /* See completer.h.  */
 
 completion_tracker::~completion_tracker ()
 {
-  xfree (m_lowest_common_denominator);
-  htab_delete (m_entries_hash);
+  discard_completions ();
 }
 
 /* See completer.h.  */
@@ -1551,17 +1548,15 @@  completion_tracker::maybe_add_completion
    completion_match_for_lcd *match_for_lcd,
    const char *text, const char *word)
 {
-  void **slot;
-
   if (max_completions == 0)
     return false;
 
-  if (htab_elements (m_entries_hash) >= max_completions)
+  if (m_entries_hash.size () >= max_completions)
     return false;
 
-  slot = htab_find_slot (m_entries_hash, name.get (), INSERT);
-  if (*slot == HTAB_EMPTY_ENTRY)
+  if (m_entries_hash.find (name.get ()) == m_entries_hash.end ())
     {
+      /* Not already in the hash.  Add it.  */
       const char *match_for_lcd_str = NULL;
 
       if (match_for_lcd != NULL)
@@ -1573,10 +1568,13 @@  completion_tracker::maybe_add_completion
       gdb::unique_xmalloc_ptr<char> lcd
 	= make_completion_match_str (match_for_lcd_str, text, word);
 
-      recompute_lowest_common_denominator (std::move (lcd));
+      size_t lcd_len = strlen (lcd.get ());
+      auto p = std::make_pair (name.release (), lcd.release ());
+      m_entries_hash.insert (std::move (p));
 
-      *slot = name.get ();
-      m_entries_vec.push_back (std::move (name));
+      m_lowest_common_denominator_valid = false;
+      m_lowest_common_denominator_max_length
+	= std::max (m_lowest_common_denominator_max_length, lcd_len);
     }
 
   return true;
@@ -1982,35 +1980,47 @@  completion_find_completion_word (completion_tracker &tracker, const char *text,
 /* See completer.h.  */
 
 void
-completion_tracker::recompute_lowest_common_denominator
-  (gdb::unique_xmalloc_ptr<char> &&new_match_up)
+completion_tracker::recompute_lowest_common_denominator ()
 {
-  if (m_lowest_common_denominator == NULL)
-    {
-      /* We don't have a lowest common denominator yet, so simply take
-	 the whole NEW_MATCH_UP as being it.  */
-      m_lowest_common_denominator = new_match_up.release ();
-      m_lowest_common_denominator_unique = true;
-    }
-  else
-    {
-      /* Find the common denominator between the currently-known
-	 lowest common denominator and NEW_MATCH_UP.  That becomes the
-	 new lowest common denominator.  */
-      size_t i;
-      const char *new_match = new_match_up.get ();
+  /* We've already done this.  */
+  if (m_lowest_common_denominator_valid)
+    return;
 
-      for (i = 0;
-	   (new_match[i] != '\0'
-	    && new_match[i] == m_lowest_common_denominator[i]);
-	   i++)
-	;
-      if (m_lowest_common_denominator[i] != new_match[i])
+  /* Resize the storage to ensure we have enough space, the plus one gives
+     us space for the trailing null terminator we will include.  */
+  char *tmp = (char *) xrealloc (m_lowest_common_denominator.release (),
+				 m_lowest_common_denominator_max_length + 1);
+  m_lowest_common_denominator.reset (tmp);
+
+  for (const auto &p : m_entries_hash)
+    {
+      if (!m_lowest_common_denominator_valid)
+	{
+	  strcpy (m_lowest_common_denominator.get (), p.second);
+	  m_lowest_common_denominator_unique = true;
+	  m_lowest_common_denominator_valid = true;
+	}
+      else
 	{
-	  m_lowest_common_denominator[i] = '\0';
-	  m_lowest_common_denominator_unique = false;
+	  /* Find the common denominator between the currently-known
+	     lowest common denominator and NEW_MATCH_UP.  That becomes the
+	     new lowest common denominator.  */
+	  size_t i;
+	  const char *new_match = p.second;
+
+	  for (i = 0;
+	       (new_match[i] != '\0'
+		&& new_match[i] == m_lowest_common_denominator.get ()[i]);
+	       i++)
+	    ;
+	  if (m_lowest_common_denominator.get ()[i] != new_match[i])
+	    {
+	      m_lowest_common_denominator.get ()[i] = '\0';
+	      m_lowest_common_denominator_unique = false;
+	    }
 	}
     }
+  m_lowest_common_denominator_valid = true;
 }
 
 /* See completer.h.  */
@@ -2092,19 +2102,17 @@  completion_result
 completion_tracker::build_completion_result (const char *text,
 					     int start, int end)
 {
-  completion_list &list = m_entries_vec;	/* The completions.  */
-
-  if (list.empty ())
+  if (m_entries_hash.empty ())
     return {};
 
   /* +1 for the LCD, and +1 for NULL termination.  */
-  char **match_list = XNEWVEC (char *, 1 + list.size () + 1);
+  char **match_list = XNEWVEC (char *, 1 + m_entries_hash.size () + 1);
 
   /* Build replacement word, based on the LCD.  */
-
+  recompute_lowest_common_denominator ();
   match_list[0]
     = expand_preserving_ws (text, end - start,
-			    m_lowest_common_denominator);
+			    m_lowest_common_denominator.get ());
 
   if (m_lowest_common_denominator_unique)
     {
@@ -2128,13 +2136,22 @@  completion_tracker::build_completion_result (const char *text,
     }
   else
     {
-      int ix;
+      int ix = 0;
 
-      for (ix = 0; ix < list.size (); ++ix)
-	match_list[ix + 1] = list[ix].release ();
+      for (const auto &ptr : m_entries_hash)
+	{
+	  match_list[ix + 1] = (char *) ptr.first;
+	  xfree ((char *) ptr.second);
+	  ix++;
+	}
       match_list[ix + 1] = NULL;
 
-      return completion_result (match_list, list.size (), false);
+      /* We must clear the hash here as ownership of all the entries is
+	 being passed over to readline, and we don't want to try and free
+	 these pointers when we ourselves are destroyed.  */
+      size_t hash_size = m_entries_hash.size ();
+      m_entries_hash.clear ();
+      return completion_result (match_list, hash_size, false);
     }
 }
 
diff --git a/gdb/completer.h b/gdb/completer.h
index 313010fce22..fd4aba79746 100644
--- a/gdb/completer.h
+++ b/gdb/completer.h
@@ -20,6 +20,8 @@ 
 #include "gdbsupport/gdb_vecs.h"
 #include "command.h"
 
+#include <unordered_map>
+
 /* Types of functions in struct match_list_displayer.  */
 
 struct match_list_displayer;
@@ -389,7 +391,7 @@  public:
 
   /* True if we have any completion match recorded.  */
   bool have_completions () const
-  { return !m_entries_vec.empty (); }
+  { return !m_entries_hash.empty (); }
 
   /* Discard the current completion match list and the current
      LCD.  */
@@ -403,6 +405,15 @@  public:
 
 private:
 
+  /* A map of completions.  The key is the completion, and the value is the
+     string to be used to compute the lowest common denominator.  Both the
+     key and the value are owned by the completion_tracker class while
+     being held in this map, as such completion_tracker must free these
+     strings when finished with them, or pass ownership to someone else who
+     will free them.  */
+  typedef std::unordered_map<const char *, const char *, std::hash<std::string>,
+			     std::equal_to<std::string>> completion_set;
+
   /* Add the completion NAME to the list of generated completions if
      it is not there already.  If false is returned, too many
      completions were found.  */
@@ -410,18 +421,11 @@  private:
 			     completion_match_for_lcd *match_for_lcd,
 			     const char *text, const char *word);
 
-  /* Given a new match, recompute the lowest common denominator (LCD)
-     to hand over to readline.  Normally readline computes this itself
-     based on the whole set of completion matches.  However, some
-     completers want to override readline, in order to be able to
-     provide a LCD that is not really a prefix of the matches, but the
-     lowest common denominator of some relevant substring of each
-     match.  E.g., "b push_ba" completes to
-     "std::vector<..>::push_back", "std::string::push_back", etc., and
-     in this case we want the lowest common denominator to be
-     "push_back" instead of "std::".  */
-  void recompute_lowest_common_denominator
-    (gdb::unique_xmalloc_ptr<char> &&new_match);
+  /* Ensure that the lowest common denominator held in the member variable
+     M_LOWEST_COMMON_DENOMINATOR is valid.  This method must be called if
+     there is any chance that new completions have been added to the
+     tracker before the lowest common denominator is read.  */
+  void recompute_lowest_common_denominator ();
 
   /* Completion match outputs returned by the symbol name matching
      routines (see symbol_name_matcher_ftype).  These results are only
@@ -430,16 +434,13 @@  private:
      symbol name matching routines.  */
   completion_match_result m_completion_match_result;
 
-  /* The completion matches found so far, in a vector.  */
-  completion_list m_entries_vec;
-
   /* The completion matches found so far, in a hash table, for
      duplicate elimination as entries are added.  Otherwise the user
      is left scratching his/her head: readline and complete_command
      will remove duplicates, and if removal of duplicates there brings
      the total under max_completions the user may think gdb quit
      searching too early.  */
-  htab_t m_entries_hash;
+  completion_set m_entries_hash;
 
   /* If non-zero, then this is the quote char that needs to be
      appended after completion (iff we have a unique completion).  We
@@ -472,8 +473,16 @@  private:
   bool m_suppress_append_ws = false;
 
   /* Our idea of lowest common denominator to hand over to readline.
-     See intro.  */
-  char *m_lowest_common_denominator = NULL;
+     Normally readline computes this itself based on the whole set of
+     completion matches.  However, some completers want to override
+     readline, in order to be able to provide a LCD that is not really a
+     prefix of the matches, but the lowest common denominator of some
+     relevant substring of each match.
+
+     E.g., "b push_ba" completes to "std::vector<..>::push_back",
+     "std::string::push_back", etc., and in this case we want the lowest
+     common denominator to be "push_back" instead of "std::".  */
+  gdb::unique_xmalloc_ptr<char> m_lowest_common_denominator = nullptr;
 
   /* If true, the LCD is unique.  I.e., all completions had the same
      MATCH_FOR_LCD substring, even if the completions were different.
@@ -483,6 +492,17 @@  private:
      "function()", instead of showing all the possible
      completions.  */
   bool m_lowest_common_denominator_unique = false;
+
+  /* True if the value in M_LOWEST_COMMON_DENOMINATOR is correct.  This is
+     set to true each time RECOMPUTE_LOWEST_COMMON_DENOMINATOR is called,
+     and reset to false whenever a new completion is added.  */
+  bool m_lowest_common_denominator_valid = false;
+
+  /* We can avoid multiple calls to xrealloc in
+     RECOMPUTE_LOWEST_COMMON_DENOMINATOR, we track the maximum possible
+     size of lowest common denominator, which we know as each completion is
+     added.  */
+  size_t m_lowest_common_denominator_max_length = 0;
 };
 
 /* Return a string to hand off to readline as a completion match