[2/9] Introduce and use language_set

Message ID 20190307205709.21919-3-tom@tromey.com
State New, archived
Headers

Commit Message

Tom Tromey March 7, 2019, 8:57 p.m. UTC
  I noticed that objfile_per_bfd_storage::demangled_hash_languages is a
std::vector, which seemed quite large for something that,
fundamentally, can be represented as a bit-set.  This patch
reimplements it as a bit-set.

gdb/ChangeLog
2019-03-07  Tom Tromey  <tom@tromey.com>

	* objfiles.h (struct objfile_per_bfd_storage)
	<demangled_hash_languages>: Now a language_set.
	* minsyms.c (add_minsym_to_demangled_hash_table): Update.
	* language.h (class language_set): New.
---
 gdb/ChangeLog  |  7 +++++
 gdb/language.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gdb/minsyms.c  |  7 ++---
 gdb/objfiles.h |  6 ++--
 4 files changed, 95 insertions(+), 9 deletions(-)
  

Comments

John Baldwin March 7, 2019, 11:04 p.m. UTC | #1
On 3/7/19 12:57 PM, Tom Tromey wrote:
> I noticed that objfile_per_bfd_storage::demangled_hash_languages is a
> std::vector, which seemed quite large for something that,
> fundamentally, can be represented as a bit-set.  This patch
> reimplements it as a bit-set.

Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd
still have to write your own iterator class, so it wouldn't save much in terms
of lines of code.  It just seems a bit odd to write a bitset class from scratch?
  
Tom Tromey March 8, 2019, 10:20 p.m. UTC | #2
>>>>> "John" == John Baldwin <jhb@FreeBSD.org> writes:

John> On 3/7/19 12:57 PM, Tom Tromey wrote:
>> I noticed that objfile_per_bfd_storage::demangled_hash_languages is a
>> std::vector, which seemed quite large for something that,
>> fundamentally, can be represented as a bit-set.  This patch
>> reimplements it as a bit-set.

John> Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd
John> still have to write your own iterator class, so it wouldn't save much in terms
John> of lines of code.  It just seems a bit odd to write a bitset class from scratch?

I didn't really consider it.

I looked now and on my machine, std::bitset<nr_languages> is 8 bytes,
whereas langauge_set is just 4.  So, with the custom implementation, I
could pack it into a hole in objfile_per_bfd_storage.

That said, due to the minsym arrays, objfile_per_bfd_storage is
enormous, so saving 4 bytes probably isn't important.  So, I'll redo
this.

Tom
  

Patch

diff --git a/gdb/language.h b/gdb/language.h
index d56ec200208..d57eee96a12 100644
--- a/gdb/language.h
+++ b/gdb/language.h
@@ -747,4 +747,88 @@  private:
   enum language m_lang;
 };
 
+/* A set object specialized for keeping track of languages.  */
+
+class language_set
+{
+public:
+
+  language_set ()
+  {
+  }
+
+  /* Insert LANG into this set.  */
+  void insert (enum language lang)
+  {
+    m_mask |= 1 << lang;
+  }
+
+  /* The underlying type used to represent the languages.  */
+  typedef unsigned mask_type;
+
+  /* An iterator that iterates over languages contained in a language
+     set.  */
+  class iterator
+  {
+  public:
+
+    iterator (unsigned int n, mask_type mask)
+      : m_index (n),
+	m_mask (mask)
+    {
+      gdb_assert (m_index <= nr_languages);
+      update ();
+    }
+
+    enum language operator* () const
+    {
+      return (enum language) m_index;
+    }
+
+    iterator &operator++ ()
+    {
+      ++m_index;
+      update ();
+      return *this;
+    }
+
+    bool operator!= (const iterator &other) const
+    {
+      return m_index != other.m_index;
+    }
+
+  private:
+
+    /* A helper function that will advance M_INDEX to the next
+       language in the set.  */
+    void update ()
+    {
+      while (m_index < nr_languages && (m_mask & (1 << m_index)) == 0)
+	++m_index;
+    }
+
+    /* The current index of the iterator.  */
+    unsigned int m_index;
+    /* The mask from the underlying set.  */
+    mask_type m_mask;
+  };
+
+  iterator begin () const
+  {
+    return iterator (0, m_mask);
+  }
+
+  iterator end () const
+  {
+    return iterator (nr_languages, m_mask);
+  }
+
+private:
+
+  /* A bit mask of languages which are included in this set.  */
+  mask_type m_mask = 0;
+
+  gdb_static_assert (nr_languages <= (sizeof (mask_type) * CHAR_BIT));
+};
+
 #endif /* defined (LANGUAGE_H) */
diff --git a/gdb/minsyms.c b/gdb/minsyms.c
index 0513cfe69f4..6395cc4ccab 100644
--- a/gdb/minsyms.c
+++ b/gdb/minsyms.c
@@ -160,11 +160,8 @@  add_minsym_to_demangled_hash_table (struct minimal_symbol *sym,
       unsigned int hash = search_name_hash (MSYMBOL_LANGUAGE (sym),
 					    MSYMBOL_SEARCH_NAME (sym));
 
-      auto &vec = objfile->per_bfd->demangled_hash_languages;
-      auto it = std::lower_bound (vec.begin (), vec.end (),
-				  MSYMBOL_LANGUAGE (sym));
-      if (it == vec.end () || *it != MSYMBOL_LANGUAGE (sym))
-	vec.insert (it, MSYMBOL_LANGUAGE (sym));
+      objfile->per_bfd->demangled_hash_languages.insert
+	(MSYMBOL_LANGUAGE (sym));
 
       struct minimal_symbol **table
 	= objfile->per_bfd->msymbol_demangled_hash;
diff --git a/gdb/objfiles.h b/gdb/objfiles.h
index c5ce9eec955..843c44e1d24 100644
--- a/gdb/objfiles.h
+++ b/gdb/objfiles.h
@@ -313,10 +313,8 @@  struct objfile_per_bfd_storage
   minimal_symbol *msymbol_demangled_hash[MINIMAL_SYMBOL_HASH_SIZE] {};
 
   /* All the different languages of symbols found in the demangled
-     hash table.  A flat/vector-based map is more efficient than a map
-     or hash table here, since this will only usually contain zero or
-     one entries.  */
-  std::vector<enum language> demangled_hash_languages;
+     hash table.  */
+  language_set demangled_hash_languages;
 };
 
 /* Master structure for keeping track of each file from which