[applied] Detect abixml canonical type instability during abidw --debug-abidiff

Message ID 87r1hv3y1s.fsf@redhat.com
State Committed
Headers
Series [applied] Detect abixml canonical type instability during abidw --debug-abidiff |

Commit Message

Dodji Seketeli May 25, 2021, 10:30 a.m. UTC
  Hello,

In the debugging mode of self comparison induced by the invocation of
"abidw --debug-abidiff <binary>", it's useful to be able to ensure the
following invariant:

    The pointer value of the canonical type of a type T that is
    serialized into abixml with the id string "type-id-12" (for
    instance) must keep the same canonical type pointer value when
    that abixml file is de-serialized back into memory.  This is
    possible mainly because libabigail stays loaded in memory all the
    time during both serialization and de-serialization.

This patch adds support for detecting when that invariant is not
respected.

In other words it detects when the type build from de-serializing the
type which id is "type-id-12" (for instance) has a canonical type
which pointer value is different from the pointer value of the
canonical type (of the type) that was serialized as having the type id
"type-id-12".

This is done in three phases.

The first phase happens in the code of abidw itself; after the abixml
is written on disk, another file called the "typeid file" is written
on disk as well.  That later file contains a set of records; each
record associates a "type id string" (like the type IDs that appear in
the abixml file) to the pointer value of the canonical type that
matches that type id string.  That file is thus now available for
manual inspection during a later debugger session.  This is done by
invoking the new function write_canonical_type_ids.

The second phase appears right before abixml loading time.  The typeid
file is read back and the association "type-id string" <-> is stored
in a hash map that is returned by
environment::get_type_id_canonical_type_map().  This is done by
invoking the new function load_canonical_type_ids.

The third phase happens right after the canonicalization (triggered in
the abixml reader) of a type coming from abixml, corresponding to a
given type id.  It checks if the pointer value of the canonicalization
type just computed is the same as the one associated to that type id
in the map returned by environment::get_type_id_canonical_type_map.

This is a way of verifying the "stability" of a canonical type during
its serialization and de-serialization to and from abixml and it's
done as part of "abidw --debug-abidiff <binary>".

Just as an example, here is the kind of error output that I am getting
on a real life debugging session on a binary that exhibits self
comparison error:

    $ abidw  --debug-abidiff -d <some-binary>
    error: problem detected with type 'typedef Vmalloc_t' from second corpus
    error: canonical type for type 'typedef Vmalloc_t' of type-id 'type-id-179' changed from '1a083e8' to '21369b8'
    [...]
    $

From this output, I see that the first type for which libabigail
exhibits an instability on the pointer value of the canonical type is
the type 'typedef Vmalloc_t'.  In other words, when that type is saved
to abixml, the type we read back is different.  This needs further
debugging but at least it pinpoints exactly what type we are seeing
the core issue on first.  This is of a tremendous help in the root
cause analysis needed to understand why the self comparison is
failing.

	* include/abg-ir.h (environment::get_type_id_canonical_type_map):
	Declare new data member.
	* src/abg-ir.cc (environment::priv::type_id_canonical_type_map_):
	Define new data member.
	(environment::get_type_id_canonical_type_map): Define new method.
	* include/abg-reader.h (load_canonical_type_ids): Declare new
	function.
	* src/abg-reader.cc (read_context::m_pointer_type_id_map):
	Define new data member.
	(read_context::{get_pointer_type_id_map,
	maybe_check_abixml_canonical_type_stability}): Define new methods.
	(read_context::{maybe_canonicalize_type,
	perform_late_type_canonicalizing}): Invoke
	maybe_perform_self_comparison_canonical_type_check after
	canonicalization to perform canonicalization type stability
	checking.
	(build_type): Associate the pointer value for the newly built type
	with the type id string identifying it in the abixml.  Once the
	abixml representation is dropped from memory and we are about to
	perform type canonicalization, we can still know what the type id
	of a given type coming from abixml was; it's thus possible to
	verify that the canonical type associated to that type id is the
	same as the one stored in the typeid file.
	(read_type_id_string): Define new static function.
	(load_canonical_type_ids): Define new function.
	* include/abg-writer.h (write_canonical_type_ids): Likewise.
	* src/abg-writer.cc (write_canonical_type_ids): Define new
	function overloads.
	* tools/abidw.cc (options::type_id_file_path): New data member.
	(load_corpus_and_write_abixml): Write and read back the typeid
	file.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>

Applied to master.

---
 include/abg-ir.h     |   5 +
 include/abg-reader.h |   6 ++
 include/abg-writer.h |  10 ++
 src/abg-ir.cc        |  19 ++++
 src/abg-reader.cc    | 211 ++++++++++++++++++++++++++++++++++++++++++-
 src/abg-writer.cc    |  67 ++++++++++++++
 tools/abidw.cc       |  15 +++
 7 files changed, 331 insertions(+), 2 deletions(-)
  

Patch

diff --git a/include/abg-ir.h b/include/abg-ir.h
index d284995f..c4df23e0 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -219,6 +219,11 @@  public:
 
   type_base* get_canonical_type(const char* name, unsigned index);
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  unordered_map<string, uintptr_t>&
+  get_type_id_canonical_type_map() const;
+#endif
+
   friend class class_or_union;
   friend class class_decl;
   friend class function_type;
diff --git a/include/abg-reader.h b/include/abg-reader.h
index 13c3ef23..a3aa0f85 100644
--- a/include/abg-reader.h
+++ b/include/abg-reader.h
@@ -82,6 +82,12 @@  void
 consider_types_not_reachable_from_public_interfaces(read_context& ctxt,
 						    bool flag);
 }//end xml_reader
+
+#ifdef WITH_DEBUG_SELF_COMPARISON
+bool
+load_canonical_type_ids(xml_reader::read_context& ctxt,
+			const string& file_path);
+#endif
 }//end namespace abigail
 
 #endif // __ABG_READER_H__
diff --git a/include/abg-writer.h b/include/abg-writer.h
index 552a3b09..70d118a5 100644
--- a/include/abg-writer.h
+++ b/include/abg-writer.h
@@ -127,6 +127,16 @@  write_corpus_group(write_context&	    ctx,
 		   unsigned		    indent);
 
 }// end namespace xml_writer
+
+#ifdef WITH_DEBUG_SELF_COMPARISON
+void
+write_canonical_type_ids(xml_writer::write_context&, ostream&);
+
+bool
+write_canonical_type_ids(xml_writer::write_context&,
+			const string &);
+#endif
+
 }// end namespace abigail
 
 #endif //  __ABG_WRITER_H__
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 6af7fb78..d4099b2d 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -2797,6 +2797,10 @@  struct environment::priv
   // This one is the corpus for the ABIXML file representing the
   // serialization of the input binary.
   corpus_wptr				second_self_comparison_corpus_;
+  // This is also used for debugging purposes, when using
+  //   'abidw --debug-abidiff <binary>'.  It holds the set of mapping of
+  // an abixml (canonical) type and its type-id.
+  unordered_map<string, uintptr_t>	type_id_canonical_type_map_;
 #endif
   bool					canonicalization_is_done_;
   bool					do_on_the_fly_canonicalization_;
@@ -3307,6 +3311,21 @@  environment::get_canonical_type(const char* name, unsigned index)
   return (*types)[index].get();
 }
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+/// Get the set of abixml type-id and the pointer value of the
+/// (canonical) type it's associated to.
+///
+/// This is useful for debugging purposes, especially in the context
+/// of the use of the command:
+///   'abidw --debug-abidiff <binary>'.
+///
+/// @return the set of abixml type-id and the pointer value of the
+/// (canonical) type it's associated to.
+unordered_map<string, uintptr_t>&
+environment::get_type_id_canonical_type_map() const
+{return priv_->type_id_canonical_type_map_;}
+#endif
+
 // </environment stuff>
 
 // <type_or_decl_base stuff>
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index 9e7db810..bfab3b8d 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -58,6 +58,9 @@  static bool	read_is_declaration_only(xmlNodePtr, bool&);
 static bool	read_is_artificial(xmlNodePtr, bool&);
 static bool	read_tracking_non_reachable_types(xmlNodePtr, bool&);
 static bool	read_is_non_reachable_type(xmlNodePtr, bool&);
+#ifdef WITH_DEBUG_SELF_COMPARISON
+static bool	read_type_id_string(xmlNodePtr, string&);
+#endif
 
 class read_context;
 
@@ -96,6 +99,9 @@  private:
   unordered_map<string, vector<type_base_sptr> >	m_types_map;
   unordered_map<string, shared_ptr<function_tdecl> >	m_fn_tmpl_map;
   unordered_map<string, shared_ptr<class_tdecl> >	m_class_tmpl_map;
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  unordered_map<uintptr_t, string>			m_pointer_type_id_map;
+#endif
   vector<type_base_sptr>				m_types_to_canonicalize;
   string_xml_node_map					m_id_xml_node_map;
   xml_node_decl_base_sptr_map				m_xml_node_decl_map;
@@ -378,6 +384,25 @@  public:
     return i->second;
   }
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  /// Getter of the map that associates the values of type pointers to
+  /// their type-id strings.
+  ///
+  /// Note that this map is populated at abixml reading time, (by
+  /// build_type()) when a given XML element representing a type is
+  /// read into a corresponding abigail::ir::type_base.
+  ///
+  /// This is used only for the purpose of debugging the
+  /// self-comparison process.  That is, when invoking "abidw
+  /// --debug-abidiff".
+  ///
+  /// @return the map that associates the values of type pointers to
+  /// their type-id strings.
+  unordered_map<uintptr_t, string>&
+  get_pointer_type_id_map()
+  {return m_pointer_type_id_map;}
+#endif
+
   /// Return the current lexical scope.
   scope_decl*
   get_cur_scope() const
@@ -791,6 +816,67 @@  public:
     clear_decls_stack();
   }
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  /// Perform a debugging routine for the "self-comparison" mode.
+  ///
+  /// This is done when this command is on:
+  ///
+  ///   "abidw --debug-abidiff".
+  ///
+  /// Consider a type 't' built from an XML element from the abixml
+  /// reader and that has just been canonicalized.
+  ///
+  /// This function checks if the canonical type of 't' is the same as
+  /// the canonical type of the type which was saved into the abixml
+  /// with the same "type-id" as the one of 't'.
+  ///
+  /// Note that at abixml saving time, a debugging file was saved on
+  /// disk to record the mapping of canonical type pointers and their
+  /// type-ids.  Right before reading the abixml again, that file was
+  /// read again and the mapping was loaded in the map returned by
+  /// environment::get_type_id_canonical_type_map().
+  void
+  maybe_check_abixml_canonical_type_stability(type_base_sptr& t)
+  {
+    if (!m_env->self_comparison_debug_is_on()
+	|| m_env->get_type_id_canonical_type_map().empty())
+      return ;
+
+    // Let's get the type-id of this type as recorded in the
+    // originating abixml file.
+    string type_id;
+    const auto i =
+      get_pointer_type_id_map().find(reinterpret_cast<uintptr_t>(t.get()));
+    if (i != get_pointer_type_id_map().end())
+      {
+	type_id = i->second;
+	// Now let's get the canonical type that initially led to the
+	// serialization of a type with this type-id, when the abixml
+	// was being serialized.
+	auto j = m_env->get_type_id_canonical_type_map().find(type_id);
+	if (j == m_env->get_type_id_canonical_type_map().end())
+	  std::cerr << "error: no type with type-id: '"
+		    << type_id
+		    << "' could be read back from the abixml file\n";
+	else if (j->second
+		 != reinterpret_cast<uintptr_t>(t->get_canonical_type().get()))
+	  // So thecanonical type of 't' (at abixml de-serialization
+	  // time) is different from the canonical type that led to
+	  // the serialization of 't' at abixml serialization time.
+	  // Report this because it needs further debugging.
+	  std::cerr << "error: canonical type for type '"
+		    << t->get_pretty_representation(/*internal=*/false,
+						    /*qualified=*/false)
+		    << "' of type-id '" << type_id
+		    << "' changed from '" << std::hex
+		    << j->second << "' to '" << std::hex
+		    << reinterpret_cast<uintptr_t>(t->get_canonical_type().get())
+		    << std::dec
+		    << "'\n";
+	    }
+  }
+#endif
+
   /// Test if a type should be canonicalized early.  If so,
   /// canonicalize it right away.  Otherwise, schedule it for late
   /// canonicalizing; that is, schedule it so that it's going to be
@@ -836,7 +922,12 @@  public:
 	&& !is_typedef(t)
 	&& !is_enum_type(t)
 	&& !is_function_type(t))
-      canonicalize(t);
+      {
+	canonicalize(t);
+#ifdef WITH_DEBUG_SELF_COMPARISON
+	maybe_check_abixml_canonical_type_stability(t);
+#endif
+      }
     else
       {
 	// We do not want to try to canonicalize a class type that
@@ -865,7 +956,12 @@  public:
     for (vector<type_base_sptr>::iterator i = m_types_to_canonicalize.begin();
 	 i != m_types_to_canonicalize.end();
 	 ++i)
-      canonicalize(*i);
+      {
+	canonicalize(*i);
+#ifdef WITH_DEBUG_SELF_COMPARISON
+	maybe_check_abixml_canonical_type_stability(*i);
+#endif
+      }
   }
 
   /// Test whether if a given function suppression matches a function
@@ -2704,6 +2800,26 @@  read_elf_symbol_visibility(xmlNodePtr node, elf_symbol::visibility& v)
   return false;
 }
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+/// Read the value of the 'id' attribute from a given XML node.
+///
+/// @param node the XML node to consider.
+///
+/// @param type_id the string to set the 'id' to.
+///
+/// @return true iff @p type_id was successfully set.
+static bool
+read_type_id_string(xmlNodePtr node, string& type_id)
+{
+  if (xml_char_sptr s = XML_NODE_GET_ATTRIBUTE(node, "id"))
+    {
+      type_id = CHAR_STR(s);
+      return true;
+    }
+  return false;
+}
+#endif
+
 /// Build a @ref namespace_decl from an XML element node which name is
 /// "namespace-decl".  Note that this function recursively reads the
 /// content of the namespace and builds the proper IR nodes
@@ -5541,6 +5657,16 @@  build_type(read_context&	ctxt,
 	abi->record_type_as_reachable_from_public_interfaces(*t);
     }
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  if (t && ctxt.get_environment()->self_comparison_debug_is_on())
+    {
+      string type_id;
+      if (read_type_id_string(node, type_id))
+	// Let's store the type-id of this type pointer.
+	ctxt.get_pointer_type_id_map()[reinterpret_cast<uintptr_t>(t.get())] = type_id;
+    }
+#endif
+
   ctxt.maybe_canonicalize_type(t);
   return t;
 }
@@ -5913,4 +6039,85 @@  read_corpus_from_native_xml_file(const string& path,
 
 }//end namespace xml_reader
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+/// Load the map that is stored at
+/// environment::get_type_id_canonical_type_map().
+///
+/// That map associates type-ids to the pointer value of the canonical
+/// types they correspond to.  The map is loaded from a file that was
+/// stored on disk by some debugging primitive that is activated when
+/// the command "abidw --debug-abidiff <binary>' is used."
+///
+/// The function that stored the map in that file is
+/// write_canonical_type_ids.
+///
+/// @param ctxt the read context to use.
+///
+/// @param file_path the path to the file containing the type-ids <->
+/// canonical type mapping.
+///
+/// @return true iff the loading was successful.
+bool
+load_canonical_type_ids(xml_reader::read_context& ctxt, const string &file_path)
+{
+  xmlDocPtr doc = xmlReadFile(file_path.c_str(), NULL, XML_PARSE_NOERROR);
+  if (!doc)
+    return false;
+
+  xmlNodePtr node = xmlDocGetRootElement(doc);
+  if (!node)
+    return false;
+
+  // We expect a file which content looks like:
+  //
+  // <abixml-types-check>
+  //     <type>
+  //       <id>type-id-573</id>
+  //       <c>0x262ee28</c>
+  //     </type>
+  //     <type>
+  //       <id>type-id-569</id>
+  //       <c>0x2628298</c>
+  //     </type>
+  //     <type>
+  //       <id>type-id-575</id>
+  //       <c>0x25f9ba8</c>
+  //     </type>
+  // <abixml-types-check>
+  //
+  // So let's parse it!
+
+  if (xmlStrcmp(node->name, (xmlChar*) "abixml-types-check"))
+    return false;
+
+  for (node = xmlFirstElementChild(node);
+       node;
+       node = xmlNextElementSibling(node))
+    {
+      if (xmlStrcmp(node->name, (xmlChar*) "type"))
+	continue;
+
+      string id, canonical_address;
+      xmlNodePtr data = xmlFirstElementChild(node);
+      if (data && !xmlStrcmp(data->name, (xmlChar*) "id")
+	  && data->children && xmlNodeIsText(data->children))
+	id = (char*) XML_GET_CONTENT(data->children);
+
+      data = xmlNextElementSibling(data);
+      if (data && !xmlStrcmp(data->name, (xmlChar*) "c")
+	  && data->children && xmlNodeIsText(data->children))
+	{
+	  canonical_address = (char*) XML_GET_CONTENT(data->children);
+	  std::stringstream s;
+	  s << canonical_address;
+	  uintptr_t v = 0;
+	  s >>  std::hex >> v;
+	  if (!id.empty())
+	    ctxt.get_environment()->get_type_id_canonical_type_map()[id] = v;
+	}
+    }
+  return true;
+}
+#endif
+
 }//end namespace abigail
diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index f9cae698..d0588d6b 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -4689,5 +4689,72 @@  void
 dump_decl_location(const decl_base_sptr d)
 {dump_decl_location(d.get());}
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+/// Serialize the map that is stored at
+/// environment::get_type_id_canonical_type_map() to an output stream.
+///
+/// This is for debugging purposes and is triggered ultimately by
+/// invoking the command 'abidw --debug-abidiff <binary>'.
+///
+/// @param ctxt the write context.
+///
+/// @param o the output stream to serialize the map to.
+void
+write_canonical_type_ids(xml_writer::write_context& ctxt, ostream& o)
+{
+  // We want to serialize a file which content looks like:
+  //
+  // <abixml-types-check>
+  //     <type>
+  //       <id>type-id-573</id>
+  //       <c>0x262ee28</c>
+  //     </type>
+  //     <type>
+  //       <id>type-id-569</id>
+  //       <c>0x2628298</c>
+  //     </type>
+  //     <type>
+  //       <id>type-id-575</id>
+  //       <c>0x25f9ba8</c>
+  //     </type>
+  // <abixml-types-check>
+
+  o << "<abixml-types-check>\n";
+  for (const auto &p : ctxt.get_environment()->get_canonical_types_map())
+    for (const auto& type_sptr : p.second)
+      {
+	string id = ctxt.get_id_for_type (type_sptr);
+	o << "  <type>\n"
+	  << "    <id>" << id << "</id>\n"
+	  << "    <c>"
+	  << std::hex << type_sptr->get_canonical_type().get()
+	  << "</c>\n"
+	  << "  </type>\n";
+      }
+  o << "</abixml-types-check>\n";
+}
+
+/// Serialize the map that is stored at
+/// environment::get_type_id_canonical_type_map() to a file.
+///
+/// This is for debugging purposes and is triggered ultimately by
+/// invoking the command 'abidw --debug-abidiff <binary>'.
+///
+/// @param ctxt the write context.
+///
+/// @param file_path the file to serialize the map to.
+bool
+write_canonical_type_ids(xml_writer::write_context& ctxt,
+			const string &file_path)
+{
+  std:: ofstream o (file_path);
+
+  if (!o.is_open())
+    return true;
+  write_canonical_type_ids(ctxt, o);
+  o.close();
+  return true;
+}
+#endif
 // </Debugging routines>
 } //end namespace abigail
diff --git a/tools/abidw.cc b/tools/abidw.cc
index c6f54475..2234613b 100644
--- a/tools/abidw.cc
+++ b/tools/abidw.cc
@@ -108,6 +108,9 @@  struct options
   bool			drop_private_types;
   bool			drop_undefined_syms;
   type_id_style_kind	type_id_style;
+#ifdef WITH_DEBUG_SELF_COMPARISON
+  string		type_id_file_path;
+#endif
 
   options()
     : display_version(),
@@ -575,9 +578,21 @@  load_corpus_and_write_abixml(char* argv[],
 	  write_corpus(*write_ctxt, corp, 0);
 	  tmp_file->get_stream().flush();
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+	  if (opts.debug_abidiff)
+	    {
+	      opts.type_id_file_path = tmp_file->get_path() + string(".typeid");
+	      write_canonical_type_ids(*write_ctxt, opts.type_id_file_path);
+	    }
+#endif
 	  xml_reader::read_context_sptr read_ctxt =
 	    create_native_xml_read_context(tmp_file->get_path(), env.get());
 
+#ifdef WITH_DEBUG_SELF_COMPARISON
+	  if (opts.debug_abidiff
+	      && !opts.type_id_file_path.empty())
+	    load_canonical_type_ids(*read_ctxt, opts.type_id_file_path);
+#endif
 	  t.start();
 	  corpus_sptr corp2 =
 	    read_corpus_from_input(*read_ctxt);