[applied] Detect failed self comparison in type canonicalization of abixml

Message ID 87v9773ycv.fsf@redhat.com
State Committed

Commit Message

Dodji Seketeli May 25, 2021, 10:24 a.m. UTC

During the self comparison triggered by "abidw --abidiff <binary>",
some comparison errors can happen when canonicalizing types that are
"de-serialized" from the abixml that was serialized from the input
binary.

This patch adds some debugging checks and messaging to emit a message
when a type from the abixml appears to not "match" the original type
from the initial corpus it originated from.

This is the more detailed description:

Let's consider a type T coming from the corpus of the input binary.

That input corpus is serialized into abixml and de-serialized again
into a second corpus that we shall name the abixml corpus.  From that
second corpus, let's consider the type T' that is the result of
serializing T into abixml and de-serializing it again.  T is said to
be the original type of T'.  If T is a canonical type, then T' should
equal T.  Otherwise, if T is not a canonical type, its canonical type
should equal the canonical type of T'.

For the sake of simplicity, let's consider that T is a canonical
type.  During the canonicalization of T', T' should equal T.  Each and
every canonical type coming from the abixml corpus should be equal to its
original type from the binary corpus.

If a T' is different from its original type T, then there is an
"equality problem" between T and T'.  In other words, there is a
mismatch between T and T'.  We want to be notified of that problem so
that we can debug it further and fix it.
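
To make the expectation above concrete, here is a minimal sketch of
canonicalization on a toy model.  It is not libabigail code: toy_type,
toy_canonical_types and toy_canonicalize are hypothetical names, and
comparing a name and a size stands in for the real structural
comparison.  The point is only that a T' read back from abixml should
find an existing canonical type and never create a new one.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a libabigail type: a name and a size only.
struct toy_type
{
  std::string name;
  unsigned    size_in_bits;
};

// Structural equality between two toy types.
inline bool
operator==(const toy_type& l, const toy_type& r)
{return l.name == r.name && l.size_in_bits == r.size_in_bits;}

// Hypothetical canonical type table: for each type name, the
// canonical types seen so far that bear that name.
typedef std::map<std::string, std::vector<std::shared_ptr<toy_type> > >
toy_canonical_types;

// Return the canonical type of T.  If T equals none of the canonical
// types of the same name, T becomes a new canonical type and IS_NEW
// is set to true; otherwise IS_NEW is set to false.
inline std::shared_ptr<toy_type>
toy_canonicalize(toy_canonical_types& table,
		 const std::shared_ptr<toy_type>& t,
		 bool& is_new)
{
  std::vector<std::shared_ptr<toy_type> >& candidates = table[t->name];
  for (const std::shared_ptr<toy_type>& c : candidates)
    if (*c == *t)
      {
	is_new = false;
	return c;
      }
  candidates.push_back(t);
  is_new = true;
  return t;
}
```

In this model, canonicalizing T first and then a faithful copy T'
leaves is_new false for T'.  If is_new comes back true for a type read
from the abixml, that is the "equality problem" described above.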

So this patch introduces the option "abidw --debug-abidiff <binary>"
to trigger the "debug self comparison mode".  At canonicalization
time, the code detects that it is in that mode and, while
canonicalizing types from the abixml corpus, reports those that
compare different from their counterpart from the original corpus.
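
The bookkeeping behind that mode can be sketched as follows.  This is
a simplified model under stated assumptions, not the code of the
patch: toy_corpus, toy_environment and must_report are hypothetical
names.  It only mirrors the idea that the environment remembers the
two corpora through weak pointers and flags a type only when the type
belongs to the second (abixml) corpus.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for an ABI corpus.
struct toy_corpus {std::string origin;};
typedef std::shared_ptr<toy_corpus> toy_corpus_sptr;

// Hypothetical stand-in for the environment shared by both corpora.
class toy_environment
{
  // Weak pointers: the environment must not keep the corpora alive.
  std::weak_ptr<toy_corpus> first_;
  std::weak_ptr<toy_corpus> second_;
  bool			    debug_on_ = false;

public:
  void debug_is_on(bool f) {debug_on_ = f;}
  bool debug_is_on() const {return debug_on_;}

  // The first call records the corpus of the input binary.  A later
  // call with a different corpus records the abixml corpus.
  void
  set_debug_input(const toy_corpus_sptr& c)
  {
    if (first_.expired())
      first_ = c;
    else if (second_.expired() && c != first_.lock())
      second_ = c;
  }

  // Return true iff a type owned by OWNER that matched no existing
  // canonical type should be reported: the debug mode is on, both
  // corpora are known, and OWNER is the second corpus.
  bool
  must_report(const toy_corpus* owner) const
  {
    toy_corpus_sptr c1 = first_.lock(), c2 = second_.lock();
    return debug_on_ && c1 && c2 && owner == c2.get();
  }
};
```

Types from the first corpus are never reported in this model, since
they are the ones that legitimately create new canonical types.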

This debugging capability can be enabled at configure time with a new
--enable-debug-self-comparison configure option.  That option defines
a new WITH_DEBUG_SELF_COMPARISON compile time macro that is used to
conditionally compile the implementation of this debugging feature.

For example, a run might look like this:

    abidw  --debug-abidiff bin:
    error: problem detected with type 'typedef Vmalloc_t' from second corpus
    error: problem detected with type 'Vmalloc_t*' from second corpus

This means the "typedef Vmalloc_t" read from the abixml compares
different from its original type when it should not.

Armed with this insight, I know I need to debug that comparison in
particular to see why it wrongly results in two different types.

	* doc/manuals/abidw.rst: Add documentation for the --debug-abidiff
	option.
	* include/abg-ir.h (environment::{set_self_comparison_debug_input,
	get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
	Declare new methods.
	* configure.ac: Define a new --enable-debug-self-comparison option
	that is disabled by default.  That option defines a new
	WITH_DEBUG_SELF_COMPARISON preprocessor macro.
	* src/abg-ir.cc
	(environment::priv::{first_self_comparison_corpus_,
	second_self_comparison_corpus_, self_comparison_debug_on_}): New
	data members.  Also, re-indent the data members.
	(environment::{set_self_comparison_debug_input,
	get_self_comparison_debug_inputs, self_comparison_debug_is_on}):
	Define new methods.
	(type_base::get_canonical_type_for): In the "debug self comparison
	mode", if a type coming from the second corpus compares different
	from its counterpart coming from the first corpus then log a debug
	message.
	* src/abg-dwarf-reader.cc (read_debug_info_into_corpus): When
	loading the first corpus, if the debug self comparison mode is on,
	then save that corpus on the side in the environment.
	* src/abg-reader.cc (read_corpus_from_input): When loading the
	second corpus, if the debug self comparison mode is on, then save
	that corpus on the side in the environment.
	* tools/abidw.cc: Include the config.h file for preprocessor
	macros defined at configure time.
	(options::debug_abidiff): New data member.
	(parse_command_line): Parse the --debug-abidiff option.
	(load_corpus_and_write_abixml): Switch the self debug mode on when
	the --debug-abidiff option is provided.  Use a read_context for
	the abixml loading.  That is going to be useful for subsequent
	patches.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>

Applied to master.

 configure.ac            |  17 ++++++
 doc/manuals/abidw.rst   |  10 ++++
 include/abg-ir.h        |  15 +++++
 src/abg-dwarf-reader.cc |  10 ++++
 src/abg-ir.cc           | 122 ++++++++++++++++++++++++++++++++++++++--
 src/abg-reader.cc       |  18 ++++++
 tools/abidw.cc          |  32 ++++++++++-
 7 files changed, 216 insertions(+), 8 deletions(-)


diff --git a/configure.ac b/configure.ac
index 0b64c3e5..735cc9de 100644
--- a/configure.ac
+++ b/configure.ac
@@ -66,6 +66,12 @@  AC_ARG_ENABLE(rpm415,
+	      AS_HELP_STRING([--enable-debug-self-comparison=yes|no],
+			     [enable debugging of self comparison with 'abidw --debug-abidiff'(default is no)]),
 			     [enable the support of deb in abipkgdiff (default is auto)]),
@@ -297,6 +303,16 @@  fi
+dnl enable the debugging of self comparison when doing abidw --debug-abidiff <binary>
+if test x$ENABLE_DEBUG_SELF_COMPARISON = xyes; then
+  AC_DEFINE([WITH_DEBUG_SELF_COMPARISON], 1, [compile support of debugging abidw --abidiff])
+  AC_MSG_NOTICE([support of debugging self comparison is enabled])
+  AC_MSG_NOTICE([support of debugging self comparison is disabled])
 dnl Check for the dpkg program
 if test x$ENABLE_DEB = xauto -o x$ENABLE_DEB = xyes; then
    AC_CHECK_PROG(HAS_DPKG, dpkg, yes, no)
@@ -914,6 +930,7 @@  AC_MSG_NOTICE([
     libdw has the dwarf_getalt function            : ${FOUND_DWARF_GETALT_IN_LIBDW}
     Enable rpm support in abipkgdiff               : ${ENABLE_RPM}
     Enable rpm 4.15 support in abipkgdiff tests    : ${ENABLE_RPM415}
+    Enable self comparison debugging               : ${ENABLE_DEBUG_SELF_COMPARISON}
     Enable deb support in abipkgdiff               : ${ENABLE_DEB}
     Enable GNU tar archive support in abipkgdiff   : ${ENABLE_TAR}
     Enable bash completion	                   : ${ENABLE_BASH_COMPLETION}
diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
index a67e5fa2..4b110d6b 100644
--- a/doc/manuals/abidw.rst
+++ b/doc/manuals/abidw.rst
@@ -241,6 +241,16 @@  Options
     This is a debugging and sanity check option.
+    *  ``--debug-abidiff``
+    Same as ``--abidiff`` but in debug mode.  In this mode, error
+    messages are emitted for types which fail type canonicalization.
+    This is an optional debugging and sanity check option.  To enable
+    it the libabigail package needs to be configured with
+    the --enable-debug-self-comparison option.
   *  ``--annotate``
     Annotate the ABIXML output with comments above most elements.  The
diff --git a/include/abg-ir.h b/include/abg-ir.h
index 2fbc12e9..d284995f 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -200,6 +200,21 @@  public:
   const config&
   get_config() const;
+  void
+  set_self_comparison_debug_input(const corpus_sptr& corpus);
+  void
+  get_self_comparison_debug_inputs(corpus_sptr& first_corpus,
+				   corpus_sptr& second_corpus);
+  void
+  self_comparison_debug_is_on(bool);
+  bool
+  self_comparison_debug_is_on() const;
   vector<type_base_sptr>* get_canonical_types(const char* name);
   type_base* get_canonical_type(const char* name, unsigned index);
diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 735a4b48..a06ca88f 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -14279,6 +14279,11 @@  read_debug_info_into_corpus(read_context& ctxt)
+  if (ctxt.env()->self_comparison_debug_is_on())
+    ctxt.env()->set_self_comparison_debug_input(ctxt.current_corpus());
   // Walk all the DIEs of the debug info to build a DIE -> parent map
   // useful for get_die_parent() to work.
@@ -14451,6 +14456,11 @@  read_debug_info_into_corpus(read_context& ctxt)
+  if (ctxt.env()->self_comparison_debug_is_on())
+    ctxt.env()->set_self_comparison_debug_input(ctxt.current_corpus());
   return ctxt.current_corpus();
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 31abcc2e..6af7fb78 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -2783,16 +2783,36 @@  struct environment::priv
   type_base_sptr			variadic_marker_type_;
   unordered_set<const class_or_union*>	classes_being_compared_;
   unordered_set<const function_type*>	fn_types_being_compared_;
-  vector<type_base_sptr>	 extra_live_types_;
-  interned_string_pool		 string_pool_;
-  bool				 canonicalization_is_done_;
-  bool				 do_on_the_fly_canonicalization_;
-  bool				 decl_only_class_equals_definition_;
+  vector<type_base_sptr>		extra_live_types_;
+  interned_string_pool			string_pool_;
+  // This is used for debugging purposes.
+  // When abidw is used with the option --debug-abidiff, some
+  // libabigail internals need to get a hold on the initial binary
+  // input of abidw, as well as as the abixml file that represents the
+  // ABI of that binary.
+  //
+  // So this one is the corpus for the input binary.
+  corpus_wptr				first_self_comparison_corpus_;
+  // This one is the corpus for the ABIXML file representing the
+  // serialization of the input binary.
+  corpus_wptr				second_self_comparison_corpus_;
+  bool					canonicalization_is_done_;
+  bool					do_on_the_fly_canonicalization_;
+  bool					decl_only_class_equals_definition_;
+  bool					self_comparison_debug_on_;
     : canonicalization_is_done_(),
+    ,
+      self_comparison_debug_on_(false)
 };// end struct environment::priv
@@ -3186,6 +3206,61 @@  const config&
 environment::get_config() const
 {return priv_->config_;}
+/// Setter of the corpus of the input corpus of the self comparison
+/// that takes place when doing "abidw --debug-abidiff <binary>".
+/// The first invocation of this function sets the first corpus of the
+/// self comparison.  The second invocation of this very same function
+/// sets the second corpus of the self comparison.  That second corpus
+/// is supposed to come from the abixml serialization of the first
+/// corpus.
+/// @param c the corpus of the input binary or the corpus of the
+/// abixml serialization of the initial binary input.
+environment::set_self_comparison_debug_input(const corpus_sptr& c)
+  self_comparison_debug_is_on(true);
+  if (priv_->first_self_comparison_corpus_.expired())
+    priv_->first_self_comparison_corpus_ = c;
+  else if (priv_->second_self_comparison_corpus_.expired()
+	   && c.get() != corpus_sptr(priv_->first_self_comparison_corpus_).get())
+    priv_->second_self_comparison_corpus_ = c;
+/// Getter for the corpora of the input binary and the intermediate
+/// abixml of the self comparison that takes place when doing
+///   'abidw --debug-abidiff <binary>'.
+/// @param first_corpus output parameter that is set to the corpus of
+/// the input corpus.
+/// @param second_corpus output parameter that is set to the corpus of
+/// the second corpus.
+environment::get_self_comparison_debug_inputs(corpus_sptr& first_corpus,
+					      corpus_sptr& second_corpus)
+    first_corpus = priv_->first_self_comparison_corpus_.lock();
+    second_corpus = priv_->second_self_comparison_corpus_.lock();
+/// Turn on/off the self comparison debug mode.
+/// @param f true iff the self comparison debug mode is turned on.
+environment::self_comparison_debug_is_on(bool f)
+{priv_->self_comparison_debug_on_ = f;}
+/// Test if the we are in the process of the 'self-comparison
+/// debugging' as triggered by 'abidw --debug-abidiff' command.
+/// @return true if self comparison debug is on.
+environment::self_comparison_debug_is_on() const
+{return priv_->self_comparison_debug_on_;}
 /// Get the vector of canonical types which have a given "string
 /// representation".
@@ -12911,6 +12986,43 @@  type_base::get_canonical_type_for(type_base_sptr t)
       if (!result)
+	  if (env->self_comparison_debug_is_on())
+	    {
+	      // So we are debugging the canonicalization process,
+	      // possibly via the use of 'abidw --debug-abidiff <binary>'.
+	      //
+	      // If 't' comes from the second corpus, then it *must*
+	      // be equal to its matching canonical type coming from
+	      // the first corpus because the second corpus is the
+	      // abixml representation of the first corpus.  In other
+	      // words, all types coming from the second corpus must
+	      // have canonical types coming from the first corpus.
+	      //
+	      // We are in the case where 't' is different from all
+	      // the canonical types of the same name that come from
+	      // the first corpus.
+	      //
+	      // If 't' indeed comes from the second corpus then this
+	      // clearly is a canonicalization failure.
+	      //
+	      // There was a problem either during the serialization
+	      // of 't' into abixml, or during the de-serialization
+	      // from abixml into abigail::ir.  Further debugging is
+	      // needed to determine what that root cause problem is.
+	      //
+	      // Note that the first canonicalization problem of this
+	      // kind must be fixed before looking at the subsequent
+	      // ones, because the later might well just be
+	      // consequences of the former.
+	      corpus_sptr corp1, corp2;
+	      env->get_self_comparison_debug_inputs(corp1, corp2);
+	      if (corp1 && corp2 && (t->get_corpus() == corp2.get()))
+		std::cerr << "error: problem detected with type '"
+			  << repr
+			  << "' from second corpus\n" << std::flush;
+	    }
 	  result = t;
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index 05f1a6fa..9e7db810 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -1835,6 +1835,11 @@  read_corpus_from_input(read_context& ctxt)
 	  corpus_sptr c(new corpus(ctxt.get_environment(), ""));
+	  if (ctxt.get_environment()->self_comparison_debug_is_on())
+	    ctxt.get_environment()->
+	      set_self_comparison_debug_input(ctxt.get_corpus());
       if (!ctxt.get_corpus_group())
@@ -1893,6 +1898,11 @@  read_corpus_from_input(read_context& ctxt)
 	  corpus_sptr c(new corpus(ctxt.get_environment(), ""));
+	  if (ctxt.get_environment()->self_comparison_debug_is_on())
+	    ctxt.get_environment()->
+	      set_self_comparison_debug_input(ctxt.get_corpus());
       if (!ctxt.get_corpus_group())
@@ -5822,6 +5832,10 @@  create_native_xml_read_context(const string& path, environment *env)
   corpus_sptr corp(new corpus(env));
+  if (env->self_comparison_debug_is_on())
+    env->set_self_comparison_debug_input(result->get_corpus());
   return result;
@@ -5841,6 +5855,10 @@  create_native_xml_read_context(std::istream* in, environment* env)
   corpus_sptr corp(new corpus(env, ""));
+  if (env->self_comparison_debug_is_on())
+    env->set_self_comparison_debug_input(result->get_corpus());
   return result;
diff --git a/tools/abidw.cc b/tools/abidw.cc
index 22f640b4..c6f54475 100644
--- a/tools/abidw.cc
+++ b/tools/abidw.cc
@@ -11,6 +11,7 @@ 
 /// DWARF format) and emit it back in a set of "text sections" in native
 /// libabigail XML format.
+#include "config.h"
 #include <unistd.h>
 #include <cassert>
 #include <cstdio>
@@ -62,6 +63,7 @@  using abigail::xml_writer::type_id_style_kind;
 using abigail::xml_writer::write_context_sptr;
 using abigail::xml_writer::write_corpus;
 using abigail::xml_reader::read_corpus_from_native_xml_file;
+using abigail::xml_reader::create_native_xml_read_context;
 using abigail::dwarf_reader::read_context;
 using abigail::dwarf_reader::read_context_sptr;
 using abigail::dwarf_reader::read_corpus_from_elf;
@@ -98,6 +100,9 @@  struct options
   bool			noout;
   bool			show_locs;
   bool			abidiff;
+  bool			debug_abidiff;
   bool			annotate;
   bool			do_log;
   bool			drop_private_types;
@@ -122,6 +127,9 @@  struct options
+      debug_abidiff(),
@@ -182,6 +190,9 @@  display_usage(const string& prog_name, ostream& out)
     << "  --vmlinux <path>  the path to the vmlinux binary to consider to emit "
        "the ABI of the union of vmlinux and its modules\n"
     << "  --abidiff  compare the loaded ABI against itself\n"
+    << "  --debug-abidiff  debug the process of comparing the loaded ABI against itself\n"
     << "  --annotate  annotate the ABI artifacts emitted in the output\n"
     << "  --stats  show statistics about various internal stuff\n"
     << "  --verbose show verbose messages about internal stuff\n";
@@ -328,6 +339,13 @@  parse_command_line(int argc, char* argv[], options& opts)
 	opts.linux_kernel_mode = false;
       else if (!strcmp(argv[i], "--abidiff"))
 	opts.abidiff = true;
+      else if (!strcmp(argv[i], "--debug-abidiff"))
+	{
+	  opts.abidiff = true;
+	  opts.debug_abidiff = true;
+	}
       else if (!strcmp(argv[i], "--annotate"))
 	opts.annotate = true;
       else if (!strcmp(argv[i], "--stats"))
@@ -467,11 +485,16 @@  static int
 load_corpus_and_write_abixml(char* argv[],
 			     environment_sptr& env,
 			     read_context_sptr& context,
-			     const options& opts)
+			     options& opts)
   int exit_code = 0;
   timer t;
+  if (opts.debug_abidiff)
+    env->self_comparison_debug_is_on(true);
   read_context& ctxt = *context;
   corpus_sptr corp;
   dwarf_reader::status s = dwarf_reader::STATUS_UNKNOWN;
@@ -551,10 +574,13 @@  load_corpus_and_write_abixml(char* argv[],
 	  set_ostream(*write_ctxt, tmp_file->get_stream());
 	  write_corpus(*write_ctxt, corp, 0);
+	  xml_reader::read_context_sptr read_ctxt =
+	    create_native_xml_read_context(tmp_file->get_path(), env.get());
 	  corpus_sptr corp2 =
-	    read_corpus_from_native_xml_file(tmp_file->get_path(),
-					     env.get());
+	    read_corpus_from_input(*read_ctxt);
 	  if (opts.do_log)
 	    emit_prefix(argv[0], cerr)