[applied] Bug 29934 - Handle buggy data members with empty names

Message ID 871qoodhbv.fsf@redhat.com
State New
Headers
Series [applied] Bug 29934 - Handle buggy data members with empty names |

Commit Message

Dodji Seketeli Dec. 24, 2022, 2:09 p.m. UTC
  Hello,

When handling the changes between two ABI corpora, the diff graph
building pass chokes on a data member which seems to erroneously have
an empty name.

The steps to reproduce the issue are explained in the problem report
at https://sourceware.org/bugzilla/show_bug.cgi?id=29934.

The root cause of the problem is that the "struct mailbox" data
structure from the binary /usr/lib64/dovecot/libdovecot-storage.so.0
coming from the dovecot-2.2.36-3.el7.x86_64.rpm package contains a
data member that has an empty name.  The source code of that data
structure can be browsed at
https://github.com/dovecot/core/blob/release-2.2.36/src/lib-storage/mail-storage-private.h

We see that the mailbox::storage data structure at line 352 ends up in
the DWARF debug info with an empty name.  Let's look at the DWARF
dump as emitted by "eu-readelf --debug-dump=info" on the
/usr/lib/debug//usr/lib64/dovecot/libdovecot-storage.so.0.debug file
from the dovecot-debuginfo-2.2.36-3.el7.x86_64.rpm package.

A relevant DIE for the "struct mailbox" is the following:

 [  3e3e]    structure_type       abbrev: 9
             name                 (strp) "mailbox"
             byte_size            (data2) 768
             decl_file            (data1) mail-storage-private.h (24)
             decl_line            (data2) 348
             sibling              (ref_udata) [  41f9]
 [  3e4a]      member               abbrev: 59
               name                 (strp) "name"
               decl_file            (data1) mail-storage-private.h (24)
               decl_line            (data2) 349
               type                 (ref_addr) [    84]
               data_member_location (data1) 0
 [  3e57]      member               abbrev: 59
               name                 (strp) "vname"
               decl_file            (data1) mail-storage-private.h (24)
               decl_line            (data2) 351
               type                 (ref_addr) [    84]
               data_member_location (data1) 8
 [  3e64]      member               abbrev: 47
               name                 (strp) ""
               decl_file            (data1) mail-storage-private.h (24)
               decl_line            (data2) 352
               type                 (ref_udata) [  3bee]
               data_member_location (data1) 16

[...]

You can see here that the DW_TAG_member DIE at offset 0x3e64 has an
empty name.  Its DW_AT_type attribute references the DIE at offset
0x3bee.

The DIE at offset 0x3bee is this one:

 [  3bee]    pointer_type         abbrev: 95
             byte_size            (data1) 8
             type                 (ref_udata) [  3a90]

[...]

It's a pointer to the type which DIE is at offset 0x3a90, which is:

 [  3a90]    structure_type       abbrev: 48
             name                 (strp) "mail_storage"
             byte_size            (data2) 352
             decl_file            (data1) mail-storage-private.h (24)
             decl_line            (data1) 132
             sibling              (ref_udata) [  3bee]

So, the data member of "struct mailbox" which has an empty name has a
type "pointer to struct mail_storage", aka "struct mail_storage*".
That indeed corresponds to the "storage" data member that we see at
line 352 of the mail-storage-private.h file, browsable at
https://github.com/dovecot/core/blob/release-2.2.36/src/lib-storage/mail-storage-private.h.

The fact that this data member has an empty name seems to me as a bug
of the DWARF emitter.

Libabigail ought to gently handle this bug instead of choking.

This patch assigns an artificial name to that empty data member to
handle this kind of cases in the future. The names looks like
"unnamed-@-<location>" where "location" is the location of the data
member.

Please note that there can be normal cases of anonymous data members
where the data member has an empty name.  In those cases, the data
member must be of type union or struct.  This is to describe the
"unnamed fields" C feature described at
https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html.

The buggy case we are seeing here is different from the "unnamed
field" case because the type of the anonymous data member is neither
struct nor union.

	* src/abg-dwarf-reader.cc (die_is_anonymous_data_member): Define
	new static function.
	(die_member_offset): Move the declaration of this up so that it
	can be used more generally.
	(reader::build_name_for_buggy_anonymous_data_member): Define new
	member function.
	(add_or_update_class_type): Generate an artificial name for buggy
	data members with empty names.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc | 100 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
  

Patch

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 37cd5583..5df1516d 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -382,6 +382,9 @@  get_scope_die(const reader&	rdr,
 static bool
 die_is_anonymous(const Dwarf_Die* die);
 
+static bool
+die_is_anonymous_data_member(const Dwarf_Die* die);
+
 static bool
 die_is_type(const Dwarf_Die* die);
 
@@ -486,6 +489,11 @@  die_constant_attribute(const Dwarf_Die *die,
 		       bool is_signed,
 		       array_type_def::subrange_type::bound_value &value);
 
+static bool
+die_member_offset(const reader& rdr,
+		  const Dwarf_Die* die,
+		  int64_t& offset);
+
 static bool
 form_is_DW_FORM_strx(unsigned form);
 
@@ -3876,6 +3884,61 @@  public:
     return (i != die_wip_function_types_map(source).end());
   }
 
+  /// Sometimes, a data member die can erroneously have an empty name as
+  /// a result of a bug of the DWARF emitter.
+  ///
+  /// This is what happens in
+  /// https://sourceware.org/bugzilla/show_bug.cgi?id=29934.
+  ///
+  /// In that case, this function constructs an artificial name for that
+  /// data member.  The pattern of the name is as follows:
+  ///
+  ///          "unnamed-@-<location>".
+  ///
+  ///location is either the value of the data member location of the
+  ///data member if it has one or  concatenation of its source location
+  ///if it has none.  If no location can be calculated then the function
+  ///returns the empty string.
+  string
+  build_name_for_buggy_anonymous_data_member(Dwarf_Die *die)
+  {
+    string result;
+    // Let's make sure we are looking at a data member with an empty
+    // name ...
+    if (!die
+	|| dwarf_tag(die) != DW_TAG_member
+	|| !die_name(die).empty())
+      return result;
+
+    // ... and yet, it's not an anonymous data member (aka unnamed
+    // field) as described in
+    // https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html.
+    if (die_is_anonymous_data_member(die))
+      return result;
+
+    // If we come this far, it means we are looking at a buggy data
+    // member with no name.  Let's build a name for it so that it can be
+    // addressed.
+    int64_t offset_in_bits = 0;
+    bool has_offset = die_member_offset(*this, die, offset_in_bits);
+    location loc;
+    if (!has_offset)
+      {
+	loc = die_location(*this, die);
+	if (!loc)
+	  return result;
+      }
+
+    std::ostringstream o;
+    o << "unnamed-dm-@-";
+    if (has_offset)
+      o << "offset-" << offset_in_bits << "bits";
+    else
+      o << "loc-" << loc.expand();
+
+    return o.str();
+  }
+
   /// Getter for the map of declaration-only classes that are to be
   /// resolved to their definition classes by the end of the corpus
   /// loading.
@@ -5853,6 +5916,33 @@  die_is_anonymous(const Dwarf_Die* die)
   return false;
 }
 
+/// Test if a DIE is an anonymous data member, aka, "unnamed field".
+///
+/// Unnamed fields are specified at
+/// https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html.
+///
+/// @param die the DIE to consider.
+///
+/// @return true iff @p die is an anonymous data member.
+static bool
+die_is_anonymous_data_member(const Dwarf_Die* die)
+{
+  if (!die
+      || dwarf_tag(const_cast<Dwarf_Die*>(die)) != DW_TAG_member
+      || !die_name(die).empty())
+    return false;
+
+  Dwarf_Die type_die;
+  if (!die_die_attribute(die, DW_AT_type, type_die))
+    return false;
+
+  if (dwarf_tag(&type_die) != DW_TAG_structure_type
+      && dwarf_tag(&type_die) != DW_TAG_union_type)
+  return false;
+
+  return true;
+}
+
 /// Get the value of an attribute that is supposed to be a string, or
 /// an empty string if the attribute could not be found.
 ///
@@ -13009,6 +13099,16 @@  add_or_update_class_type(reader&	 rdr,
 	      if (!t)
 		continue;
 
+	      if (n.empty() && !die_is_anonymous_data_member(&child))
+		{
+		  // We must be in a case where the data member has an
+		  // empty name because the DWARF emitter has a bug.
+		  // Let's generate an artificial name for that data
+		  // member.
+		  n = rdr.build_name_for_buggy_anonymous_data_member(&child);
+		  ABG_ASSERT(!n.empty());
+		}
+
 	      // The call to build_ir_node_from_die above could have
 	      // triggered the adding of a data member named 'n' into
 	      // result.  So let's check again if the variable is