[v0,11/15] bfd: merge duplicated subsections into one after parsing

Message ID 20250310175131.1217374-12-matthieu.longo@arm.com
State New
Headers
Series AArch64 AEABI build attributes (a.k.a. object attributes v2) |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Matthieu Longo March 10, 2025, 5:51 p.m. UTC
  Several subsections with the same name can be present inside an object
file depending on how it was generated. Gas won't generate such an
object as it is able to detect scattered declaration of a subsection,
and merge its content before the serialization. However, others
assemblers than Gas might keep the subsections separated, and opted for
delegating the merge to the linker.
Additionally, a subsection can be populated before the parsing the
attributes section if some GNU properties are convertible to build
attributes.
In order to make the merge logic between object files simpler, the
deserializer is able to handle to perform an union of the duplicated
subsections inside an object.
Since the same deserializaton code is used by objcopy, it means that
objcopy won't simply copy the data to the output object, but will also
merge duplicated subsections together, so the readelf output between
the two objects will be different. This might be seen as a bug but,
given that gas will never generate such an object, I would argue that
it is an acceptable design choice.
---
 bfd/elf-attrs.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 90 insertions(+), 2 deletions(-)
  

Comments

Jan Beulich March 11, 2025, 8:15 a.m. UTC | #1
On 10.03.2025 18:51, Matthieu Longo wrote:
> Several subsections with the same name can be present inside an object
> file depending on how it was generated. Gas won't generate such an
> object as it is able to detect scattered declaration of a subsection,
> and merge its content before the serialization. However, others
> assemblers than Gas might keep the subsections separated, and opted for
> delegating the merge to the linker.
> Additionally, a subsection can be populated before the parsing the
> attributes section if some GNU properties are convertible to build
> attributes.
> In order to make the merge logic between object files simpler, the
> deserializer is able to handle to perform an union of the duplicated
> subsections inside an object.
> Since the same deserializaton code is used by objcopy, it means that
> objcopy won't simply copy the data to the output object, but will also
> merge duplicated subsections together, so the readelf output between
> the two objects will be different. This might be seen as a bug but,
> given that gas will never generate such an object, I would argue that
> it is an acceptable design choice.

I tend to disagree: objcopy isn't linking. ld -r may do such combining,
but imo objcopy shouldn't.

Jan
  
Matthieu Longo March 20, 2025, 3:23 p.m. UTC | #2
On 2025-03-11 08:15, Jan Beulich wrote:
> On 10.03.2025 18:51, Matthieu Longo wrote:
>> Several subsections with the same name can be present inside an object
>> file depending on how it was generated. Gas won't generate such an
>> object as it is able to detect scattered declaration of a subsection,
>> and merge its content before the serialization. However, others
>> assemblers than Gas might keep the subsections separated, and opted for
>> delegating the merge to the linker.
>> Additionally, a subsection can be populated before the parsing the
>> attributes section if some GNU properties are convertible to build
>> attributes.
>> In order to make the merge logic between object files simpler, the
>> deserializer is able to handle to perform an union of the duplicated
>> subsections inside an object.
>> Since the same deserializaton code is used by objcopy, it means that
>> objcopy won't simply copy the data to the output object, but will also
>> merge duplicated subsections together, so the readelf output between
>> the two objects will be different. This might be seen as a bug but,
>> given that gas will never generate such an object, I would argue that
>> it is an acceptable design choice.
> 
> I tend to disagree: objcopy isn't linking. ld -r may do such combining,
> but imo objcopy shouldn't.
> 
> Jan

The way I implemented the translation of GNU properties to OAv2 was a 
quick hack. I thought the side effect was acceptable but you are very 
right on the principle that objcopy's role is copy the data in the 
sections to the output object without any transformation of those data.

I changed where this translation happened, and automatically it removed 
the deduplication of subsections and attributes, and sorting outside of 
objcopy path.

Fixed in the next revision.
  

Patch

diff --git a/bfd/elf-attrs.c b/bfd/elf-attrs.c
index 12a81181c11..49c301326f2 100644
--- a/bfd/elf-attrs.c
+++ b/bfd/elf-attrs.c
@@ -982,6 +982,78 @@  bfd_elf_parse_attrs_subsection_v2 (bfd *abfd,
   return op;
 }
 
+/* Merge two subsections together (object attributes v2 only).
+   The result is stored into subsec1.  subsec2 is destroyed.
+   Return true if the merge was successful, false otherwise.  */
+static bool
+obj_attr_subsection_v2_destructive_merge (bfd *abfd,
+					  obj_attr_subsection_v2 *subsec1,
+					  obj_attr_subsection_v2 *subsec2)
+{
+  BFD_ASSERT (subsec1->encoding == subsec2->encoding
+	   && subsec1->optional == subsec2->optional);
+
+  /* Sort the subsections before merging them together.  */
+  LINKED_LIST_MERGE_SORT(obj_attr_v2) (subsec1, _bfd_elf_obj_attr_v2_cmp);
+  subsec2->first_ = _LINKED_LIST_MERGE_SORT(obj_attr_v2)
+    (subsec2->first_, _bfd_elf_obj_attr_v2_cmp);
+
+  bool success = true;
+
+  obj_attr_v2 *a1 = subsec1->first_;
+  obj_attr_v2 *a2 = subsec2->first_;
+  while (a1 != NULL && a2 != NULL)
+    {
+      if (a1->tag < a2->tag)
+	{} /* Nothing to do, a1 is already in subsec1.  */
+      else if (a1->tag > a2->tag)
+	{
+	  /* a2 is missing in subsec1, add it.  */
+	  obj_attr_v2 *previous = LINKED_LIST_REMOVE(obj_attr_v2) (subsec2, a2);
+	  LINKED_LIST_INSERT_BEFORE(obj_attr_v2) (subsec1, a2, a1);
+	  a2 = previous;
+	}
+      else
+	{
+	  if (subsec1->encoding == ULEB128
+	   && a1->vals.uint_val != a2->vals.uint_val)
+	    {
+	      success = false;
+	      _bfd_error_handler (_("%pB: error: found 2 subsections with the "
+		"same name '%s' and found conflicting values (0x%x vs 0x%x) for"
+		" object attribute 'Tag_unknown_%u'"), abfd, subsec1->name,
+		a1->vals.uint_val, a2->vals.uint_val, a1->tag);
+	    }
+	  else if (subsec1->encoding == NTBS
+	    && strcmp (a1->vals.string_val, a2->vals.string_val) != 0)
+	    {
+	      success = false;
+	      _bfd_error_handler (_("%pB: error: found 2 subsections with the "
+		"same name '%s' and found conflicting values ('%s' vs '%s') for"
+		" object attribute 'Tag_unknown_%u'"), abfd, subsec1->name,
+		a1->vals.string_val, a2->vals.string_val, a1->tag);
+	    }
+	}
+      a1 = a1->next;
+      a2 = a2->next;
+    }
+
+  for (; a2 != NULL; a2 = a2->next)
+    {
+      /* a2 is missing in subsec1, add it.  */
+      obj_attr_v2 *previous = LINKED_LIST_REMOVE(obj_attr_v2) (subsec2, a2);
+      LINKED_LIST_INSERT_BEFORE(obj_attr_v2) (subsec1, a2, a1);
+      a2 = previous;
+    }
+
+  /* If a1 != NULL, we don't care since it is already in subsec1.  */
+
+  /* Destroy subsec2 before exiting.  */
+  _bfd_elf_obj_attr_subsection_v2_free (subsec2);
+
+  return success;
+}
+
 /* Parse the list of subsections (object attributes v2 only).  */
 static void
 bfd_elf_parse_attr_section_v2 (bfd *abfd,
@@ -1003,10 +1075,26 @@  bfd_elf_parse_attr_section_v2 (bfd *abfd,
 	  bfd_set_error (bfd_error_wrong_format);
 	  break;
 	}
+
+      /* Before appending the parsed subsection, check if this subsection was
+	 already recorded (this can happen when a GNU property was parsed, and
+	 converted to its equivalent object attributes).  If so, try to merge it
+	 with the existing one.  */
+      obj_attr_subsection_v2 *parsed_subsec =
+	(obj_attr_subsection_v2 *) op.object;
+      obj_attr_subsection_v2 *subsec = obj_attr_subsection_v2_find_by_name
+	(elf_obj_attr_subsections (abfd).first_, parsed_subsec->name, false);
+      if (subsec == NULL)
+	LINKED_LIST_APPEND(obj_attr_subsection_v2) (subsecs, parsed_subsec);
       else
-	LINKED_LIST_APPEND(obj_attr_subsection_v2) (subsecs,
-	  (obj_attr_subsection_v2 *) op.object);
+	obj_attr_subsection_v2_destructive_merge (abfd, subsec, parsed_subsec);
     }
+
+  /* Sort all the subsections and object attributes as they might not have been
+     inserted in the right order.  From now on, the subsections and attributes
+     will be assumed to be always sorted.  Any additive mutation will need to
+     preserve the order.  */
+  obj_attr_subsection_v2_sort (subsecs);
 }
 
 /* Parse an object attributes section.