[v0,11/15] bfd: merge duplicated subsections into one after parsing
Checks
Commit Message
Several subsections with the same name can be present inside an object
file depending on how it was generated. Gas won't generate such an
object as it is able to detect scattered declaration of a subsection,
and merge its content before the serialization. However, others
assemblers than Gas might keep the subsections separated, and opted for
delegating the merge to the linker.
Additionally, a subsection can be populated before the parsing the
attributes section if some GNU properties are convertible to build
attributes.
In order to make the merge logic between object files simpler, the
deserializer is able to handle to perform an union of the duplicated
subsections inside an object.
Since the same deserializaton code is used by objcopy, it means that
objcopy won't simply copy the data to the output object, but will also
merge duplicated subsections together, so the readelf output between
the two objects will be different. This might be seen as a bug but,
given that gas will never generate such an object, I would argue that
it is an acceptable design choice.
---
bfd/elf-attrs.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 90 insertions(+), 2 deletions(-)
Comments
On 10.03.2025 18:51, Matthieu Longo wrote:
> Several subsections with the same name can be present inside an object
> file depending on how it was generated. Gas won't generate such an
> object as it is able to detect scattered declaration of a subsection,
> and merge its content before the serialization. However, others
> assemblers than Gas might keep the subsections separated, and opted for
> delegating the merge to the linker.
> Additionally, a subsection can be populated before the parsing the
> attributes section if some GNU properties are convertible to build
> attributes.
> In order to make the merge logic between object files simpler, the
> deserializer is able to handle to perform an union of the duplicated
> subsections inside an object.
> Since the same deserializaton code is used by objcopy, it means that
> objcopy won't simply copy the data to the output object, but will also
> merge duplicated subsections together, so the readelf output between
> the two objects will be different. This might be seen as a bug but,
> given that gas will never generate such an object, I would argue that
> it is an acceptable design choice.
I tend to disagree: objcopy isn't linking. ld -r may do such combining,
but imo objcopy shouldn't.
Jan
On 2025-03-11 08:15, Jan Beulich wrote:
> On 10.03.2025 18:51, Matthieu Longo wrote:
>> Several subsections with the same name can be present inside an object
>> file depending on how it was generated. Gas won't generate such an
>> object as it is able to detect scattered declaration of a subsection,
>> and merge its content before the serialization. However, others
>> assemblers than Gas might keep the subsections separated, and opted for
>> delegating the merge to the linker.
>> Additionally, a subsection can be populated before the parsing the
>> attributes section if some GNU properties are convertible to build
>> attributes.
>> In order to make the merge logic between object files simpler, the
>> deserializer is able to handle to perform an union of the duplicated
>> subsections inside an object.
>> Since the same deserializaton code is used by objcopy, it means that
>> objcopy won't simply copy the data to the output object, but will also
>> merge duplicated subsections together, so the readelf output between
>> the two objects will be different. This might be seen as a bug but,
>> given that gas will never generate such an object, I would argue that
>> it is an acceptable design choice.
>
> I tend to disagree: objcopy isn't linking. ld -r may do such combining,
> but imo objcopy shouldn't.
>
> Jan
The way I implemented the translation of GNU properties to OAv2 was a
quick hack. I thought the side effect was acceptable but you are very
right on the principle that objcopy's role is copy the data in the
sections to the output object without any transformation of those data.
I changed where this translation happened, and automatically it removed
the deduplication of subsections and attributes, and sorting outside of
objcopy path.
Fixed in the next revision.
@@ -982,6 +982,78 @@ bfd_elf_parse_attrs_subsection_v2 (bfd *abfd,
return op;
}
+/* Merge two subsections together (object attributes v2 only).
+ The result is stored into subsec1. subsec2 is destroyed.
+ Return true if the merge was successful, false otherwise. */
+static bool
+obj_attr_subsection_v2_destructive_merge (bfd *abfd,
+ obj_attr_subsection_v2 *subsec1,
+ obj_attr_subsection_v2 *subsec2)
+{
+ BFD_ASSERT (subsec1->encoding == subsec2->encoding
+ && subsec1->optional == subsec2->optional);
+
+ /* Sort the subsections before merging them together. */
+ LINKED_LIST_MERGE_SORT(obj_attr_v2) (subsec1, _bfd_elf_obj_attr_v2_cmp);
+ subsec2->first_ = _LINKED_LIST_MERGE_SORT(obj_attr_v2)
+ (subsec2->first_, _bfd_elf_obj_attr_v2_cmp);
+
+ bool success = true;
+
+ obj_attr_v2 *a1 = subsec1->first_;
+ obj_attr_v2 *a2 = subsec2->first_;
+ while (a1 != NULL && a2 != NULL)
+ {
+ if (a1->tag < a2->tag)
+ {} /* Nothing to do, a1 is already in subsec1. */
+ else if (a1->tag > a2->tag)
+ {
+ /* a2 is missing in subsec1, add it. */
+ obj_attr_v2 *previous = LINKED_LIST_REMOVE(obj_attr_v2) (subsec2, a2);
+ LINKED_LIST_INSERT_BEFORE(obj_attr_v2) (subsec1, a2, a1);
+ a2 = previous;
+ }
+ else
+ {
+ if (subsec1->encoding == ULEB128
+ && a1->vals.uint_val != a2->vals.uint_val)
+ {
+ success = false;
+ _bfd_error_handler (_("%pB: error: found 2 subsections with the "
+ "same name '%s' and found conflicting values (0x%x vs 0x%x) for"
+ " object attribute 'Tag_unknown_%u'"), abfd, subsec1->name,
+ a1->vals.uint_val, a2->vals.uint_val, a1->tag);
+ }
+ else if (subsec1->encoding == NTBS
+ && strcmp (a1->vals.string_val, a2->vals.string_val) != 0)
+ {
+ success = false;
+ _bfd_error_handler (_("%pB: error: found 2 subsections with the "
+ "same name '%s' and found conflicting values ('%s' vs '%s') for"
+ " object attribute 'Tag_unknown_%u'"), abfd, subsec1->name,
+ a1->vals.string_val, a2->vals.string_val, a1->tag);
+ }
+ }
+ a1 = a1->next;
+ a2 = a2->next;
+ }
+
+ for (; a2 != NULL; a2 = a2->next)
+ {
+ /* a2 is missing in subsec1, add it. */
+ obj_attr_v2 *previous = LINKED_LIST_REMOVE(obj_attr_v2) (subsec2, a2);
+ LINKED_LIST_INSERT_BEFORE(obj_attr_v2) (subsec1, a2, a1);
+ a2 = previous;
+ }
+
+ /* If a1 != NULL, we don't care since it is already in subsec1. */
+
+ /* Destroy subsec2 before exiting. */
+ _bfd_elf_obj_attr_subsection_v2_free (subsec2);
+
+ return success;
+}
+
/* Parse the list of subsections (object attributes v2 only). */
static void
bfd_elf_parse_attr_section_v2 (bfd *abfd,
@@ -1003,10 +1075,26 @@ bfd_elf_parse_attr_section_v2 (bfd *abfd,
bfd_set_error (bfd_error_wrong_format);
break;
}
+
+ /* Before appending the parsed subsection, check if this subsection was
+ already recorded (this can happen when a GNU property was parsed, and
+ converted to its equivalent object attributes). If so, try to merge it
+ with the existing one. */
+ obj_attr_subsection_v2 *parsed_subsec =
+ (obj_attr_subsection_v2 *) op.object;
+ obj_attr_subsection_v2 *subsec = obj_attr_subsection_v2_find_by_name
+ (elf_obj_attr_subsections (abfd).first_, parsed_subsec->name, false);
+ if (subsec == NULL)
+ LINKED_LIST_APPEND(obj_attr_subsection_v2) (subsecs, parsed_subsec);
else
- LINKED_LIST_APPEND(obj_attr_subsection_v2) (subsecs,
- (obj_attr_subsection_v2 *) op.object);
+ obj_attr_subsection_v2_destructive_merge (abfd, subsec, parsed_subsec);
}
+
+ /* Sort all the subsections and object attributes as they might not have been
+ inserted in the right order. From now on, the subsections and attributes
+ will be assumed to be always sorted. Any additive mutation will need to
+ preserve the order. */
+ obj_attr_subsection_v2_sort (subsecs);
}
/* Parse an object attributes section.