[v2] XML reader: improve XML node traversal

Message ID 20210331170454.951900-1-gprocida@google.com
State Rejected, archived
Headers
Series [v2] XML reader: improve XML node traversal |

Commit Message

Giuliano Procida March 31, 2021, 5:04 p.m. UTC
  Bug 27616: libabigail treats empty and nearly empty XML elements inconsistently

The XML reader uses libxml's Reader API for parsing XML and traverses
high-level XML nodes using reader cursor advancement. Low-level nodes
are read into complete node trees and are traversed following
children, next and (in one instance) parent pointers. Some
intermediate-level nodes can be parsed both ways.

A lot of code cares about node types, typically skipping over
non-element nodes. Unfortunately, in a few places, the tracked "corpus
node" needs to land on text (whitespace) within elements for the XML
reader to work; stripping text nodes out causes some elements to be
skipped. Almost all node tree traversals are very straightforward and
can be expressed using simpler primitves than at present.

This commit changes all the node tree traversals to use two node
iteration primitives consistently. These are "first_child" and
"next_child", both of which automatically skip text nodes. They
replace the existing "advance_to_next_sibling_element" and *all*
direct references to "children" and "next" nodes. The unwanted
dependency on text within elements has been fixed.

For-loops constructed with the new primitives present only element
nodes to the loop body. This allows a lot of redundant checking for
null / non-element nodes to be removed.

These changes have been tested by using abidiff to comparing all
current test XML files with copies where all text has been removed.
One such test case is included in this commit.

	* include/abg-libxml-utils.h
	(advance_to_next_sibling_element): Remove this function.
	(first_child): Add new function. (next_child): Likewise. This
	is simpler than but functionally identical to
	advance_to_next_sibling_element.
	* src/abg-libxml-utils.cc (skip_non_elements): New function to
	skip non-element nodes in a sibling node list. (first_child):
	New function to find the first child element of a node.
	(next_child): New function to find the next sibling element of
	a node.
	* src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove
	redundant node checks. Iterate over children with first_child
	and next_child helpers. (read_translation_unit): Likewise.
	(read_translation_unit_from_input): Remove a no-longer-needed
	->next and redundant node checks. Advance "corpus node" after
	reading the translation unit. (read_symbol_db_from_input):
	Remove a no-longer-needed ->next and advance "corpus node" at
	the end of the for-loop.  Remove redundant node checks.
	(build_needed): Remove redundant node checks. Iterate over
	children with first_child and next_child.
	(read_elf_needed_from_input): Remove redundant node checks and
	simplify logic. Move to next element node after reading
	elf-needed node. (read_corpus_from_input): Remove no-children
	early return (which was inhibited by text nodes anyway). Set
	initial "corpus node" using first_child instead of children
	pointer. Remove redundant "corpus node" updates at end of
	function as the caller takes care of this and the node can
	become null with the new primitives.
	(read_corpus_group_from_input): Iterate over children using
	first_child and next child. Set "corpus node" for each corpus,
	not just the first. Incidentally keep going even if one corpus
	parse fails. (build_namespace_decl): Remove redundant node
	checks. Iterate over children using first_child and
	next_child. (build_elf_symbol): Remove redundant node checks.
	(build_elf_symbol_db): Iterate over children using first_child
	and next_child. (build_function_decl): Remove redundant node
	checks. Iterate over children using first_child and
	next_child. (build_function_type): Likewise.
	(build_array_type_def): Likewise. (build_enum_type_decl):
	Likewise. (build_class_decl): Remove redundant node
	checks. Iterate over children using first_child and
	next_child. Add comment about skipping children of
	declaration-only types (build_union_decl): Likewise.
	(build_function_tdecl): Remove redundant node checks.
	Iterate over children using first_child and next_child.
	(build_type_composition): Likewise.
	(build_template_tparameter): Likewise.
	* tests/data/Makefile.am: Add new test files.
	* tests/data/test-abidiff-exit/test-squished-report.txt: Add
	empty diff report.
	* tests/data/test-abidiff-exit/test-squished-v0.abi: Add test
	ABI file, containing all ELF elements.
	* tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise
	and identical, except that all text has been stripped.
	* tests/test-abidiff-exit.cc: Add new test case.

Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 include/abg-libxml-utils.h                    |   5 +-
 src/abg-libxml-utils.cc                       |  47 ++--
 src/abg-reader.cc                             | 260 ++++++------------
 tests/data/Makefile.am                        |   3 +
 .../test-squished-report.txt                  |   0
 .../test-abidiff-exit/test-squished-v0.abi    |  43 +++
 .../test-abidiff-exit/test-squished-v1.abi    |   1 +
 tests/test-abidiff-exit.cc                    |  11 +
 8 files changed, 167 insertions(+), 203 deletions(-)
 create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt
 create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi
 create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi
  

Comments

Dodji Seketeli April 6, 2021, 11:48 a.m. UTC | #1
Hello Giuliano,

Giuliano Procida <gprocida@google.com> a écrit:

[...]

Thanks for looking into this.

Sadly, as several other patches got in before I got to review this, this
patch doesn't apply to the current state of master anymore.  So overall,
I think it would need a rebasing.  Sorry about that :-(

[...]

> The XML reader uses libxml's Reader API for parsing XML and traverses
> high-level XML nodes using reader cursor advancement. Low-level nodes
> are read into complete node trees and are traversed following
> children, next and (in one instance) parent pointers. Some
> intermediate-level nodes can be parsed both ways.
>
> A lot of code cares about node types, typically skipping over
> non-element nodes. Unfortunately, in a few places, the tracked "corpus
> node" needs to land on text (whitespace) within elements for the XML
> reader to work; stripping text nodes out causes some elements to be
> skipped.

Can you please give an example of this?  It doesn't seem obvious to me.

> Almost all node tree traversals are very straightforward and
> can be expressed using simpler primitves than at present.
>
> This commit changes all the node tree traversals to use two node
> iteration primitives consistently. These are "first_child" and
> "next_child", both of which automatically skip text nodes. They
> replace the existing "advance_to_next_sibling_element" and *all*
> direct references to "children" and "next" nodes. The unwanted
> dependency on text within elements has been fixed.

I think providing an example of the initial problem would help to
understand how this fixes the initial issue.

[...]

> For-loops constructed with the new primitives present only element
> nodes to the loop body. This allows a lot of redundant checking for
> null / non-element nodes to be removed.
>
> These changes have been tested by using abidiff to comparing all
> current test XML files with copies where all text has been removed.
> One such test case is included in this commit.
>
> 	* include/abg-libxml-utils.h
> 	(advance_to_next_sibling_element): Remove this function.
> 	(first_child): Add new function. (next_child): Likewise. This

Please note how the "(next_child)" should go on the next line.

There are several instances of this issue in the ChangeLog part of the
commit log.

[...]

> 	is simpler than but functionally identical to
> 	advance_to_next_sibling_element.
> 	* src/abg-libxml-utils.cc (skip_non_elements): New function to
> 	skip non-element nodes in a sibling node list. (first_child):
> 	New function to find the first child element of a node.
> 	(next_child): New function to find the next sibling element of
> 	a node.
> 	* src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove
> 	redundant node checks. Iterate over children with first_child
> 	and next_child helpers. (read_translation_unit): Likewise.
> 	(read_translation_unit_from_input): Remove a no-longer-needed
> 	->next and redundant node checks. Advance "corpus node" after
> 	reading the translation unit. (read_symbol_db_from_input):
> 	Remove a no-longer-needed ->next and advance "corpus node" at
> 	the end of the for-loop.  Remove redundant node checks.
> 	(build_needed): Remove redundant node checks. Iterate over
> 	children with first_child and next_child.
> 	(read_elf_needed_from_input): Remove redundant node checks and
> 	simplify logic. Move to next element node after reading
> 	elf-needed node. (read_corpus_from_input): Remove no-children
> 	early return (which was inhibited by text nodes anyway). Set
> 	initial "corpus node" using first_child instead of children
> 	pointer. Remove redundant "corpus node" updates at end of
> 	function as the caller takes care of this and the node can
> 	become null with the new primitives.
> 	(read_corpus_group_from_input): Iterate over children using
> 	first_child and next child. Set "corpus node" for each corpus,
> 	not just the first. Incidentally keep going even if one corpus
> 	parse fails. (build_namespace_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. (build_elf_symbol): Remove redundant node checks.
> 	(build_elf_symbol_db): Iterate over children using first_child
> 	and next_child. (build_function_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. (build_function_type): Likewise.
> 	(build_array_type_def): Likewise. (build_enum_type_decl):
> 	Likewise. (build_class_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. Add comment about skipping children of
> 	declaration-only types (build_union_decl): Likewise.
> 	(build_function_tdecl): Remove redundant node checks.
> 	Iterate over children using first_child and next_child.
> 	(build_type_composition): Likewise.
> 	(build_template_tparameter): Likewise.
> 	* tests/data/Makefile.am: Add new test files.
> 	* tests/data/test-abidiff-exit/test-squished-report.txt: Add
> 	empty diff report.
> 	* tests/data/test-abidiff-exit/test-squished-v0.abi: Add test
> 	ABI file, containing all ELF elements.
> 	* tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise
> 	and identical, except that all text has been stripped.
> 	* tests/test-abidiff-exit.cc: Add new test case.
>
> Reviewed-by: Matthias Maennich <maennich@google.com>
> Signed-off-by: Giuliano Procida <gprocida@google.com>
> ---
>  include/abg-libxml-utils.h                    |   5 +-
>  src/abg-libxml-utils.cc                       |  47 ++--
>  src/abg-reader.cc                             | 260 ++++++------------
>  tests/data/Makefile.am                        |   3 +
>  .../test-squished-report.txt                  |   0
>  .../test-abidiff-exit/test-squished-v0.abi    |  43 +++
>  .../test-abidiff-exit/test-squished-v1.abi    |   1 +
>  tests/test-abidiff-exit.cc                    |  11 +
>  8 files changed, 167 insertions(+), 203 deletions(-)
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi
>
> diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h
> index 69e940f6..f87ee5a0 100644
> --- a/include/abg-libxml-utils.h
> +++ b/include/abg-libxml-utils.h
> @@ -82,7 +82,10 @@ int get_xml_node_depth(xmlNodePtr);
>    reinterpret_cast<char*>(xml_char_str.get())
>  
>  xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node);
> +first_child(xmlNodePtr node);
> +
> +xmlNodePtr
> +next_child(xmlNodePtr node);
>  
>  void
>  escape_xml_string(const std::string& str,
> diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> index a1acff1f..baea25ac 100644
> --- a/src/abg-libxml-utils.cc
> +++ b/src/abg-libxml-utils.cc
> @@ -413,40 +413,39 @@ unescape_xml_comment(const std::string& str)
>    return result;
>  }
>  
> -/// Maybe get the next sibling element node of an XML node, or stay to the sam
> +/// Skip until we reach an XML element node or run out of (child) nodes.
>  ///
> -/// If there is no next sibling xml element node, the function returns
> -/// the initial node.
> +/// @param a pointer to a node.
>  ///
> -/// @param node the initial node to consider.
> -///
> -/// @return the next sibling node or the initial node @p node.
> -static xmlNodePtr
> -go_to_next_sibling_element_or_stay(xmlNodePtr node)

Removal of this is not documented in the ChangeLog part of the commit log.

> +/// @return a pointer to an XML element node or nil.
> +xmlNodePtr
> +skip_non_elements(xmlNodePtr node)

This does exactly the same as the removed
go_to_next_sibling_element_or_stay or am I missing something?


>  {
> -  xmlNodePtr n;
> -  for (n = node; n; n = n->next)
> -    {
> -      if (n->type == XML_ELEMENT_NODE)
> -	break;
> -    }
> -  return n ? n : node;
> +  while (node && node->type != XML_ELEMENT_NODE)
> +    node = node->next;
> +  return node;
>  }
>  
> -/// Get the next sibling element node of an XML node.
> +/// Get the first child element of an XML node.
>  ///
> -/// If there is no next sibling xml element node, the function returns nil.
> +/// @param a pointer to the node whose children are to be perused.
>  ///
> -/// @param node the XML node to consider.
> +/// @return a pointer to the first XML element child of @p node or nil.
> +xmlNodePtr
> +first_child(xmlNodePtr node)

Please call this first_element_child.  Otherwise, it can be misleading,
I think.

> +{
> +  return skip_non_elements(node->children);
> +}
> +
> +/// Get the next sibling element of an XML node.
>  ///
> -/// @return the next sibling element node or nil.
> +/// @param a pointer to the XML node whose following siblings are to be perused.
> +///
> +/// @return a pointer to the next sibling element of @p node or nil.
>  xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node)
> +next_child(xmlNodePtr node)

Looking at the code of this function I think it should rather be called
next_element_sibling.  Or what am I missing?

>  {
> -  xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
> -  if (n == 0 || n->type != XML_ELEMENT_NODE)
> -    return 0;
> -  return n;
> +  return skip_non_elements(node->next);
>  }

[...]

Cheers,
  
Giuliano Procida April 6, 2021, 2:35 p.m. UTC | #2
Hi.

Thanks for reviewing.

On Tue, 6 Apr 2021 at 12:48, Dodji Seketeli <dodji@seketeli.org> wrote:

> Hello Giuliano,
>
> Giuliano Procida <gprocida@google.com> a écrit:
>
> [...]
>
> Thanks for looking into this.
>
> Sadly, as several other patches got in before I got to review this, this
> patch doesn't apply to the current state of master anymore.  So overall,
> I think it would need a rebasing.  Sorry about that :-(
>
>
No problem. So long as the public master is the same as your working
master, I can easily rebase and resend. My v2 posting reflects the
current public master.


> [...]
>
> > The XML reader uses libxml's Reader API for parsing XML and traverses
> > high-level XML nodes using reader cursor advancement. Low-level nodes
> > are read into complete node trees and are traversed following
> > children, next and (in one instance) parent pointers. Some
> > intermediate-level nodes can be parsed both ways.
> >
> > A lot of code cares about node types, typically skipping over
> > non-element nodes. Unfortunately, in a few places, the tracked "corpus
> > node" needs to land on text (whitespace) within elements for the XML
> > reader to work; stripping text nodes out causes some elements to be
> > skipped.
>
> Can you please give an example of this?  It doesn't seem obvious to me.
>

Of course. The contributed test case serves as an example (it just has all
unnecessary space stripped out). It wasn't obvious to me either!

I'm sure there is a much smaller fix that could be made to address each
case of sensitivity to whitespace. However, the main contribution here are
the new primitives which make, I hope, the parsing code easier to read and
understand and harder to hide bugs in.


>
> > Almost all node tree traversals are very straightforward and
> > can be expressed using simpler primitves than at present.
> >
> > This commit changes all the node tree traversals to use two node
> > iteration primitives consistently. These are "first_child" and
> > "next_child", both of which automatically skip text nodes. They
> > replace the existing "advance_to_next_sibling_element" and *all*
> > direct references to "children" and "next" nodes. The unwanted
> > dependency on text within elements has been fixed.
>
> I think providing an example of the initial problem would help to
> understand how this fixes the initial issue.
>
>
Please see the test case. We'd like to have confidence in the semantics of
libabigail XML, independent of the current reader and writer
implementations. Transformations that do not affect the XML element
structure should not change XML meaning, but do at present.


> [...]
>
> > For-loops constructed with the new primitives present only element
> > nodes to the loop body. This allows a lot of redundant checking for
> > null / non-element nodes to be removed.
> >
> > These changes have been tested by using abidiff to comparing all
> > current test XML files with copies where all text has been removed.
> > One such test case is included in this commit.
> >
> >       * include/abg-libxml-utils.h
> >       (advance_to_next_sibling_element): Remove this function.
> >       (first_child): Add new function. (next_child): Likewise. This
>
> Please note how the "(next_child)" should go on the next line.
>
> There are several instances of this issue in the ChangeLog part of the
> commit log.
>
>
Noted. I may have been doing this inconsistently for a while. I'll correct
this.


> [...]
>
> >       is simpler than but functionally identical to
> >       advance_to_next_sibling_element.
> >       * src/abg-libxml-utils.cc (skip_non_elements): New function to
> >       skip non-element nodes in a sibling node list. (first_child):
> >       New function to find the first child element of a node.
> >       (next_child): New function to find the next sibling element of
> >       a node.
> >       * src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove
> >       redundant node checks. Iterate over children with first_child
> >       and next_child helpers. (read_translation_unit): Likewise.
> >       (read_translation_unit_from_input): Remove a no-longer-needed
> >       ->next and redundant node checks. Advance "corpus node" after
> >       reading the translation unit. (read_symbol_db_from_input):
> >       Remove a no-longer-needed ->next and advance "corpus node" at
> >       the end of the for-loop.  Remove redundant node checks.
> >       (build_needed): Remove redundant node checks. Iterate over
> >       children with first_child and next_child.
> >       (read_elf_needed_from_input): Remove redundant node checks and
> >       simplify logic. Move to next element node after reading
> >       elf-needed node. (read_corpus_from_input): Remove no-children
> >       early return (which was inhibited by text nodes anyway). Set
> >       initial "corpus node" using first_child instead of children
> >       pointer. Remove redundant "corpus node" updates at end of
> >       function as the caller takes care of this and the node can
> >       become null with the new primitives.
> >       (read_corpus_group_from_input): Iterate over children using
> >       first_child and next child. Set "corpus node" for each corpus,
> >       not just the first. Incidentally keep going even if one corpus
> >       parse fails. (build_namespace_decl): Remove redundant node
> >       checks. Iterate over children using first_child and
> >       next_child. (build_elf_symbol): Remove redundant node checks.
> >       (build_elf_symbol_db): Iterate over children using first_child
> >       and next_child. (build_function_decl): Remove redundant node
> >       checks. Iterate over children using first_child and
> >       next_child. (build_function_type): Likewise.
> >       (build_array_type_def): Likewise. (build_enum_type_decl):
> >       Likewise. (build_class_decl): Remove redundant node
> >       checks. Iterate over children using first_child and
> >       next_child. Add comment about skipping children of
> >       declaration-only types (build_union_decl): Likewise.
> >       (build_function_tdecl): Remove redundant node checks.
> >       Iterate over children using first_child and next_child.
> >       (build_type_composition): Likewise.
> >       (build_template_tparameter): Likewise.
> >       * tests/data/Makefile.am: Add new test files.
> >       * tests/data/test-abidiff-exit/test-squished-report.txt: Add
> >       empty diff report.
> >       * tests/data/test-abidiff-exit/test-squished-v0.abi: Add test
> >       ABI file, containing all ELF elements.
> >       * tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise
> >       and identical, except that all text has been stripped.
> >       * tests/test-abidiff-exit.cc: Add new test case.
> >
> > Reviewed-by: Matthias Maennich <maennich@google.com>
> > Signed-off-by: Giuliano Procida <gprocida@google.com>
> > ---
> >  include/abg-libxml-utils.h                    |   5 +-
> >  src/abg-libxml-utils.cc                       |  47 ++--
> >  src/abg-reader.cc                             | 260 ++++++------------
> >  tests/data/Makefile.am                        |   3 +
> >  .../test-squished-report.txt                  |   0
> >  .../test-abidiff-exit/test-squished-v0.abi    |  43 +++
> >  .../test-abidiff-exit/test-squished-v1.abi    |   1 +
> >  tests/test-abidiff-exit.cc                    |  11 +
> >  8 files changed, 167 insertions(+), 203 deletions(-)
> >  create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt
> >  create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi
> >  create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi
> >
> > diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h
> > index 69e940f6..f87ee5a0 100644
> > --- a/include/abg-libxml-utils.h
> > +++ b/include/abg-libxml-utils.h
> > @@ -82,7 +82,10 @@ int get_xml_node_depth(xmlNodePtr);
> >    reinterpret_cast<char*>(xml_char_str.get())
> >
> >  xmlNodePtr
> > -advance_to_next_sibling_element(xmlNodePtr node);
> > +first_child(xmlNodePtr node);
> > +
> > +xmlNodePtr
> > +next_child(xmlNodePtr node);
> >
> >  void
> >  escape_xml_string(const std::string& str,
> > diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> > index a1acff1f..baea25ac 100644
> > --- a/src/abg-libxml-utils.cc
> > +++ b/src/abg-libxml-utils.cc
> > @@ -413,40 +413,39 @@ unescape_xml_comment(const std::string& str)
> >    return result;
> >  }
> >
> > -/// Maybe get the next sibling element node of an XML node, or stay to
> the sam
> > +/// Skip until we reach an XML element node or run out of (child) nodes.
> >  ///
> > -/// If there is no next sibling xml element node, the function returns
> > -/// the initial node.
> > +/// @param a pointer to a node.
> >  ///
> > -/// @param node the initial node to consider.
> > -///
> > -/// @return the next sibling node or the initial node @p node.
> > -static xmlNodePtr
> > -go_to_next_sibling_element_or_stay(xmlNodePtr node)
>
> Removal of this is not documented in the ChangeLog part of the commit log.
>
> > +/// @return a pointer to an XML element node or nil.
> > +xmlNodePtr
> > +skip_non_elements(xmlNodePtr node)
>
> This does exactly the same as the removed
> go_to_next_sibling_element_or_stay or am I missing something?
>
>
The only (but crucial) difference is that skip_non_elements is happy to
return nil when it reaches the end of the list while
go_to_next_sibling_element_or_stay will not go past the last element.

For example, consider a sibling (->next) chain: whitespace1 -> elf-symbol
-> whitespace2 -> nil that might occur in a XML file containing a single
ELF symbol.

Once the parser has consumed the symbol, we want the next element (but
there is none).

The new code does: next_child which calls skip_non_elements(node->next) and
both return nil.
The old code does: advance_to_next_sibling_element(node) which
calls go_to_next_sibling_element_or_stay(node->next) which returns
whitespace2 and then an extra check is needed on the node type to
eventually return nil.

So the new code can be thought of as factoring the ? : conditional logic
in go_to_next_sibling_element_or_stay out into the
calling advance_to_next_sibling_element and renaming the two functions.
Then there's adding first_child which uses the newly factored function in a
consistent fashion. Each node's type is tested exactly once.


> >  {
> > -  xmlNodePtr n;
> > -  for (n = node; n; n = n->next)
> > -    {
> > -      if (n->type == XML_ELEMENT_NODE)
> > -     break;
> > -    }
> > -  return n ? n : node;
> > +  while (node && node->type != XML_ELEMENT_NODE)
> > +    node = node->next;
> > +  return node;
> >  }
> >
> > -/// Get the next sibling element node of an XML node.
> > +/// Get the first child element of an XML node.
> >  ///
> > -/// If there is no next sibling xml element node, the function returns
> nil.
> > +/// @param a pointer to the node whose children are to be perused.
> >  ///
> > -/// @param node the XML node to consider.
> > +/// @return a pointer to the first XML element child of @p node or nil.
> > +xmlNodePtr
> > +first_child(xmlNodePtr node)
>
> Please call this first_element_child.  Otherwise, it can be misleading,
> I think.
>
>
The short names are to aid readability of the for-loops. Unless we are
going to start caring about non-element nodes, "element" is redundant.

One step (much) further would be to define an adaptor object and iterators
so the loops could look like:

for (const auto* child : children(node)) { ... }

or even

for (const auto& child : children(node)) { ... }

but that's more machinery and coding to worry about.

> +{
> > +  return skip_non_elements(node->children);
> > +}
> > +
> > +/// Get the next sibling element of an XML node.
> >  ///
> > -/// @return the next sibling element node or nil.
> > +/// @param a pointer to the XML node whose following siblings are to be
> perused.
> > +///
> > +/// @return a pointer to the next sibling element of @p node or nil.
> >  xmlNodePtr
> > -advance_to_next_sibling_element(xmlNodePtr node)
> > +next_child(xmlNodePtr node)
>
> Looking at the code of this function I think it should rather be called
> next_element_sibling.  Or what am I missing?
>
>
As above, we don't want to even care about non-element nodes outside
skip_non_elements and mentioning "element" is redundant. All the rest of
the code works with nodes of type XML_ELEMENT_NODE only (and this constant
now appears exactly once in the code).

"child" vs "sibling" are close to interchangeable in the context of
traversing the children of a given node:
* next sibling = next child of the same parent
* "child" is 2 characters shorter than sibling
* "first sibling" means the wrong thing

>  {
> > -  xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
> > -  if (n == 0 || n->type != XML_ELEMENT_NODE)
> > -    return 0;
> > -  return n;
> > +  return skip_non_elements(node->next);
> >  }
>
> [...]
>
> Cheers,
>
>
> --
>                 Dodji
>

Regards,
Giuliano.
  
Dodji Seketeli April 15, 2021, 2:12 p.m. UTC | #3
Giuliano Procida <gprocida@google.com> a écrit:

> Hi.
>
> Thanks for reviewing.
>
> On Tue, 6 Apr 2021 at 12:48, Dodji Seketeli <dodji@seketeli.org> wrote:
>
>> Hello Giuliano,
>>
>> Giuliano Procida <gprocida@google.com> a écrit:
>>
>> [...]
>>
>> Thanks for looking into this.
>>
>> Sadly, as several other patches got in before I got to review this, this
>> patch doesn't apply to the current state of master anymore.  So overall,
>> I think it would need a rebasing.  Sorry about that :-(
>>
>>
> No problem. So long as the public master is the same as your working
> master, I can easily rebase and resend. My v2 posting reflects the
> current public master.

Thanks.  I have seen the v4 posting too.  This message is also a reply
to that post.

[...]

>> > The XML reader uses libxml's Reader API for parsing XML and traverses
>> > high-level XML nodes using reader cursor advancement. Low-level nodes
>> > are read into complete node trees and are traversed following
>> > children, next and (in one instance) parent pointers. Some
>> > intermediate-level nodes can be parsed both ways.
>> >
>> > A lot of code cares about node types, typically skipping over
>> > non-element nodes. Unfortunately, in a few places, the tracked "corpus
>> > node" needs to land on text (whitespace) within elements for the XML
>> > reader to work; stripping text nodes out causes some elements to be
>> > skipped.
>>
>> Can you please give an example of this?  It doesn't seem obvious to me.
>>
>
> Of course. The contributed test case serves as an example (it just has all
> unnecessary space stripped out). It wasn't obvious to me either!

Sorry, I wasn't clear enough.  I guess my main point is that in this
introductory text, I find it important to be precise about what the
actual issue is.  Yes I can dig into the information around and
understand it, but having it clearly stated in the commit log does have
real value for people trying to understand what is going on.

No problem.  I dug into it and I think I have understood what is going
on.  I'll update the commit log accordingly.

> I'm sure there is a much smaller fix that could be made to address each
> case of sensitivity to whitespace. However, the main contribution here are
> the new primitives which make, I hope, the parsing code easier to read and
> understand and harder to hide bugs in.

After digging into all this, I think the issue can indeed be addressed
by a smaller fix overall.  I don't think an almost re-write like this is
necessary.  By default I tend to be reluctant of re-writes in general,
unless the original code really doesn't cut if for what we want to do.
I've accepted re-writes, so I it's really not a religious position.  In
any case, I'd rather try to understand and address the root cause of the
particular issue at hand.

I do appreciate all the time and effort you did put into this, and I
thank you for that.  I honestly am not comfortable doing this, but I
really think it's better to keep what's in there and fix the root cause
of issues instead.  What is there has been tested and is consistent.
When we find issues, if the foundations appear to not be adequate to
serve what we want to do, then I am fine with coming up with something
different and new, as long as it integrates well (in its spirit and
form) with the rest. Otherwise, I'd rather make what we have evolve.

So I am posting a candidate patch that builds on your work and that
should address the issues your raised in
https://sourceware.org/bugzilla/show_bug.cgi?id=27616.

[...]

> Please see the test case. We'd like to have confidence in the semantics of
> libabigail XML, independent of the current reader and writer
> implementations. Transformations that do not affect the XML element
> structure should not change XML meaning, but do at present.

Well, I don't disagree with the intent.  There are issues in the
current interpretation of the abixml and they ought to be fixed.  And
this is hopefully happening right now.

[...]

I am proposing this evolutive fix that should address the issue you
raised.

From 94d0194819a34a4972a1575da93a77fe5e19a010 Mon Sep 17 00:00:00 2001
From: Dodji Seketeli <dodji@redhat.com>
Date: Wed, 14 Apr 2021 15:04:35 +0200
Subject: [PATCH] reader: Handle 'abi-corpus' element being possibly empty

The abixml reader wrongly assumes that the 'abi-corpus' element is
always non-empty.  Note that until now, the only emitter of abixml
consumed in practice was abg-writer.cc and it only emits non-empty
'abi-corpus' elements.  So the issue wasn't exposed.

So, the reader assumes that an 'abi-corpus' element has at least a
text node.

For instance, consider this minimal input file named test-v0.abi:

    $cat test-v0.abi

    <abi-corpus-group architecture='elf-arm-aarch64'>
     <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'>
     </abi-corpus>
    </abi-corpus-group>

    $

Now, compare it to this file where the abi-corpus element is an empty
element (doesn't even contain any text):

    $cat test-v0.abi

    <abi-corpus-group architecture='elf-arm-aarch64'>
     <abi-corpus path='vmlinux'/>
    </abi-corpus-group>

    $

comparing the two files with abidiff (wrongly) reports:

    $ abidiff test-v0.abi test-v1.abi
    ELF architecture changed
    Functions changes summary: 0 Removed, 0 Changed, 0 Added function
    Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

    architecture changed from 'elf-arm-aarch64' to ''
    $

What's happening is that read_corpus_from_input is getting out early
when it sees that the node is empty.  This is at:

   xmlNodePtr node = ctxt.get_corpus_node();
@@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt)
 	corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
     }

  if (!node->children)  // <---- we get out early here and we
    return nil;         // forget about the properties of
                        // the current empty corpus element node

So, at its core, fixing the issue at hand involves avoiding the early
return there.

But then, it turns out that's not enough.

In the current setting, the different abixml processing entry points
are designed to be used in a semi "streaming" mode.

So for instance, read_translation_unit_from_input can be invoked
repeatedly to "stream in" the next translation unit at each
invocation.

Alternatively, the lower level xmlTextReaderNext can be used to
iterate over XML node until we reach the translation unit XML element
we are interested in.  At that point xmlTextReaderExpand can be used
to expand the XML node, then we let the context know that this is
the current node of the corpus that needs to be processed, using
read_context::get_corpus_node.  Once we've done that,
read_translation_unit_from_input can be called to process that
particular corpus node.  Note that the corpus node at hand, that needs
to be processed will be retrieved by read_context::get_corpus_node.

These two modes of operation are also available for
read_corpus_from_input, read_symbol_db_from_input,
read_elf_needed_from_input etc.

Today, these functions all assume that the current node returned by
read_context::get_corpus_node is the node /before/ the node of the
corpus to be processed.  So they all start looking at the /next sibling/
of the node returned by read_context::get_corpus_node.  So the code
was implicitly assuming that read_context::get_corpus_node was
pointing to a text node that was before the node of the corpus that we
want to process.

This is wrong.  read_context::get_corpus_node should just return the
current node of the corpus that needs to be processed and voila.

And so read_context::set_corpus_node should be used to set the current
node of the corpus to the current element node that needs to be processed.

That's the spirit of the change done by this patch.

As its name suggests, the existing
xml::advance_to_next_sibling_element is used to skip non element xml
nodes (including text nodes) and move to the next element node to
process, which is set to the context using
read_context::set_corpus_node.

Then the actual processing functions like read_corpus_from_input get
the node to process, using read_context::get_corpus_node and process
it rather than processing the sibling node that comes after it.

The other changes are either to prevent related crashes that I noticed
while doing various tests, update the abilint tool used to read and
debug abixml input files and add better documentation.

	* src/abg-reader.cc (read_context::get_corpus_node): Add comment
	to this member function.
	(read_translation_unit_from_input, read_symbol_db_from_input)
	(read_elf_needed_from_input): Start processing the current node of
	the corpus that needs to be processed rather than its next
	sibling.  Once the processing is done, set the new "current node
	of the corpus to be processed" properly by skipping to the next
	element node to be processed.
	(read_corpus_from_input): Don't get out early when the
	'abi-corpus' element is empty.  If, however, it has children node,
	skip to the first child element and flag it -- using
	read_context::set_corpus_node -- as being the element node to be
	processed by the processing facilities of the reader.  If we are
	in a mode where we called xmlTextReaderExpand ourselves to get the
	node to process, then it means we need to free that node
	indirectly by calling xmlTextReaderNext.  In that case, that node
	should not be flagged by read_context::set_corpus_node.  Add more
	comments.
	* src/abg-corpus.cc (corpus::is_empty): Do not crash when no
	symtab is around.
	* src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay):
	Fix typo in comment.
	(advance_to_next_sibling_element): Don't crash when given a nil
	node.
	* tests/data/test-abidiff/test-PR27616-squished-v0.abi: Add new
	test input.
	* tests/data/test-abidiff/test-PR27616-squished-v1.abi: Likewise.
	* tests/data/test-abidiff/test-PR27616-v0.xml: Likewise.
	* tests/data/test-abidiff/test-PR27616-v1.xml: Likewise.
	* tests/data/Makefile.am: Add the new test inputs above to source
	distribution.
	* tests/test-abidiff.cc (specs): Add the new tests inputs above to
	this harness.
	* tools/abilint.cc (main): Support writing corpus groups.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-corpus.cc                             |  2 +-
 src/abg-libxml-utils.cc                       |  6 +-
 src/abg-reader.cc                             | 65 +++++++++++++------
 tests/data/Makefile.am                        |  4 ++
 .../test-abidiff/test-PR27616-squished-v0.abi | 43 ++++++++++++
 .../test-abidiff/test-PR27616-squished-v1.abi |  1 +
 tests/data/test-abidiff/test-PR27616-v0.xml   |  4 ++
 tests/data/test-abidiff/test-PR27616-v1.xml   |  3 +
 tests/test-abidiff.cc                         | 12 ++++
 tools/abilint.cc                              |  9 ++-
 10 files changed, 126 insertions(+), 23 deletions(-)
 create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v0.abi
 create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v1.abi
 create mode 100644 tests/data/test-abidiff/test-PR27616-v0.xml
 create mode 100644 tests/data/test-abidiff/test-PR27616-v1.xml

diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
index c9f5c56b..0c684a93 100644
--- a/src/abg-corpus.cc
+++ b/src/abg-corpus.cc
@@ -1009,7 +1009,7 @@ corpus::is_empty() const
 	}
     }
   return (members_empty
-	  && !get_symtab()->has_symbols()
+	  && (!get_symtab() || !get_symtab()->has_symbols())
 	  && priv_->soname.empty()
 	  && priv_->needed.empty());
 }
diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
index a1acff1f..6b55d3ed 100644
--- a/src/abg-libxml-utils.cc
+++ b/src/abg-libxml-utils.cc
@@ -413,7 +413,8 @@ unescape_xml_comment(const std::string& str)
   return result;
 }
 
-/// Maybe get the next sibling element node of an XML node, or stay to the sam
+/// Maybe get the next sibling element node of an XML node, or stay to
+/// the same.
 ///
 /// If there is no next sibling xml element node, the function returns
 /// the initial node.
@@ -443,6 +444,9 @@ go_to_next_sibling_element_or_stay(xmlNodePtr node)
 xmlNodePtr
 advance_to_next_sibling_element(xmlNodePtr node)
 {
+  if (!node)
+    return 0;
+
   xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
   if (n == 0 || n->type != XML_ELEMENT_NODE)
     return 0;
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index f40e3d22..d2c76a38 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -196,10 +196,20 @@ public:
   get_reader() const
   {return m_reader;}
 
+  /// Getter of the current XML node in the corpus element sub-tree
+  /// that needs to be processed.
+  ///
+  /// @return the current XML node in the corpus element sub-tree that
+  /// needs to be processed.
   xmlNodePtr
   get_corpus_node() const
   {return m_corp_node;}
 
+  /// Setter of the current XML node in the corpus element sub-tree
+  /// that needs to be processed.
+  ///
+  /// @param node set the current XML node in the corpus element
+  /// sub-tree that needs to be processed.
   void
   set_corpus_node(xmlNodePtr node)
   {m_corp_node = node;}
@@ -1485,7 +1495,7 @@ read_translation_unit_from_input(read_context&	ctxt)
   else
     {
       node = 0;
-      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
+      for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
 	{
 	  if (!n
 	      || n->type != XML_ELEMENT_NODE)
@@ -1501,15 +1511,17 @@ read_translation_unit_from_input(read_context&	ctxt)
     return nil;
 
   tu = get_or_read_and_add_translation_unit(ctxt, node);
-  // So read_translation_unit() can trigger (under the hood) reading
-  // from several translation units just because
-  // read_context::get_scope_for_node() has been called.  In that
-  // case, after that unexpected call to read_translation_unit(), the
-  // current corpus node of the context is going to point to that
-  // translation unit that has been read under the hood.  Let's set
-  // the corpus node to the one we initially called
-  // read_translation_unit() on here.
-  ctxt.set_corpus_node(node);
+
+  if (ctxt.get_corpus_node())
+    {
+      // We are not in the mode where the current corpus node came
+      // from a local invocation of xmlTextReaderExpand.  So let's set
+      // ctxt.get_corpus_node to the next child element node of the
+      // corpus that needs to be processed.
+      node = xml::advance_to_next_sibling_element(node);
+      ctxt.set_corpus_node(node);
+    }
+
   return tu;
 }
 
@@ -1583,7 +1595,7 @@ read_symbol_db_from_input(read_context&		 ctxt,
 	xmlTextReaderNext(reader.get());
       }
   else
-    for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
+    for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
       {
 	if (!n || n->type != XML_ELEMENT_NODE)
 	  continue;
@@ -1594,8 +1606,11 @@ read_symbol_db_from_input(read_context&		 ctxt,
 	else if (xmlStrEqual(n->name, BAD_CAST("elf-variable-symbols")))
 	  has_var_syms = true;
 	else
-	  break;
-	ctxt.set_corpus_node(n);
+	  {
+	    ctxt.set_corpus_node(n);
+	    break;
+	  }
+
 	if (has_fn_syms)
 	  {
 	    fn_symdb = build_elf_symbol_db(ctxt, n, true);
@@ -1688,7 +1703,7 @@ read_elf_needed_from_input(read_context&	ctxt,
     }
   else
     {
-      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
+      for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
 	{
 	  if (!n || n->type != XML_ELEMENT_NODE)
 	    continue;
@@ -1703,6 +1718,7 @@ read_elf_needed_from_input(read_context&	ctxt,
   if (node)
     {
       result = build_needed(node, needed);
+      node = xml::advance_to_next_sibling_element(node);
       ctxt.set_corpus_node(node);
     }
 
@@ -1806,6 +1822,8 @@ read_corpus_from_input(read_context& ctxt)
   if (!reader)
     return nil;
 
+  // This is to remember to call xmlTextReaderNext if we ever call
+  // xmlTextReaderExpand.
   bool call_reader_next = false;
 
   xmlNodePtr node = ctxt.get_corpus_node();
@@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt)
 	corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
     }
 
-  if (!node->children)
-    return nil;
-
-  ctxt.set_corpus_node(node->children);
+  // If the corpus element node has children nodes, make
+  // ctxt.get_corpus_node() returns the first child element node of
+  // the corpus element that *needs* to be processed.
+  if (node->children)
+    {
+      xmlNodePtr n = xml::advance_to_next_sibling_element(node->children);
+      ctxt.set_corpus_node(n);
+    }
 
   corpus& corp = *ctxt.get_corpus();
 
@@ -1966,6 +1988,10 @@ read_corpus_from_input(read_context& ctxt)
       // This is the necessary counter-part of the xmlTextReaderExpand()
       // call at the beginning of the function.
       xmlTextReaderNext(reader.get());
+      // The call above invalidates the xml node returned by
+      // xmlTextReaderExpand, which is can still be accessed via
+      // ctxt.set_corpus_node.
+      ctxt.set_corpus_node(0);
     }
   else
     {
@@ -1974,7 +2000,8 @@ read_corpus_from_input(read_context& ctxt)
       if (!node)
 	{
 	  node = ctxt.get_corpus_node();
-	  node = xml::advance_to_next_sibling_element(node->parent);
+	  if (node)
+	    node = xml::advance_to_next_sibling_element(node->parent);
 	}
       ctxt.set_corpus_node(node);
     }
diff --git a/tests/data/Makefile.am b/tests/data/Makefile.am
index 576309e8..0edfe26c 100644
--- a/tests/data/Makefile.am
+++ b/tests/data/Makefile.am
@@ -206,6 +206,10 @@ test-abidiff-exit/test-crc-v1.abi \
 test-abidiff-exit/test-missing-alias-report.txt \
 test-abidiff-exit/test-missing-alias.abi \
 test-abidiff-exit/test-missing-alias.suppr \
+test-abidiff/test-PR27616-squished-v0.abi \
+test-abidiff/test-PR27616-squished-v1.abi \
+test-abidiff/test-PR27616-v0.xml \
+test-abidiff/test-PR27616-v1.xml \
 \
 test-diff-dwarf/test0-v0.cc		\
 test-diff-dwarf/test0-v0.o			\
diff --git a/tests/data/test-abidiff/test-PR27616-squished-v0.abi b/tests/data/test-abidiff/test-PR27616-squished-v0.abi
new file mode 100644
index 00000000..6b3d0460
--- /dev/null
+++ b/tests/data/test-abidiff/test-PR27616-squished-v0.abi
@@ -0,0 +1,43 @@
+<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'>
+  <elf-needed>
+    <dependency name='libstdc++.so.6'/>
+    <dependency name='libm.so.6'/>
+    <dependency name='libgcc_s.so.1'/>
+    <dependency name='libc.so.6'/>
+  </elf-needed>
+  <elf-function-symbols>
+    <elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+  </elf-function-symbols>
+  <elf-variable-symbols>
+    <elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
+  </elf-variable-symbols>
+  <abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'>
+    <type-decl name='int' size-in-bits='32' id='type-id-1'/>
+    <class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'>
+      <member-function access='public'>
+        <function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'>
+          <parameter type-id='type-id-3' name='this' is-artificial='yes'/>
+          <return type-id='type-id-1'/>
+        </function-decl>
+      </member-function>
+    </class-decl>
+    <class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'>
+      <data-member access='public' static='yes'>
+        <var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/>
+      </data-member>
+    </class-decl>
+    <pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/>
+    <qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/>
+    <function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'>
+      <return type-id='type-id-1'/>
+    </function-decl>
+    <function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'>
+      <return type-id='type-id-1'/>
+    </function-decl>
+  </abi-instr>
+</abi-corpus>
diff --git a/tests/data/test-abidiff/test-PR27616-squished-v1.abi b/tests/data/test-abidiff/test-PR27616-squished-v1.abi
new file mode 100644
index 00000000..20495f00
--- /dev/null
+++ b/tests/data/test-abidiff/test-PR27616-squished-v1.abi
@@ -0,0 +1 @@
+<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'><elf-needed><dependency name='libstdc++.so.6'/><dependency name='libm.so.6'/><dependency name='libgcc_s.so.1'/><dependency name='libc.so.6'/></elf-needed><elf-function-symbols><elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/></elf-function-symbols><elf-variable-symbols><elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/></elf-variable-symbols><abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'><type-decl name='int' size-in-bits='32' id='type-id-1'/><class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'><member-function access='public'><function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'><parameter type-id='type-id-3' name='this' is-artificial='yes'/><return type-id='type-id-1'/></function-decl></member-function></class-decl><class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'><data-member access='public' static='yes'><var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/></data-member></class-decl><pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/><qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/><function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'><return type-id='type-id-1'/></function-decl><function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'><return type-id='type-id-1'/></function-decl></abi-instr></abi-corpus>
diff --git a/tests/data/test-abidiff/test-PR27616-v0.xml b/tests/data/test-abidiff/test-PR27616-v0.xml
new file mode 100644
index 00000000..5e025ff6
--- /dev/null
+++ b/tests/data/test-abidiff/test-PR27616-v0.xml
@@ -0,0 +1,4 @@
+<abi-corpus-group architecture='elf-arm-aarch64'>
+  <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'>
+  </abi-corpus>
+</abi-corpus-group>
diff --git a/tests/data/test-abidiff/test-PR27616-v1.xml b/tests/data/test-abidiff/test-PR27616-v1.xml
new file mode 100644
index 00000000..aa8059bf
--- /dev/null
+++ b/tests/data/test-abidiff/test-PR27616-v1.xml
@@ -0,0 +1,3 @@
+<abi-corpus-group architecture='elf-arm-aarch64'>
+  <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'/>
+</abi-corpus-group>
diff --git a/tests/test-abidiff.cc b/tests/test-abidiff.cc
index 3452b7ee..6ef18654 100644
--- a/tests/test-abidiff.cc
+++ b/tests/test-abidiff.cc
@@ -128,6 +128,18 @@ static InOutSpec specs[] =
     "data/test-abidiff/test-crc-report.txt",
     "output/test-abidiff/test-crc-report-1-2.txt"
   },
+  {
+    "data/test-abidiff/test-PR27616-v0.xml",
+    "data/test-abidiff/test-PR27616-v1.xml",
+    "data/test-abidiff/empty-report.txt",
+    "output/test-abidiff/empty-report.txt"
+  },
+  {
+    "data/test-abidiff/test-PR27616-squished-v0.abi",
+    "data/test-abidiff/test-PR27616-squished-v1.abi",
+    "data/test-abidiff/empty-report.txt",
+    "output/test-abidiff/empty-report.txt"
+  },
   // This should be the last entry.
   {0, 0, 0, 0}
 };
diff --git a/tools/abilint.cc b/tools/abilint.cc
index c8598465..856f935d 100644
--- a/tools/abilint.cc
+++ b/tools/abilint.cc
@@ -440,11 +440,16 @@ main(int argc, char* argv[])
       else
 	{
 	  if (type == abigail::tools_utils::FILE_TYPE_XML_CORPUS
-	      ||type == abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP
+	      || type == abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP
 	      || type == abigail::tools_utils::FILE_TYPE_ELF)
 	    {
 	      if (!opts.noout)
-		is_ok = write_corpus(*ctxt, corp, 0);
+		{
+		  if (corp)
+		    is_ok = write_corpus(*ctxt, corp, 0);
+		  else if (group)
+		    is_ok = write_corpus_group(*ctxt, group, 0);
+		}
 	    }
 	}
  
Giuliano Procida April 16, 2021, 12:03 p.m. UTC | #4
Hi Dodji.

On Thu, 15 Apr 2021 at 15:12, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> Giuliano Procida <gprocida@google.com> a écrit:
>
> > Hi.
> >
> > Thanks for reviewing.
> >
> > On Tue, 6 Apr 2021 at 12:48, Dodji Seketeli <dodji@seketeli.org> wrote:
> >
> >> Hello Giuliano,
> >>
> >> Giuliano Procida <gprocida@google.com> a écrit:
> >>
> >> [...]
> >>
> >> Thanks for looking into this.
> >>
> >> Sadly, as several other patches got in before I got to review this, this
> >> patch doesn't apply to the current state of master anymore.  So overall,
> >> I think it would need a rebasing.  Sorry about that :-(
> >>
> >>
> > No problem. So long as the public master is the same as your working
> > master, I can easily rebase and resend. My v2 posting reflects the
> > current public master.
>
> Thanks.  I have seen the v4 posting too.  This message is also a reply
> to that post.
>
> [...]
>
> >> > The XML reader uses libxml's Reader API for parsing XML and traverses
> >> > high-level XML nodes using reader cursor advancement. Low-level nodes
> >> > are read into complete node trees and are traversed following
> >> > children, next and (in one instance) parent pointers. Some
> >> > intermediate-level nodes can be parsed both ways.
> >> >
> >> > A lot of code cares about node types, typically skipping over
> >> > non-element nodes. Unfortunately, in a few places, the tracked "corpus
> >> > node" needs to land on text (whitespace) within elements for the XML
> >> > reader to work; stripping text nodes out causes some elements to be
> >> > skipped.
> >>
> >> Can you please give an example of this?  It doesn't seem obvious to me.
> >>
> >
> > Of course. The contributed test case serves as an example (it just has all
> > unnecessary space stripped out). It wasn't obvious to me either!
>
> Sorry, I wasn't clear enough.  I guess my main point is that in this
> introductory text, I find it important to be precise about what the
> actual issue is.  Yes I can dig into the information around and
> understand it, but having it clearly stated in the commit log does have
> real value for people trying to understand what is going on.
>

From my perspective, the issues I was trying to highlight in the
commit text above were:

1. the parsing uses two different libxml APIs and sometimes both
conditionally in the same function whereas using the Tree API
consistently would have simplified the code; there is no direct
commentary on this in the code, but I guessed it was probably a
performance (memory) optimisation for very large ABIs
2. the repeated checks for XML_ELEMENT_NODE clutter the code (though
any performance impact is likely negligible) and there were several
variations of the looping schemes for iterating over node children

1. and 2. made it harder for me to a) find the cause of the
"significant whitespace" issue and b) be sure no other similar issues
might be lurking. In fact, simplifying the traversals helped me to
reason about the remaining code.

>
> No problem.  I dug into it and I think I have understood what is going
> on.  I'll update the commit log accordingly.
>
> > I'm sure there is a much smaller fix that could be made to address each
> > case of sensitivity to whitespace. However, the main contribution here are
> > the new primitives which make, I hope, the parsing code easier to read and
> > understand and harder to hide bugs in.
>

The above comment was to reflect that I saw two benefits from my
changes: a) fixing the "significant whitespace" issue and b) reducing
the chance that future XML reader changes would introduce similarly
subtle issues.

In fact, by rigorously changing all the loops I could to the same
form, it became apparent that the XML reader was discarding the
contents of certain elements and I added code comments in my commit to
highlight this non-obvious behaviour.

>
> After digging into all this, I think the issue can indeed be addressed
> by a smaller fix overall.  I don't think an almost re-write like this is
> necessary.  By default I tend to be reluctant of re-writes in general,
> unless the original code really doesn't cut if for what we want to do.
> I've accepted re-writes, so I it's really not a religious position.  In
> any case, I'd rather try to understand and address the root cause of the
> particular issue at hand.
>

I only changed the looping primitives and logic. I didn't attempt to
do anything about other things that made it hard for me to understand
the code (which were the storing of a node in the context object and
the use of two different APIs). While most of the changes were
mechanical, the remaining ones pointed to interesting subtleties. The
test suite and the extra testing I did gave me confidence that my
changes were correct.

There is one thing I might have been able to do to improve the code
review process; I could have tried to split the fix and the looping
changes into separate commits. I did consider this, but I wasn't sure
what order I would have done them in; the looping changes might have
forced the fix; the fix before the looping changes might be a
completely different thing.

>
> I do appreciate all the time and effort you did put into this, and I
> thank you for that.  I honestly am not comfortable doing this, but I
> really think it's better to keep what's in there and fix the root cause
> of issues instead.  What is there has been tested and is consistent.
>

Thank you in turn for going through things carefully. Besides the code
itself, I think it's important we understand each other's priorities
when it comes to coding and this dialogue helps us align.

Generally, my priorities are correctness, simplicity, features,
performance (and usually in that order). I've worked in environments
where mistakes can result (and have resulted) in large financial
losses and reputational damage and that might be a contributing
factor. The longer I've programmed, the more I've come to appreciate
simplicity and this usually translates into spending time finding
abstractions that allow software to be expressed in the clearest
fashion, without sacrificing correctness, features or much
performance. I've also worked in a role where performance was
everything and I achieved significant gains by simplifying APIs.

Some of the more elaborate code in the XML reader may be needed for
performance reasons. However, by building on primitives which
guarantee certain invariants, we can reason on top of them, instead of
finding ourselves reasoning everything from scratch. To be concrete,
this is the distinction between traversing a tree of arbitrary nodes
and traversing just the elements. I believe this choice of abstraction
and simplification has real value.

And full disclosure, I've since found (a day or so ago) that the
libxml Tree API already defines xmlFirstElementChild and
xmlNextElementSibling which are essentially identical to the
primitives I defined. This also gives me a boost in confidence that
they are the right primitives for traversing the XML tree for
libabigail.

>
> When we find issues, if the foundations appear to not be adequate to
> serve what we want to do, then I am fine with coming up with something
> different and new, as long as it integrates well (in its spirit and
> form) with the rest. Otherwise, I'd rather make what we have evolve.
>

We are all busy people, and if something is working, I will be
spending my time on something that is not. However, my attention was
directed to XML parser by the issue in question and I set out with the
intention of resolving the issue and only changed the loop primitives
as a means to an end.

>
> So I am posting a candidate patch that builds on your work and that
> should address the issues your raised in
> https://sourceware.org/bugzilla/show_bug.cgi?id=27616.
>

I was going to reply separately to review, but the commit below is
quite short. I'm sure it addresses the issue of significant
whitespace. If you haven't done so already, please test by removing
all whitespace from all the test XML files and then running abidiff vs
the originals. There is a short recipe for doing this in the bug
report.

Best regards,
Giuliano.

>
> [...]
>
> > Please see the test case. We'd like to have confidence in the semantics of
> > libabigail XML, independent of the current reader and writer
> > implementations. Transformations that do not affect the XML element
> > structure should not change XML meaning, but do at present.
>
> Well, I don't disagree with the intent.  There are issues in the
> current interpretation of the abixml and they ought to be fixed.  And
> this is hopefully happening right now.
>
> [...]
>
> I am proposing this evolutive fix that should address the issue you
> raised.
>
> From 94d0194819a34a4972a1575da93a77fe5e19a010 Mon Sep 17 00:00:00 2001
> From: Dodji Seketeli <dodji@redhat.com>
> Date: Wed, 14 Apr 2021 15:04:35 +0200
> Subject: [PATCH] reader: Handle 'abi-corpus' element being possibly empty
>
> The abixml reader wrongly assumes that the 'abi-corpus' element is
> always non-empty.  Note that until now, the only emitter of abixml
> consumed in practice was abg-writer.cc and it only emits non-empty
> 'abi-corpus' elements.  So the issue wasn't exposed.
>
> So, the reader assumes that an 'abi-corpus' element has at least a
> text node.
>
> For instance, consider this minimal input file named test-v0.abi:
>
>     $cat test-v0.abi
>
>     <abi-corpus-group architecture='elf-arm-aarch64'>
>      <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'>
>      </abi-corpus>
>     </abi-corpus-group>
>
>     $
>
> Now, compare it to this file where the abi-corpus element is an empty
> element (doesn't even contain any text):
>
>     $cat test-v0.abi
>
>     <abi-corpus-group architecture='elf-arm-aarch64'>
>      <abi-corpus path='vmlinux'/>
>     </abi-corpus-group>
>
>     $
>
> comparing the two files with abidiff (wrongly) reports:
>
>     $ abidiff test-v0.abi test-v1.abi
>     ELF architecture changed
>     Functions changes summary: 0 Removed, 0 Changed, 0 Added function
>     Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>
>     architecture changed from 'elf-arm-aarch64' to ''
>     $
>
> What's happening is that read_corpus_from_input is getting out early
> when it sees that the node is empty.  This is at:
>
>    xmlNodePtr node = ctxt.get_corpus_node();
> @@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt)
>         corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
>      }
>
>   if (!node->children)  // <---- we get out early here and we
>     return nil;         // forget about the properties of
>                         // the current empty corpus element node
>
> So, at its core, fixing the issue at hand involves avoiding the early
> return there.
>
> But then, it turns out that's not enough.
>
> In the current setting, the different abixml processing entry points
> are designed to be used in a semi "streaming" mode.
>
> So for instance, read_translation_unit_from_input can be invoked
> repeatedly to "stream in" the next translation unit at each
> invocation.
>
> Alternatively, the lower level xmlTextReaderNext can be used to
> iterate over XML node until we reach the translation unit XML element
> we are interested in.  At that point xmlTextReaderExpand can be used
> to expand the XML node, then we let the context know that this is
> the current node of the corpus that needs to be processed, using
> read_context::get_corpus_node.  Once we've done that,
> read_translation_unit_from_input can be called to process that
> particular corpus node.  Note that the corpus node at hand, that needs
> to be processed will be retrieved by read_context::get_corpus_node.
>
> These two modes of operation are also available for
> read_corpus_from_input, read_symbol_db_from_input,
> read_elf_needed_from_input etc.
>
> Today, these functions all assume that the current node returned by
> read_context::get_corpus_node is the node /before/ the node of the
> corpus to be processed.  So they all start looking at the /next sibling/
> of the node returned by read_context::get_corpus_node.  So the code
> was implicitly assuming that read_context::get_corpus_node was
> pointing to a text node that was before the node of the corpus that we
> want to process.
>
> This is wrong.  read_context::get_corpus_node should just return the
> current node of the corpus that needs to be processed and voila.
>
> And so read_context::set_corpus_node should be used to set the current
> node of the corpus to the current element node that needs to be processed.
>
> That's the spirit of the change done by this patch.
>
> As its name suggests, the existing
> xml::advance_to_next_sibling_element is used to skip non element xml
> nodes (including text nodes) and move to the next element node to
> process, which is set to the context using
> read_context::set_corpus_node.
>
> Then the actual processing functions like read_corpus_from_input get
> the node to process, using read_context::get_corpus_node and process
> it rather than processing the sibling node that comes after it.
>
> The other changes are either to prevent related crashes that I noticed
> while doing various tests, update the abilint tool used to read and
> debug abixml input files and add better documentation.
>
>         * src/abg-reader.cc (read_context::get_corpus_node): Add comment
>         to this member function.
>         (read_translation_unit_from_input, read_symbol_db_from_input)
>         (read_elf_needed_from_input): Start processing the current node of
>         the corpus that needs to be processed rather than its next
>         sibling.  Once the processing is done, set the new "current node
>         of the corpus to be processed" properly by skipping to the next
>         element node to be processed.
>         (read_corpus_from_input): Don't get out early when the
>         'abi-corpus' element is empty.  If, however, it has children node,
>         skip to the first child element and flag it -- using
>         read_context::set_corpus_node -- as being the element node to be
>         processed by the processing facilities of the reader.  If we are
>         in a mode where we called xmlTextReaderExpand ourselves to get the
>         node to process, then it means we need to free that node
>         indirectly by calling xmlTextReaderNext.  In that case, that node
>         should not be flagged by read_context::set_corpus_node.  Add more
>         comments.
>         * src/abg-corpus.cc (corpus::is_empty): Do not crash when no
>         symtab is around.
>         * src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay):
>         Fix typo in comment.
>         (advance_to_next_sibling_element): Don't crash when given a nil
>         node.
>         * tests/data/test-abidiff/test-PR27616-squished-v0.abi: Add new
>         test input.
>         * tests/data/test-abidiff/test-PR27616-squished-v1.abi: Likewise.
>         * tests/data/test-abidiff/test-PR27616-v0.xml: Likewise.
>         * tests/data/test-abidiff/test-PR27616-v1.xml: Likewise.
>         * tests/data/Makefile.am: Add the new test inputs above to source
>         distribution.
>         * tests/test-abidiff.cc (specs): Add the new tests inputs above to
>         this harness.
>         * tools/abilint.cc (main): Support writing corpus groups.
>
> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
> ---
>  src/abg-corpus.cc                             |  2 +-
>  src/abg-libxml-utils.cc                       |  6 +-
>  src/abg-reader.cc                             | 65 +++++++++++++------
>  tests/data/Makefile.am                        |  4 ++
>  .../test-abidiff/test-PR27616-squished-v0.abi | 43 ++++++++++++
>  .../test-abidiff/test-PR27616-squished-v1.abi |  1 +
>  tests/data/test-abidiff/test-PR27616-v0.xml   |  4 ++
>  tests/data/test-abidiff/test-PR27616-v1.xml   |  3 +
>  tests/test-abidiff.cc                         | 12 ++++
>  tools/abilint.cc                              |  9 ++-
>  10 files changed, 126 insertions(+), 23 deletions(-)
>  create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v0.abi
>  create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v1.abi
>  create mode 100644 tests/data/test-abidiff/test-PR27616-v0.xml
>  create mode 100644 tests/data/test-abidiff/test-PR27616-v1.xml
>
> diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
> index c9f5c56b..0c684a93 100644
> --- a/src/abg-corpus.cc
> +++ b/src/abg-corpus.cc
> @@ -1009,7 +1009,7 @@ corpus::is_empty() const
>         }
>      }
>    return (members_empty
> -         && !get_symtab()->has_symbols()
> +         && (!get_symtab() || !get_symtab()->has_symbols())
>           && priv_->soname.empty()
>           && priv_->needed.empty());
>  }
> diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> index a1acff1f..6b55d3ed 100644
> --- a/src/abg-libxml-utils.cc
> +++ b/src/abg-libxml-utils.cc
> @@ -413,7 +413,8 @@ unescape_xml_comment(const std::string& str)
>    return result;
>  }
>
> -/// Maybe get the next sibling element node of an XML node, or stay to the sam
> +/// Maybe get the next sibling element node of an XML node, or stay to
> +/// the same.
>  ///
>  /// If there is no next sibling xml element node, the function returns
>  /// the initial node.
> @@ -443,6 +444,9 @@ go_to_next_sibling_element_or_stay(xmlNodePtr node)
>  xmlNodePtr
>  advance_to_next_sibling_element(xmlNodePtr node)
>  {
> +  if (!node)
> +    return 0;
> +
>    xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
>    if (n == 0 || n->type != XML_ELEMENT_NODE)
>      return 0;
> diff --git a/src/abg-reader.cc b/src/abg-reader.cc
> index f40e3d22..d2c76a38 100644
> --- a/src/abg-reader.cc
> +++ b/src/abg-reader.cc
> @@ -196,10 +196,20 @@ public:
>    get_reader() const
>    {return m_reader;}
>
> +  /// Getter of the current XML node in the corpus element sub-tree
> +  /// that needs to be processed.
> +  ///
> +  /// @return the current XML node in the corpus element sub-tree that
> +  /// needs to be processed.
>    xmlNodePtr
>    get_corpus_node() const
>    {return m_corp_node;}
>
> +  /// Setter of the current XML node in the corpus element sub-tree
> +  /// that needs to be processed.
> +  ///
> +  /// @param node set the current XML node in the corpus element
> +  /// sub-tree that needs to be processed.
>    void
>    set_corpus_node(xmlNodePtr node)
>    {m_corp_node = node;}
> @@ -1485,7 +1495,7 @@ read_translation_unit_from_input(read_context&    ctxt)
>    else
>      {
>        node = 0;
> -      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
> +      for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
>         {
>           if (!n
>               || n->type != XML_ELEMENT_NODE)
> @@ -1501,15 +1511,17 @@ read_translation_unit_from_input(read_context&  ctxt)
>      return nil;
>
>    tu = get_or_read_and_add_translation_unit(ctxt, node);
> -  // So read_translation_unit() can trigger (under the hood) reading
> -  // from several translation units just because
> -  // read_context::get_scope_for_node() has been called.  In that
> -  // case, after that unexpected call to read_translation_unit(), the
> -  // current corpus node of the context is going to point to that
> -  // translation unit that has been read under the hood.  Let's set
> -  // the corpus node to the one we initially called
> -  // read_translation_unit() on here.
> -  ctxt.set_corpus_node(node);
> +
> +  if (ctxt.get_corpus_node())
> +    {
> +      // We are not in the mode where the current corpus node came
> +      // from a local invocation of xmlTextReaderExpand.  So let's set
> +      // ctxt.get_corpus_node to the next child element node of the
> +      // corpus that needs to be processed.
> +      node = xml::advance_to_next_sibling_element(node);
> +      ctxt.set_corpus_node(node);
> +    }
> +
>    return tu;
>  }
>
> @@ -1583,7 +1595,7 @@ read_symbol_db_from_input(read_context&            ctxt,
>         xmlTextReaderNext(reader.get());
>        }
>    else
> -    for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
> +    for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
>        {
>         if (!n || n->type != XML_ELEMENT_NODE)
>           continue;
> @@ -1594,8 +1606,11 @@ read_symbol_db_from_input(read_context&           ctxt,
>         else if (xmlStrEqual(n->name, BAD_CAST("elf-variable-symbols")))
>           has_var_syms = true;
>         else
> -         break;
> -       ctxt.set_corpus_node(n);
> +         {
> +           ctxt.set_corpus_node(n);
> +           break;
> +         }
> +
>         if (has_fn_syms)
>           {
>             fn_symdb = build_elf_symbol_db(ctxt, n, true);
> @@ -1688,7 +1703,7 @@ read_elf_needed_from_input(read_context&  ctxt,
>      }
>    else
>      {
> -      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
> +      for (xmlNodePtr n = ctxt.get_corpus_node(); n; n = n->next)
>         {
>           if (!n || n->type != XML_ELEMENT_NODE)
>             continue;
> @@ -1703,6 +1718,7 @@ read_elf_needed_from_input(read_context&  ctxt,
>    if (node)
>      {
>        result = build_needed(node, needed);
> +      node = xml::advance_to_next_sibling_element(node);
>        ctxt.set_corpus_node(node);
>      }
>
> @@ -1806,6 +1822,8 @@ read_corpus_from_input(read_context& ctxt)
>    if (!reader)
>      return nil;
>
> +  // This is to remember to call xmlTextReaderNext if we ever call
> +  // xmlTextReaderExpand.
>    bool call_reader_next = false;
>
>    xmlNodePtr node = ctxt.get_corpus_node();
> @@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt)
>         corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
>      }
>
> -  if (!node->children)
> -    return nil;
> -
> -  ctxt.set_corpus_node(node->children);
> +  // If the corpus element node has children nodes, make
> +  // ctxt.get_corpus_node() returns the first child element node of
> +  // the corpus element that *needs* to be processed.
> +  if (node->children)
> +    {
> +      xmlNodePtr n = xml::advance_to_next_sibling_element(node->children);
> +      ctxt.set_corpus_node(n);
> +    }
>
>    corpus& corp = *ctxt.get_corpus();
>
> @@ -1966,6 +1988,10 @@ read_corpus_from_input(read_context& ctxt)
>        // This is the necessary counter-part of the xmlTextReaderExpand()
>        // call at the beginning of the function.
>        xmlTextReaderNext(reader.get());
> +      // The call above invalidates the xml node returned by
> +      // xmlTextReaderExpand, which is can still be accessed via
> +      // ctxt.set_corpus_node.
> +      ctxt.set_corpus_node(0);
>      }
>    else
>      {
> @@ -1974,7 +2000,8 @@ read_corpus_from_input(read_context& ctxt)
>        if (!node)
>         {
>           node = ctxt.get_corpus_node();
> -         node = xml::advance_to_next_sibling_element(node->parent);
> +         if (node)
> +           node = xml::advance_to_next_sibling_element(node->parent);
>         }
>        ctxt.set_corpus_node(node);
>      }
> diff --git a/tests/data/Makefile.am b/tests/data/Makefile.am
> index 576309e8..0edfe26c 100644
> --- a/tests/data/Makefile.am
> +++ b/tests/data/Makefile.am
> @@ -206,6 +206,10 @@ test-abidiff-exit/test-crc-v1.abi \
>  test-abidiff-exit/test-missing-alias-report.txt \
>  test-abidiff-exit/test-missing-alias.abi \
>  test-abidiff-exit/test-missing-alias.suppr \
> +test-abidiff/test-PR27616-squished-v0.abi \
> +test-abidiff/test-PR27616-squished-v1.abi \
> +test-abidiff/test-PR27616-v0.xml \
> +test-abidiff/test-PR27616-v1.xml \
>  \
>  test-diff-dwarf/test0-v0.cc            \
>  test-diff-dwarf/test0-v0.o                     \
> diff --git a/tests/data/test-abidiff/test-PR27616-squished-v0.abi b/tests/data/test-abidiff/test-PR27616-squished-v0.abi
> new file mode 100644
> index 00000000..6b3d0460
> --- /dev/null
> +++ b/tests/data/test-abidiff/test-PR27616-squished-v0.abi
> @@ -0,0 +1,43 @@
> +<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'>
> +  <elf-needed>
> +    <dependency name='libstdc++.so.6'/>
> +    <dependency name='libm.so.6'/>
> +    <dependency name='libgcc_s.so.1'/>
> +    <dependency name='libc.so.6'/>
> +  </elf-needed>
> +  <elf-function-symbols>
> +    <elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
> +    <elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
> +    <elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/>
> +    <elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
> +    <elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
> +  </elf-function-symbols>
> +  <elf-variable-symbols>
> +    <elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
> +    <elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
> +  </elf-variable-symbols>
> +  <abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'>
> +    <type-decl name='int' size-in-bits='32' id='type-id-1'/>
> +    <class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'>
> +      <member-function access='public'>
> +        <function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'>
> +          <parameter type-id='type-id-3' name='this' is-artificial='yes'/>
> +          <return type-id='type-id-1'/>
> +        </function-decl>
> +      </member-function>
> +    </class-decl>
> +    <class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'>
> +      <data-member access='public' static='yes'>
> +        <var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/>
> +      </data-member>
> +    </class-decl>
> +    <pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/>
> +    <qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/>
> +    <function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'>
> +      <return type-id='type-id-1'/>
> +    </function-decl>
> +    <function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'>
> +      <return type-id='type-id-1'/>
> +    </function-decl>
> +  </abi-instr>
> +</abi-corpus>
> diff --git a/tests/data/test-abidiff/test-PR27616-squished-v1.abi b/tests/data/test-abidiff/test-PR27616-squished-v1.abi
> new file mode 100644
> index 00000000..20495f00
> --- /dev/null
> +++ b/tests/data/test-abidiff/test-PR27616-squished-v1.abi
> @@ -0,0 +1 @@
> +<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'><elf-needed><dependency name='libstdc++.so.6'/><dependency name='libm.so.6'/><dependency name='libgcc_s.so.1'/><dependency name='libc.so.6'/></elf-needed><elf-function-symbols><elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/></elf-function-symbols><elf-variable-symbols><elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/></elf-variable-symbols><abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'><type-decl name='int' size-in-bits='32' id='type-id-1'/><class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'><member-function access='public'><function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'><parameter type-id='type-id-3' name='this' is-artificial='yes'/><return type-id='type-id-1'/></function-decl></member-function></class-decl><class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'><data-member access='public' static='yes'><var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/></data-member></class-decl><pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/><qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/><function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'><return type-id='type-id-1'/></function-decl><function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'><return type-id='type-id-1'/></function-decl></abi-instr></abi-corpus>
> diff --git a/tests/data/test-abidiff/test-PR27616-v0.xml b/tests/data/test-abidiff/test-PR27616-v0.xml
> new file mode 100644
> index 00000000..5e025ff6
> --- /dev/null
> +++ b/tests/data/test-abidiff/test-PR27616-v0.xml
> @@ -0,0 +1,4 @@
> +<abi-corpus-group architecture='elf-arm-aarch64'>
> +  <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'>
> +  </abi-corpus>
> +</abi-corpus-group>
> diff --git a/tests/data/test-abidiff/test-PR27616-v1.xml b/tests/data/test-abidiff/test-PR27616-v1.xml
> new file mode 100644
> index 00000000..aa8059bf
> --- /dev/null
> +++ b/tests/data/test-abidiff/test-PR27616-v1.xml
> @@ -0,0 +1,3 @@
> +<abi-corpus-group architecture='elf-arm-aarch64'>
> +  <abi-corpus path='vmlinux' architecture='elf-arm-aarch64'/>
> +</abi-corpus-group>
> diff --git a/tests/test-abidiff.cc b/tests/test-abidiff.cc
> index 3452b7ee..6ef18654 100644
> --- a/tests/test-abidiff.cc
> +++ b/tests/test-abidiff.cc
> @@ -128,6 +128,18 @@ static InOutSpec specs[] =
>      "data/test-abidiff/test-crc-report.txt",
>      "output/test-abidiff/test-crc-report-1-2.txt"
>    },
> +  {
> +    "data/test-abidiff/test-PR27616-v0.xml",
> +    "data/test-abidiff/test-PR27616-v1.xml",
> +    "data/test-abidiff/empty-report.txt",
> +    "output/test-abidiff/empty-report.txt"
> +  },
> +  {
> +    "data/test-abidiff/test-PR27616-squished-v0.abi",
> +    "data/test-abidiff/test-PR27616-squished-v1.abi",
> +    "data/test-abidiff/empty-report.txt",
> +    "output/test-abidiff/empty-report.txt"
> +  },
>    // This should be the last entry.
>    {0, 0, 0, 0}
>  };
> diff --git a/tools/abilint.cc b/tools/abilint.cc
> index c8598465..856f935d 100644
> --- a/tools/abilint.cc
> +++ b/tools/abilint.cc
> @@ -440,11 +440,16 @@ main(int argc, char* argv[])
>        else
>         {
>           if (type == abigail::tools_utils::FILE_TYPE_XML_CORPUS
> -             ||type == abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP
> +             || type == abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP
>               || type == abigail::tools_utils::FILE_TYPE_ELF)
>             {
>               if (!opts.noout)
> -               is_ok = write_corpus(*ctxt, corp, 0);
> +               {
> +                 if (corp)
> +                   is_ok = write_corpus(*ctxt, corp, 0);
> +                 else if (group)
> +                   is_ok = write_corpus_group(*ctxt, group, 0);
> +               }
>             }
>         }
>
> --
> 2.30.0
>
>
>
> --
>                 Dodji
  
Dodji Seketeli April 19, 2021, 1:40 p.m. UTC | #5
Hello Giuliano,

Thanks for the your thourough response.

[...]

> And full disclosure, I've since found (a day or so ago) that the
> libxml Tree API already defines xmlFirstElementChild and
> xmlNextElementSibling which are essentially identical to the
> primitives I defined. This also gives me a boost in confidence that
> they are the right primitives for traversing the XML tree for
> libabigail.

Right.  As the text/element node handling issue is addressed by the
patch I sent, I went ahead and (hopefully) address the confusion and
useless redundancy that arise from not using the xmlFirstElementChild
and xmlNextElementSibling APIs.  I am thus adding two patches to that
one to use those functions instead.

[...]

> I was going to reply separately to review, but the commit below is
> quite short. I'm sure it addresses the issue of significant
> whitespace. If you haven't done so already, please test by removing
> all whitespace from all the test XML files and then running abidiff vs
> the originals. There is a short recipe for doing this in the bug
> report.

I have tested the series of 3 patches that I am going to post as reply
of this message with that script you added to the bug report at
https://sourceware.org/bugzilla/show_bug.cgi?id=27616.

There are two crashes that are unrelated to this issue at hand.  That's because
libabigail doesn't (yet) support comparing decls coming from templates
and that's a known issue.

The two invocations that crash are: 

build/tools/abidiff --harmless tests/data/test-read-write/test12.xml tests/data/test-read-write/test12.xml.wrung
build/tools/abidiff --harmless tests/data/test-read-write/test13.xml tests/data/test-read-write/test13.xml.wrung

Other than that, everything else seem to work.

Thank you for your time and dedication!

[...]

Cheers,
  

Patch

diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h
index 69e940f6..f87ee5a0 100644
--- a/include/abg-libxml-utils.h
+++ b/include/abg-libxml-utils.h
@@ -82,7 +82,10 @@  int get_xml_node_depth(xmlNodePtr);
   reinterpret_cast<char*>(xml_char_str.get())
 
 xmlNodePtr
-advance_to_next_sibling_element(xmlNodePtr node);
+first_child(xmlNodePtr node);
+
+xmlNodePtr
+next_child(xmlNodePtr node);
 
 void
 escape_xml_string(const std::string& str,
diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
index a1acff1f..baea25ac 100644
--- a/src/abg-libxml-utils.cc
+++ b/src/abg-libxml-utils.cc
@@ -413,40 +413,39 @@  unescape_xml_comment(const std::string& str)
   return result;
 }
 
-/// Maybe get the next sibling element node of an XML node, or stay to the sam
+/// Skip until we reach an XML element node or run out of (child) nodes.
 ///
-/// If there is no next sibling xml element node, the function returns
-/// the initial node.
+/// @param a pointer to a node.
 ///
-/// @param node the initial node to consider.
-///
-/// @return the next sibling node or the initial node @p node.
-static xmlNodePtr
-go_to_next_sibling_element_or_stay(xmlNodePtr node)
+/// @return a pointer to an XML element node or nil.
+xmlNodePtr
+skip_non_elements(xmlNodePtr node)
 {
-  xmlNodePtr n;
-  for (n = node; n; n = n->next)
-    {
-      if (n->type == XML_ELEMENT_NODE)
-	break;
-    }
-  return n ? n : node;
+  while (node && node->type != XML_ELEMENT_NODE)
+    node = node->next;
+  return node;
 }
 
-/// Get the next sibling element node of an XML node.
+/// Get the first child element of an XML node.
 ///
-/// If there is no next sibling xml element node, the function returns nil.
+/// @param a pointer to the node whose children are to be perused.
 ///
-/// @param node the XML node to consider.
+/// @return a pointer to the first XML element child of @p node or nil.
+xmlNodePtr
+first_child(xmlNodePtr node)
+{
+  return skip_non_elements(node->children);
+}
+
+/// Get the next sibling element of an XML node.
 ///
-/// @return the next sibling element node or nil.
+/// @param a pointer to the XML node whose following siblings are to be perused.
+///
+/// @return a pointer to the next sibling element of @p node or nil.
 xmlNodePtr
-advance_to_next_sibling_element(xmlNodePtr node)
+next_child(xmlNodePtr node)
 {
-  xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
-  if (n == 0 || n->type != XML_ELEMENT_NODE)
-    return 0;
-  return n;
+  return skip_non_elements(node->next);
 }
 
 }//end namespace xml
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index ef0b7d57..f1b08474 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -1333,18 +1333,13 @@  static void
 walk_xml_node_to_map_type_ids(read_context& ctxt,
 			      xmlNodePtr node)
 {
-  xmlNodePtr n = node;
-
-  if (!n || n->type != XML_ELEMENT_NODE)
-    return;
-
-  if (xml_char_sptr s = XML_NODE_GET_ATTRIBUTE(n, "id"))
+  if (xml_char_sptr s = XML_NODE_GET_ATTRIBUTE(node, "id"))
     {
       string id = CHAR_STR(s);
-      ctxt.map_id_and_node(id, n);
+      ctxt.map_id_and_node(id, node);
     }
 
-  for (n = n->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     walk_xml_node_to_map_type_ids(ctxt, n);
 }
 
@@ -1386,12 +1381,8 @@  read_translation_unit(read_context& ctxt, translation_unit& tu, xmlNodePtr node)
       || !ctxt.get_corpus())
     walk_xml_node_to_map_type_ids(ctxt, node);
 
-  for (xmlNodePtr n = node->children; n; n = n->next)
-    {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-      handle_element_node(ctxt, n, /*add_decl_to_scope=*/true);
-    }
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
+    handle_element_node(ctxt, n, /*add_decl_to_scope=*/true);
 
   ctxt.pop_scope_or_abort(tu.get_global_scope());
 
@@ -1485,16 +1476,9 @@  read_translation_unit_from_input(read_context&	ctxt)
   else
     {
       node = 0;
-      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
-	{
-	  if (!n
-	      || n->type != XML_ELEMENT_NODE)
-	    continue;
-	  if (!xmlStrEqual(n->name, BAD_CAST("abi-instr")))
-	    return nil;
-	  node = n;
-	  break;
-	}
+      xmlNodePtr n = ctxt.get_corpus_node();
+      if (n && xmlStrEqual(n->name, BAD_CAST("abi-instr")))
+	node = n;
     }
 
   if (node == 0)
@@ -1507,9 +1491,9 @@  read_translation_unit_from_input(read_context&	ctxt)
   // case, after that unexpected call to read_translation_unit(), the
   // current corpus node of the context is going to point to that
   // translation unit that has been read under the hood.  Let's set
-  // the corpus node to the one we initially called
+  // the corpus node to just after the one we initially called
   // read_translation_unit() on here.
-  ctxt.set_corpus_node(node);
+  ctxt.set_corpus_node(xml::next_child(node));
   return tu;
 }
 
@@ -1583,32 +1567,32 @@  read_symbol_db_from_input(read_context&		 ctxt,
 	xmlTextReaderNext(reader.get());
       }
   else
-    for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
-      {
-	if (!n || n->type != XML_ELEMENT_NODE)
-	  continue;
-
-	bool has_fn_syms = false, has_var_syms = false;
-	if (xmlStrEqual(n->name, BAD_CAST("elf-function-symbols")))
-	  has_fn_syms = true;
-	else if (xmlStrEqual(n->name, BAD_CAST("elf-variable-symbols")))
-	  has_var_syms = true;
-	else
-	  break;
-	ctxt.set_corpus_node(n);
-	if (has_fn_syms)
-	  {
-	    fn_symdb = build_elf_symbol_db(ctxt, n, true);
-	    found = true;
-	  }
-	else if (has_var_syms)
-	  {
-	    var_symdb = build_elf_symbol_db(ctxt, n, false);
-	    found = true;
-	  }
-	else
-	  break;
-      }
+    {
+      xmlNodePtr n = ctxt.get_corpus_node();
+      for (; n; n = xml::next_child(n))
+	{
+	  bool has_fn_syms = false, has_var_syms = false;
+	  if (xmlStrEqual(n->name, BAD_CAST("elf-function-symbols")))
+	    has_fn_syms = true;
+	  else if (xmlStrEqual(n->name, BAD_CAST("elf-variable-symbols")))
+	    has_var_syms = true;
+	  else
+	    break;
+	  if (has_fn_syms)
+	    {
+	      fn_symdb = build_elf_symbol_db(ctxt, n, true);
+	      found = true;
+	    }
+	  else if (has_var_syms)
+	    {
+	      var_symdb = build_elf_symbol_db(ctxt, n, false);
+	      found = true;
+	    }
+	  else
+	    break;
+	}
+      ctxt.set_corpus_node(n);
+    }
 
   return found;
 }
@@ -1626,16 +1610,12 @@  read_symbol_db_from_input(read_context&		 ctxt,
 static bool
 build_needed(xmlNode* node, vector<string>& needed)
 {
-  if (!node)
-    return false;
-
-  if (!node || !xmlStrEqual(node->name,BAD_CAST("elf-needed")))
+  if (!xmlStrEqual(node->name,BAD_CAST("elf-needed")))
     return false;
 
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE
-	  || !xmlStrEqual(n->name, BAD_CAST("dependency")))
+      if (!xmlStrEqual(n->name, BAD_CAST("dependency")))
 	continue;
 
       string name;
@@ -1683,27 +1663,19 @@  read_elf_needed_from_input(read_context&	ctxt,
 	return false;
 
       node = xmlTextReaderExpand(reader.get());
-      if (!node)
-	return false;
     }
   else
     {
-      for (xmlNodePtr n = ctxt.get_corpus_node()->next; n; n = n->next)
-	{
-	  if (!n || n->type != XML_ELEMENT_NODE)
-	    continue;
-	  if (!xmlStrEqual(n->name, BAD_CAST("elf-needed")))
-	    return false;
-	  node = n;
-	  break;
-	}
+      xmlNodePtr n = ctxt.get_corpus_node();
+      if (n && xmlStrEqual(n->name, BAD_CAST("elf-needed")))
+	node = n;
     }
 
   bool result = false;
   if (node)
     {
       result = build_needed(node, needed);
-      ctxt.set_corpus_node(node);
+      ctxt.set_corpus_node(xml::next_child(node));
     }
 
   return result;
@@ -1907,10 +1879,7 @@  read_corpus_from_input(read_context& ctxt)
 	corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
     }
 
-  if (!node->children)
-    return nil;
-
-  ctxt.set_corpus_node(node->children);
+  ctxt.set_corpus_node(xml::first_child(node));
 
   corpus& corp = *ctxt.get_corpus();
 
@@ -1976,17 +1945,6 @@  read_corpus_from_input(read_context& ctxt)
       // call at the beginning of the function.
       xmlTextReaderNext(reader.get());
     }
-  else
-    {
-      node = ctxt.get_corpus_node();
-      node = xml::advance_to_next_sibling_element(node);
-      if (!node)
-	{
-	  node = ctxt.get_corpus_node();
-	  node = xml::advance_to_next_sibling_element(node->parent);
-	}
-      ctxt.set_corpus_node(node);
-    }
 
   return ctxt.get_corpus();
 }
@@ -2036,13 +1994,13 @@  read_corpus_group_from_input(read_context& ctxt)
   if (!node)
     return nil;
 
-  //node = xml::get_first_element_sibling_if_text(node->children);
-  node = xml::advance_to_next_sibling_element(node->children);
-  ctxt.set_corpus_node(node);
-
-  corpus_sptr corp;
-  while ((corp = read_corpus_from_input(ctxt)))
-    ctxt.get_corpus_group()->add_corpus(corp);
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
+    {
+      ctxt.set_corpus_node(n);
+      corpus_sptr corp = read_corpus_from_input(ctxt);
+      if (corp)
+	ctxt.get_corpus_group()->add_corpus(corp);
+    }
 
   xmlTextReaderNext(reader.get());
 
@@ -2728,12 +2686,8 @@  build_namespace_decl(read_context&	ctxt,
   ctxt.push_decl_to_current_scope(decl, add_to_current_scope);
   ctxt.map_xml_node_to_decl(node, decl);
 
-  for (xmlNodePtr n = node->children; n; n = n->next)
-    {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-      handle_element_node(ctxt, n, /*add_to_current_scope=*/true);
-    }
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
+    handle_element_node(ctxt, n, /*add_to_current_scope=*/true);
 
   ctxt.pop_scope_or_abort(decl);
 
@@ -2757,9 +2711,7 @@  build_elf_symbol(read_context& ctxt, const xmlNodePtr node,
 {
   elf_symbol_sptr nil;
 
-  if (!node
-      || node->type != XML_ELEMENT_NODE
-      || !xmlStrEqual(node->name, BAD_CAST("elf-symbol")))
+  if (!xmlStrEqual(node->name, BAD_CAST("elf-symbol")))
     return nil;
 
   string name;
@@ -2918,7 +2870,7 @@  build_elf_symbol_db(read_context& ctxt,
   xml_node_ptr_elf_symbol_sptr_map_type xml_node_ptr_elf_symbol_map;
 
   elf_symbol_sptr sym;
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
       if ((sym = build_elf_symbol(ctxt, n, /*drop_if_suppress=*/true)))
 	{
@@ -3084,12 +3036,9 @@  build_function_decl(read_context&	ctxt,
   std::vector<function_decl::parameter_sptr> parms;
   type_base_sptr return_type = env->get_void_type();
 
-  for (xmlNodePtr n = node->children; n ; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
-      else if (xmlStrEqual(n->name, BAD_CAST("parameter")))
+      if (xmlStrEqual(n->name, BAD_CAST("parameter")))
 	{
 	  if (function_decl::parameter_sptr p =
 	      build_function_parameter(ctxt, n))
@@ -3779,12 +3728,9 @@  build_function_type(read_context&	ctxt,
   ctxt.get_translation_unit()->bind_function_type_life_time(fn_type);
   ctxt.key_type_decl(fn_type, id);
 
-  for (xmlNodePtr n = node->children; n ; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
-      else if (xmlStrEqual(n->name, BAD_CAST("parameter")))
+      if (xmlStrEqual(n->name, BAD_CAST("parameter")))
 	{
 	  if (function_decl::parameter_sptr p =
 	      build_function_parameter(ctxt, n))
@@ -4015,12 +3961,9 @@  build_array_type_def(read_context&	ctxt,
   read_location(ctxt, node, loc);
   array_type_def::subranges_type subranges;
 
-  for (xmlNodePtr n = node->children; n ; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
-      else if (xmlStrEqual(n->name, BAD_CAST("subrange")))
+      if (xmlStrEqual(n->name, BAD_CAST("subrange")))
 	{
 	  if (array_type_def::subrange_sptr s =
 	      build_subrange_type(ctxt, n))
@@ -4173,11 +4116,8 @@  build_enum_type_decl(read_context&	ctxt,
 
   string base_type_id;
   enum_type_decl::enumerators enums;
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
       if (xmlStrEqual(n->name, BAD_CAST("underlying-type")))
 	{
 	  xml_char_sptr a = xml::build_sptr(xmlGetProp(n, BAD_CAST("type-id")));
@@ -4544,11 +4484,11 @@  build_class_decl(read_context&		ctxt,
       decl->set_naming_typedef(naming_typedef);
     }
 
-  for (xmlNodePtr n = node->children; !is_decl_only && n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
+      // See https://sourceware.org/bugzilla/show_bug.cgi?id=26591#c7.
+      if (is_decl_only)
+        continue;
       if (xmlStrEqual(n->name, BAD_CAST("base-class")))
 	{
 	  access_specifier access =
@@ -4595,11 +4535,8 @@  build_class_decl(read_context&		ctxt,
 
 	  ctxt.map_xml_node_to_decl(n, decl);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (type_base_sptr t =
 		  build_type(ctxt, p, /*add_to_current_scope=*/true))
 		{
@@ -4633,11 +4570,8 @@  build_class_decl(read_context&		ctxt,
 	  bool is_static = false;
 	  read_static(n, is_static);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (var_decl_sptr v =
 		  build_var_decl(ctxt, p, /*add_to_cur_scope=*/false))
 		{
@@ -4691,11 +4625,8 @@  build_class_decl(read_context&		ctxt,
 	  bool is_ctor = false, is_dtor = false, is_const = false;
 	  read_cdtor_const(n, is_ctor, is_dtor, is_const);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (function_decl_sptr f =
 		  build_function_decl_if_not_suppressed(ctxt, p, decl,
 							/*add_to_cur_sc=*/true))
@@ -4730,11 +4661,8 @@  build_class_decl(read_context&		ctxt,
 	  bool is_ctor = false, is_dtor = false, is_const = false;
 	  read_cdtor_const(n, is_ctor, is_dtor, is_const);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (shared_ptr<function_tdecl> f =
 		  build_function_tdecl(ctxt, p,
 				       /*add_to_current_scope=*/true))
@@ -4944,11 +4872,11 @@  build_union_decl(read_context& ctxt,
   ctxt.map_xml_node_to_decl(node, decl);
   ctxt.key_type_decl(decl, id);
 
-  for (xmlNodePtr n = node->children; !is_decl_only && n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
+      // See https://sourceware.org/bugzilla/show_bug.cgi?id=26591#c7.
+      if (is_decl_only)
+        continue;
       if (xmlStrEqual(n->name, BAD_CAST("member-type")))
 	{
 	  access_specifier access = private_access;
@@ -4956,11 +4884,8 @@  build_union_decl(read_context& ctxt,
 
 	  ctxt.map_xml_node_to_decl(n, decl);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (type_base_sptr t =
 		  build_type(ctxt, p, /*add_to_current_scope=*/true))
 		{
@@ -4988,11 +4913,8 @@  build_union_decl(read_context& ctxt,
 	  bool is_static = false;
 	  read_static(n, is_static);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (var_decl_sptr v =
 		  build_var_decl(ctxt, p, /*add_to_cur_scope=*/false))
 		{
@@ -5030,11 +4952,8 @@  build_union_decl(read_context& ctxt,
 	  bool is_ctor = false, is_dtor = false, is_const = false;
 	  read_cdtor_const(n, is_ctor, is_dtor, is_const);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (function_decl_sptr f =
 		  build_function_decl_if_not_suppressed(ctxt, p, decl,
 							/*add_to_cur_sc=*/true))
@@ -5063,11 +4982,8 @@  build_union_decl(read_context& ctxt,
 	  bool is_ctor = false, is_dtor = false, is_const = false;
 	  read_cdtor_const(n, is_ctor, is_dtor, is_const);
 
-	  for (xmlNodePtr p = n->children; p; p = p->next)
+	  for (xmlNodePtr p = xml::first_child(n); p; p = xml::next_child(p))
 	    {
-	      if (p->type != XML_ELEMENT_NODE)
-		continue;
-
 	      if (function_tdecl_sptr f =
 		  build_function_tdecl(ctxt, p,
 				       /*add_to_current_scope=*/true))
@@ -5142,11 +5058,8 @@  build_function_tdecl(read_context& ctxt,
   ctxt.push_decl_to_current_scope(fn_tmpl_decl, add_to_current_scope);
 
   unsigned parm_index = 0;
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
       if (template_parameter_sptr parm =
 	  build_template_parameter(ctxt, n, parm_index, fn_tmpl_decl))
 	{
@@ -5206,11 +5119,8 @@  build_class_tdecl(read_context&	ctxt,
   ctxt.push_decl_to_current_scope(class_tmpl, add_to_current_scope);
 
   unsigned parm_index = 0;
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
       if (template_parameter_sptr parm=
 	  build_template_parameter(ctxt, n, parm_index, class_tmpl))
 	{
@@ -5325,11 +5235,8 @@  build_type_composition(read_context&		ctxt,
   ctxt.push_decl_to_current_scope(dynamic_pointer_cast<decl_base>(result),
 				  /*add_to_current_scope=*/true);
 
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
       if ((composed_type =
 	   build_pointer_type_def(ctxt, n,
 				  /*add_to_current_scope=*/true))
@@ -5455,11 +5362,8 @@  build_template_tparameter(read_context&	ctxt,
 
   // Go parse template parameters that are children nodes
   int parm_index = 0;
-  for (xmlNodePtr n = node->children; n; n = n->next)
+  for (xmlNodePtr n = xml::first_child(node); n; n = xml::next_child(n))
     {
-      if (n->type != XML_ELEMENT_NODE)
-	continue;
-
       if (shared_ptr<template_parameter> p =
 	  build_template_parameter(ctxt, n, parm_index, result))
 	{
diff --git a/tests/data/Makefile.am b/tests/data/Makefile.am
index 0b406f3b..a844a309 100644
--- a/tests/data/Makefile.am
+++ b/tests/data/Makefile.am
@@ -197,6 +197,9 @@  test-abidiff-exit/test-non-leaf-array-v0.o \
 test-abidiff-exit/test-non-leaf-array-v1.c \
 test-abidiff-exit/test-non-leaf-array-v1.o \
 test-abidiff-exit/test-non-leaf-array-report.txt \
+test-abidiff-exit/test-squished-v0.abi \
+test-abidiff-exit/test-squished-v1.abi \
+test-abidiff-exit/test-squished-report.txt \
 \
 test-diff-dwarf/test0-v0.cc		\
 test-diff-dwarf/test0-v0.o			\
diff --git a/tests/data/test-abidiff-exit/test-squished-report.txt b/tests/data/test-abidiff-exit/test-squished-report.txt
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/data/test-abidiff-exit/test-squished-v0.abi b/tests/data/test-abidiff-exit/test-squished-v0.abi
new file mode 100644
index 00000000..6b3d0460
--- /dev/null
+++ b/tests/data/test-abidiff-exit/test-squished-v0.abi
@@ -0,0 +1,43 @@ 
+<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'>
+  <elf-needed>
+    <dependency name='libstdc++.so.6'/>
+    <dependency name='libm.so.6'/>
+    <dependency name='libgcc_s.so.1'/>
+    <dependency name='libc.so.6'/>
+  </elf-needed>
+  <elf-function-symbols>
+    <elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
+  </elf-function-symbols>
+  <elf-variable-symbols>
+    <elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
+    <elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/>
+  </elf-variable-symbols>
+  <abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'>
+    <type-decl name='int' size-in-bits='32' id='type-id-1'/>
+    <class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'>
+      <member-function access='public'>
+        <function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'>
+          <parameter type-id='type-id-3' name='this' is-artificial='yes'/>
+          <return type-id='type-id-1'/>
+        </function-decl>
+      </member-function>
+    </class-decl>
+    <class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'>
+      <data-member access='public' static='yes'>
+        <var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/>
+      </data-member>
+    </class-decl>
+    <pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/>
+    <qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/>
+    <function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'>
+      <return type-id='type-id-1'/>
+    </function-decl>
+    <function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'>
+      <return type-id='type-id-1'/>
+    </function-decl>
+  </abi-instr>
+</abi-corpus>
diff --git a/tests/data/test-abidiff-exit/test-squished-v1.abi b/tests/data/test-abidiff-exit/test-squished-v1.abi
new file mode 100644
index 00000000..3ffa272b
--- /dev/null
+++ b/tests/data/test-abidiff-exit/test-squished-v1.abi
@@ -0,0 +1 @@ 
+<abi-corpus version='2.0' path='data/test-read-dwarf/test6.so'><elf-needed><dependency name='libstdc++.so.6'/><dependency name='libm.so.6'/><dependency name='libgcc_s.so.1'/><dependency name='libc.so.6'/></elf-needed><elf-function-symbols><elf-symbol name='_Z3barv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_Z4blehv' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZN1B3fooEv' type='func-type' binding='weak-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_fini' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_init' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/></elf-function-symbols><elf-variable-symbols><elf-symbol name='_ZN1CIiE3barE' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/><elf-symbol name='_ZZN1B3fooEvE1a' size='4' type='object-type' binding='gnu-unique-binding' visibility='default-visibility' is-defined='yes'/></elf-variable-symbols><abi-instr address-size='64' path='test6.cc' comp-dir-path='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf' language='LANG_C_plus_plus'><type-decl name='int' size-in-bits='32' id='type-id-1'/><class-decl name='B' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='9' column='1' id='type-id-2'><member-function access='public'><function-decl name='foo' mangled-name='_ZN1B3fooEv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='11' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZN1B3fooEv'><parameter type-id='type-id-3' name='this' is-artificial='yes'/><return type-id='type-id-1'/></function-decl></member-function></class-decl><class-decl name='C&lt;int&gt;' size-in-bits='8' is-struct='yes' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='26' column='1' id='type-id-4'><data-member access='public' static='yes'><var-decl name='bar' type-id='type-id-1' mangled-name='_ZN1CIiE3barE' visibility='default' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='31' column='1' elf-symbol-id='_ZN1CIiE3barE'/></data-member></class-decl><pointer-type-def type-id='type-id-2' size-in-bits='64' id='type-id-5'/><qualified-type-def type-id='type-id-5' const='yes' id='type-id-3'/><function-decl name='bar' mangled-name='_Z3barv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='19' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z3barv'><return type-id='type-id-1'/></function-decl><function-decl name='bleh' mangled-name='_Z4blehv' filepath='/home/skumari/Tasks/source_repo/dodji/libabigail/tests/data/test-read-dwarf/test6.cc' line='34' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_Z4blehv'><return type-id='type-id-1'/></function-decl></abi-instr></abi-corpus>
\ No newline at end of file
diff --git a/tests/test-abidiff-exit.cc b/tests/test-abidiff-exit.cc
index 4283e145..f1526b9e 100644
--- a/tests/test-abidiff-exit.cc
+++ b/tests/test-abidiff-exit.cc
@@ -392,6 +392,17 @@  InOutSpec in_out_specs[] =
     "data/test-abidiff-exit/test-non-leaf-array-report.txt",
     "output/test-abidiff-exit/test-non-leaf-array-report.txt"
   },
+  {
+    "data/test-abidiff-exit/test-squished-v0.abi",
+    "data/test-abidiff-exit/test-squished-v1.abi",
+    "",
+    "",
+    "",
+    "--harmless",
+    abigail::tools_utils::ABIDIFF_OK,
+    "data/test-abidiff-exit/test-squished-report.txt",
+    "output/test-abidiff-exit/test-squished-report.txt"
+  },
   {0, 0, 0 ,0, 0, 0, abigail::tools_utils::ABIDIFF_OK, 0, 0}
 };