[2/2,RFC] Allow restricting analyzed decls to exported symbols

Message ID 87sfl4izh9.fsf@seketeli.org
State New
Series Speed up type DIEs canonicalization

Commit Message

Dodji Seketeli Sept. 6, 2022, 10:11 a.m. UTC
  Hello,

Profiling showed that the DWARF reader scans too much data.

Basically, in build_translation_unit_and_add_to_ir,
build_ir_node_from_die is called on every single DIE that is seen, for
a given translation unit.

There are interfaces (function and variable decls) that are not
associated with exported ELF symbols and that are analyzed by
build_ir_node_from_die nonetheless.  For instance, interfaces that are
visible outside of their translation units are analyzed and the types
that are reachable from those interfaces are analyzed as well.

Once that is done, an ABI corpus is built with the subset of
interfaces that have an exported ELF symbol (strictly those that are
part of the ABI), but types that are not necessarily reachable from
those ABI interfaces can also be put into the ABI corpus.

Some tools make use of this "loose" behaviour of libabigail.  For
instance, abicompat precisely wants to analyze interfaces with
undefined symbols.  For an application, those interfaces represent
the interfaces that the application expects to be provided by some
shared library.

When analyzing the exported interface of the Linux Kernel (or any
other huge application) however, analyzing more types than necessary
appears to incur a huge time penalty.

So, this patch introduces an optional behaviour whereby
build_translation_unit_and_add_to_ir is restricted to analyzing
interfaces that have exported ELF symbols only.  So only the types
reachable from those interfaces are analyzed.  This more than halves
the time spent by "abidw --noout vmlinux".
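
To illustrate the idea, here is a minimal, self-contained sketch of
the filtering step.  It is not the code of the patch: Decl,
ExportedSymbols and select_decls_to_analyze are hypothetical names,
and matching is done on symbol names for simplicity, whereas the real
reader may well resolve symbols differently (e.g. by address).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a top-level DIE seen in a translation unit.
struct Decl {
  std::string linkage_name;      // symbol name the decl would be emitted under
  bool is_function_or_variable;  // false for types, namespaces, etc.
};

// Hypothetical stand-in for the set of defined and exported ELF symbols.
using ExportedSymbols = std::unordered_set<std::string>;

// Return the decls for which IR should be built.  When
// exported_interfaces_only is set, function and variable decls without
// an exported symbol are skipped, so the types only they reach are
// never visited.  Other decls are left alone.
std::vector<Decl> select_decls_to_analyze(const std::vector<Decl>& decls,
                                          const ExportedSymbols& exported,
                                          bool exported_interfaces_only) {
  std::vector<Decl> selected;
  for (const Decl& d : decls) {
    if (exported_interfaces_only && d.is_function_or_variable &&
        exported.count(d.linkage_name) == 0)
      continue;  // interface without an exported ELF symbol: skip it
    selected.push_back(d);
  }
  return selected;
}
```

With the flag off, every decl is kept, which corresponds to the
previous behaviour.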

Strictly speaking, this new behaviour is triggered by a new option named
--exported-interfaces-only, supported by the tools abidw, abidiff,
abipkgdiff and kmidiff.

When looking at the Linux Kernel however, this option is enabled by
default.

Note that an option --allow-non-exported-interfaces is also introduced
to keep operating under the previous model.  This option is enabled by
default on all the tools when they are not looking at the Linux
Kernel.
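
The way the two options and the kernel default interact can be
sketched as below.  This is only an illustration, assuming C++17 for
std::optional; AnalysisSettings and its members are hypothetical names
and not the accessors added to the environment by this patch.

```cpp
#include <optional>

// Hypothetical mirror of the setting: it stays unset until the user
// passes --exported-interfaces-only or --allow-non-exported-interfaces.
class AnalysisSettings {
 public:
  // True if the user made an explicit choice on the command line.
  bool user_set() const { return exported_only_.has_value(); }

  // Record the explicit choice of the user.
  void set_exported_interfaces_only(bool f) { exported_only_ = f; }

  // Effective value: an explicit choice always wins; otherwise only
  // exported interfaces are analyzed when looking at the Linux Kernel.
  bool exported_interfaces_only(bool is_linux_kernel) const {
    return exported_only_.value_or(is_linux_kernel);
  }

 private:
  std::optional<bool> exported_only_;
};
```

In this sketch, --allow-non-exported-interfaces would map to
set_exported_interfaces_only(false), which overrides the kernel
default.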

With this enabled, analyzing the Linux Kernel is back to taking less
than a minute on a reasonable machine.

	* doc/manuals/tools-use-libabigail.txt: New doc text.
	* doc/manuals/Makefile.am: Add the new tools-use-libabigail.txt
	file to the source distribution.
	* doc/manuals/abidiff.rst: Include the new
	tools-use-libabigail.txt.  Document the --exported-interfaces-only
	and --allow-non-exported-interfaces options.
	* doc/manuals/abidw.rst: Likewise.
	* doc/manuals/abipkgdiff.rst: Likewise.
	* doc/manuals/kmidiff.rst: Likewise.
	* include/abg-ir.h
	(environment::{user_set_analyze_exported_interfaces_only,
	analyze_exported_interfaces_only}): Declare new accessors.
	* src/abg-ir.cc
	(environment::{user_set_analyze_exported_interfaces_only,
	analyze_exported_interfaces_only}): Define new accessors.
	* src/abg-dwarf-reader.cc (die_is_variable_decl)
	(die_is_function_decl): Define new static functions.
	(read_context::is_decl_die_with_exported_symbol): Define new
	member function.
	(read_context::get_{function,variable}_address): Const-ify the
	Dwarf_Die* parameter.
	(build_translation_unit_and_add_to_ir): If the user asks to
	analyze exported interfaces only, then analyze only interfaces
	that have exported ELF symbols.
	(read_debug_info_into_corpus): If we are looking at the Linux
	Kernel, then only analyze exported interfaces unless the user asks
	otherwise.
	* src/abg-ir-priv.h
	(environment::priv::analyze_exported_interfaces_only_): Define new
	data member.
	* tools/abidiff.cc (options::exported_interfaces_only): Define new
	data member.
	(display_usage): Add new help strings for
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(parse_command_line): Parse the new options
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(main): Pass the value of opts.exported_interfaces_only to the
	environment.
	* tools/abidw.cc (options::exported_interfaces_only): Define new
	data member.
	(display_usage): Add new help strings for
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(parse_command_line): Parse the new options
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(load_corpus_and_write_abixml)
	(load_kernel_corpus_group_and_write_abixml): Pass the value of
	opts.exported_interfaces_only onto the environment.
	* tools/abipkgdiff.cc (options::exported_interfaces_only): Define new
	data member.
	(display_usage): Add new help strings for
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(parse_command_line): Parse the new options
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(compare_task::perform, self_compare_task::perform): Pass the
	value of opts.exported_interfaces_only onto the environment.
	(compare_prepared_linux_kernel_packages): Likewise.
	* tools/kmidiff.cc (options::exported_interfaces_only): Define new
	data member.
	(display_usage): Add new help strings for
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(parse_command_line): Parse the new options
	--exported-interfaces-only and --allow-non-exported-interfaces.
	(main): Pass the value of opts.exported_interfaces_only onto the
	environment.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 doc/manuals/Makefile.am              |   3 +-
 doc/manuals/abidiff.rst              |  52 ++++++++++++
 doc/manuals/abidw.rst                |  82 ++++++++++++++++---
 doc/manuals/abipkgdiff.rst           |  51 ++++++++++++
 doc/manuals/kmidiff.rst              |  52 +++++++++++-
 doc/manuals/tools-use-libabigail.txt |  16 ++++
 include/abg-ir.h                     |   9 +++
 src/abg-dwarf-reader.cc              | 113 ++++++++++++++++++++++++---
 src/abg-ir-priv.h                    |   2 +
 src/abg-ir.cc                        |  36 +++++++++
 tools/abidiff.cc                     |  12 +++
 tools/abidw.cc                       |  15 ++++
 tools/abipkgdiff.cc                  |  21 +++++
 tools/kmidiff.cc                     |  12 +++
 14 files changed, 449 insertions(+), 27 deletions(-)
 create mode 100644 doc/manuals/tools-use-libabigail.txt
  

Comments

Giuliano Procida Sept. 9, 2022, 1:03 p.m. UTC | #1
Hi Dodji.

Sorry for the late reply. I was down with Covid for a while.

I feel this commit deserves some feedback.

My understanding is that the intention here is to make the DWARF
reader do less work (look at fewer type DIEs) than at present.

We are actually hoping that we may be able to make the DWARF reader
look at more type DIEs so that it is more likely to pick up full
definitions of types instead of declarations.

The rationale behind the change appears to be that DWARF processing is
expensive, in particular for kernel ABIs. I would say "measure first".
Here's roughly how I think about things:

1. building the IR is very cheap

A kernel ABI may end up with 40k IR elements. The cost of allocating
memory and calling constructors should be negligible. Any improvement
to this end of things is pointless.

2. reading DWARF information is fairly cheap

We may have 100MB of DWARF but just reading the data (decoding
attribute formats in particular) won't take that long.

Reducing the number of DIEs examined at the top-level by a factor of 2
will speed up this part by a factor of 2, but in the grand scheme of
things that may not be very important.

3. chasing references is a bit more expensive

Cross-references in DWARF are pretty common and the lack of locality
means that chasing cross-references is going to be a constant factor
slower than iterating through the main DWARF tree.

4. deciding whether a DIE needs to be turned into IR is currently very expensive

This is because it involves multiple look-ups and recursive comparison
of DIEs which cannot be unconditionally memoised.

Those are only my thoughts. Some profiling should give a more accurate picture.

I was curious, so I did an analysis of the connectivity of a kernel
ABI (using the STG IR, not libabigail's - there are minor
differences). Here are some fun facts.

The ABI has 34541 nodes.
There are 25196 strongly-connected components.
25053 SCCs are just singleton nodes.
The largest 3 SCCs have sizes: 4960, 784, 343.
1/7 of the ABI nodes are in one SCC!
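
For anyone who wants to reproduce this kind of connectivity count on
their own graph, here is a minimal sketch using Tarjan's algorithm.
It is not the STG code: scc_sizes is a hypothetical helper, the graph
is assumed to be given as adjacency lists indexed by node number, and
the recursive formulation may need to be made iterative for deeply
nested graphs.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Return the sizes of all strongly-connected components, largest first.
// graph[v] lists the nodes that node v refers to.
std::vector<std::size_t> scc_sizes(const std::vector<std::vector<int>>& graph) {
  const int n = static_cast<int>(graph.size());
  std::vector<int> index(n, -1), low(n, 0), stack;
  std::vector<bool> on_stack(n, false);
  std::vector<std::size_t> sizes;
  int next_index = 0;

  std::function<void(int)> visit = [&](int v) {
    index[v] = low[v] = next_index++;
    stack.push_back(v);
    on_stack[v] = true;
    for (int w : graph[v]) {
      if (index[w] == -1) {        // tree edge: recurse first
        visit(w);
        low[v] = std::min(low[v], low[w]);
      } else if (on_stack[w]) {    // edge back into the current component
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) {      // v is the root of a component
      std::size_t size = 0;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        on_stack[w] = false;
        ++size;
      } while (w != v);
      sizes.push_back(size);
    }
  };

  for (int v = 0; v < n; ++v)
    if (index[v] == -1) visit(v);
  std::sort(sizes.begin(), sizes.end(), std::greater<std::size_t>());
  return sizes;
}
```

Counting the entries equal to 1 gives the number of singleton
components, and the first few entries are the largest ones.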

Completely idle speculation: Perhaps the really huge SCC contributes
significantly to comparison cost.

Regards,
Giuliano.


On Tue, 6 Sept 2022 at 11:11, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> diff --git a/doc/manuals/Makefile.am b/doc/manuals/Makefile.am
> index 894b38f1..e2813785 100644
> --- a/doc/manuals/Makefile.am
> +++ b/doc/manuals/Makefile.am
> @@ -14,7 +14,8 @@ libabigail-concepts.rst \
>  libabigail-overview.rst \
>  libabigail-tools.rst \
>  fedabipkgdiff.rst \
> -kmidiff.rst
> +kmidiff.rst \
> +tools-use-libabigail.txt
>
>  # You can set these variables from the command line.
>  SPHINXOPTS    =
> diff --git a/doc/manuals/abidiff.rst b/doc/manuals/abidiff.rst
> index a15515be..0c711d9e 100644
> --- a/doc/manuals/abidiff.rst
> +++ b/doc/manuals/abidiff.rst
> @@ -18,6 +18,8 @@ be accompanied with their debug information in `DWARF`_ format.
>  Otherwise, only `ELF`_ symbols that were added or removed are
>  reported.
>
> +.. include:: tools-use-libabigail.txt
> +
>  .. _abidiff_invocation_label:
>
>  Invocation
> @@ -197,6 +199,56 @@ Options
>      consumption of the tool on binaries with a lot of publicly defined
>      and exported types.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's those final ABI Corpora
> +    that are compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's those final ABI Corpora that are
> +    compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    Corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>    * ``--stat``
>
>      Rather than displaying the detailed ABI differences between
> diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
> index bdd6204d..a3055c7e 100644
> --- a/doc/manuals/abidw.rst
> +++ b/doc/manuals/abidw.rst
> @@ -12,14 +12,19 @@ defined ELF symbols of the file.  The input shared library must
>  contain associated debug information in `DWARF`_ format.
>
>  When given the ``--linux-tree`` option, this program can also handle a
> -Linux kernel tree.  That is, a directory tree that contains both the
> -vmlinux binary and Linux kernel modules.  It analyses those Linux
> -kernel binaries and emits an XML representation of the interface
> -between the kernel and its module, to standard output.  In this case,
> -we don't call it an ABI, but a KMI (Kernel Module Interface).  The
> -emitted KMI includes all the globally defined functions and variables,
> -along with a complete representation of their types.  The input
> -binaries must contain associated debug information in `DWARF`_ format.
> +`Linux kernel`_ tree.  That is, a directory tree that contains both
> +the vmlinux binary and `Linux Kernel`_ modules.  It analyses those
> +`Linux Kernel`_ binaries and emits an XML representation of the
> +interface between the kernel and its module, to standard output.  In
> +this case, we don't call it an ABI, but a KMI (Kernel Module
> +Interface).  The emitted KMI includes all the globally defined
> +functions and variables, along with a complete representation of their
> +types.  The input binaries must contain associated debug information
> +in `DWARF`_ format.
> +
> +.. include:: tools-use-libabigail.txt
> +
> +.. _abidiff_invocation_label:
>
>  Invocation
>  ==========
> @@ -92,7 +97,7 @@ Options
>
>    * ``--kmi-whitelist | -kaw`` <*path-to-whitelist*>
>
> -    When analyzing a Linux kernel binary, this option points to the
> +    When analyzing a `Linux Kernel`_ binary, this option points to the
>      white list of names of ELF symbols of functions and variables
>      which ABI must be written out.  That white list is called a "
>      Kernel Module Interface white list".  This is because for the
> @@ -105,7 +110,7 @@ Options
>
>      If this option is not provided -- thus if no white list is
>      provided -- then the entire KMI, that is, all publicly defined and
> -    exported functions and global variables by the Linux Kernel
> +    exported functions and global variables by the `Linux Kernel`_
>      binaries is emitted.
>
>    * ``--linux-tree | --lt``
> @@ -115,9 +120,10 @@ Options
>      In that case, this program emits the representation of the Kernel
>      Module Interface (KMI) on the standard output.
>
> -    Below is an example of usage of ``abidw`` on a Linux Kernel tree.
> +    Below is an example of usage of ``abidw`` on a `Linux Kernel`_
> +    tree.
>
> -    First, checkout a Linux kernel source tree and build it.  Then
> +    First, checkout a `Linux Kernel`_ source tree and build it.  Then
>      install the kernel modules in a directory somewhere.  Copy the
>      vmlinux binary into that directory too.  And then serialize the
>      KMI of that kernel to disk, using ``abidw``: ::
> @@ -171,6 +177,56 @@ Options
>      representation build by Libabigail to represent the ABI and will
>      not end up in the abi XML file.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's that final ABI corpus
> +    which textual representation is saved as ``ABIXML``.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's that final ABI corpus which textual
> +    representation is saved as ``ABIXML``.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>    * ``--no-linux-kernel-mode``
>
>      Without this option, if abipkgiff detects that the binaries it is
> @@ -308,4 +364,4 @@ standard `here
>  .. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
>  .. _DWARF: http://www.dwarfstd.org
>  .. _GNU: http://www.gnu.org
> -
> +.. _Linux Kernel: https://kernel.org/
> diff --git a/doc/manuals/abipkgdiff.rst b/doc/manuals/abipkgdiff.rst
> index 15ea9072..9114775a 100644
> --- a/doc/manuals/abipkgdiff.rst
> +++ b/doc/manuals/abipkgdiff.rst
> @@ -19,6 +19,7 @@ information directly in a section of said binaries.  In those cases,
>  obviously, no separate debug information package is needed as the tool
>  will find the debug information inside the binaries.
>
> +.. include:: tools-use-libabigail.txt
>
>  .. _abipkgdiff_invocation_label:
>
> @@ -277,6 +278,56 @@ Options
>      global functions and variables are analyzed, so the tool detects
>      and reports changes on these reachable types only.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's those final ABI Corpora
> +    that are compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's those final ABI Corpora that are
> +    compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    Corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>    *  ``--redundant``
>
>      In the diff reports, do display redundant changes.  A redundant
> diff --git a/doc/manuals/kmidiff.rst b/doc/manuals/kmidiff.rst
> index ce8168ae..53010189 100644
> --- a/doc/manuals/kmidiff.rst
> +++ b/doc/manuals/kmidiff.rst
> @@ -55,6 +55,10 @@ command line looks like: ::
>                        linux/v4.5/build/modules \
>                        linux/v4.6/build/modules
>
> +
> +.. include:: tools-use-libabigail.txt
> +
> +
>  Invocation
>  ==========
>
> @@ -67,8 +71,8 @@ Environment
>
>  By default, ``kmidiff`` compares all the interfaces (exported
>  functions and variables) between the Kernel and its modules.  In
> -practice, though, users want to compare a subset of the those
> -interfaces.
> +practice, though, some users might want to compare a subset of the
> +those interfaces.
>
>  Users can then define a "white list" of the interfaces to compare.
>  Such a white list is a just a file in the "INI" format that looks
> @@ -91,8 +95,11 @@ function or variable.  Only those interfaces along with the types
>  reachable from their signatures are going to be compared by
>  ``kmidiff`` recursively.
>
> -Note that kmidiff compares the interfaces exported by the ``vmlinux``
> -binary and by the all of the compiled modules.
> +Note that by default kmidiff analyzes the types reachable from the
> +interfaces associated with `ELF`_ symbols that are defined and
> +exported by the `Linux Kernel`_ as being the union of the ``vmlinux``
> +binary and all its compiled modules.  It then compares those
> +interfaces (along with their types).
>
>  Options
>  =======
> @@ -180,6 +187,38 @@ Options
>      exported interfaces.  This is the default kind of report emitted
>      by tools like ``abidiff`` or ``abipkgdiff``.
>
> +  * ``--exported-interfaces-only``
> +
> +    When using this option, this tool analyzes the descriptions of the
> +    types reachable by the interfaces (functions and variables)
> +    associated with `ELF`_ symbols that are defined and exported by
> +    the `Linux Kernel`_.
> +
> +    Otherwise, the tool also has the ability to analyze the
> +    descriptions of the types reachable by the interfaces associated
> +    with `ELF`_ symbols that are visible outside their translation
> +    unit.  This later possibility is however much more resource
> +    intensive and results in much slower operations.
> +
> +    That is why this option is enabled by default.
> +
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When using this option, this tool analyzes the descriptions of the
> +    types reachable by the interfaces (functions and variables) that
> +    are visible outside of their translation unit.  Once that analysis
> +    is done, an ABI Corpus is constructed by only considering the
> +    subset of types reachable from interfaces associated to `ELF`_
> +    symbols that are defined and exported by the binary.  It's that
> +    final ABI corpus which is compared against another one.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, leading to very slow operations.
> +
> +    Note that this option is turned off by default.
> +
>    * ``--show-bytes``
>
>      Show sizes and offsets in bytes, not bits.  This option is
> @@ -198,3 +237,8 @@ Options
>    * ``--show-dec``
>
>      Show sizes and offsets in decimal base.
> +
> +
> +.. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> +.. _ksymtab: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> +.. _Linux Kernel: https://kernel.org
> diff --git a/doc/manuals/tools-use-libabigail.txt b/doc/manuals/tools-use-libabigail.txt
> new file mode 100644
> index 00000000..43edf296
> --- /dev/null
> +++ b/doc/manuals/tools-use-libabigail.txt
> @@ -0,0 +1,16 @@
> +This tool uses the libabigail library to analyze the binary as well as its
> +associated debug information.  Here is its general mode of operation.
> +
> +When instructed to do so, a binary and its associated debug
> +information is read and analyzed.  To that effect, libabigail analyzes
> +by default the descriptions of the types reachable by the interfaces
> +(functions and variables) that are visible outside of their
> +translation unit.  Once that analysis is done, an Application Binary
> +Interface Corpus is constructed by only considering the subset of
> +types reachable from interfaces associated to `ELF`_ symbols that are
> +defined and exported by the binary.  It's that final ABI corpus which
> +libabigail considers as representing the ABI of the analyzed binary.
> +
> +Libabigail then has capabilities to generate textual representations
> +of ABI Corpora, compare them, analyze their changes and report about
> +them.
> diff --git a/include/abg-ir.h b/include/abg-ir.h
> index a857d041..61338edb 100644
> --- a/include/abg-ir.h
> +++ b/include/abg-ir.h
> @@ -197,6 +197,15 @@ public:
>    const config&
>    get_config() const;
>
> +  bool
> +  user_set_analyze_exported_interfaces_only() const;
> +
> +  void
> +  analyze_exported_interfaces_only(bool f);
> +
> +  bool
> +  analyze_exported_interfaces_only() const;
> +
>  #ifdef WITH_DEBUG_SELF_COMPARISON
>    void
>    set_self_comparison_debug_input(const corpus_sptr& corpus);
> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> index e41172c1..cba89664 100644
> --- a/src/abg-dwarf-reader.cc
> +++ b/src/abg-dwarf-reader.cc
> @@ -402,6 +402,12 @@ die_is_decl(const Dwarf_Die* die);
>  static bool
>  die_is_declaration_only(Dwarf_Die* die);
>
> +static bool
> +die_is_variable_decl(const Dwarf_Die *die);
> +
> +static bool
> +die_is_function_decl(const Dwarf_Die *die);
> +
>  static bool
>  die_has_size_attribute(const Dwarf_Die *die);
>
> @@ -5303,6 +5309,44 @@ public:
>      return symbol;
>    }
>
> +  /// Test if a DIE represents a decl (function or variable) that has
> +  /// a symbol that is exported, whatever that means.  This is
> +  /// supposed to work for Linux Kernel binaries as well.
> +  ///
> +  /// This is useful to limit the amount of DIEs taken into account to
> +  /// the strict limit of what an ABI actually means.  Limiting the
> +  /// volume of DIEs analyzed this way is an important optimization to
> +  /// keep big binaries "manageable" by libabigail.
> +  ///
> +  /// @param die the DIE to consider.
> +  bool
> +  is_decl_die_with_exported_symbol(const Dwarf_Die *die)
> +  {
> +    if (!die || !die_is_decl(die))
> +      return false;
> +
> +    bool result = false, address_found = false, symbol_is_exported = false;
> +    Dwarf_Addr decl_symbol_address = 0;
> +
> +    if (die_is_variable_decl(die))
> +      {
> +       if ((address_found = get_variable_address(die, decl_symbol_address)))
> +         symbol_is_exported =
> +           !!variable_symbol_is_exported(decl_symbol_address);
> +      }
> +    else if (die_is_function_decl(die))
> +      {
> +       if ((address_found = get_function_address(die, decl_symbol_address)))
> +         symbol_is_exported =
> +           !!function_symbol_is_exported(decl_symbol_address);
> +      }
> +
> +    if (address_found)
> +      result = symbol_is_exported;
> +
> +    return result;
> +  }
> +
>    /// Getter for the symtab reader. Will load the symtab from the elf handle if
>    /// not yet set.
>    ///
> @@ -5580,16 +5624,18 @@ public:
>    ///
>    /// @return true if the function address was found.
>    bool
> -  get_function_address(Dwarf_Die* function_die, Dwarf_Addr& address) const
> +  get_function_address(const Dwarf_Die* function_die, Dwarf_Addr& address) const
>    {
> -    if (!die_address_attribute(function_die, DW_AT_low_pc, address))
> +    if (!die_address_attribute(const_cast<Dwarf_Die*>(function_die),
> +                              DW_AT_low_pc, address))
>        // So no DW_AT_low_pc was found.  Let's see if the function DIE
>        // has got a DW_AT_ranges attribute instead.  If it does, the
>        // first address of the set of addresses represented by the
>        // value of that DW_AT_ranges represents the function (symbol)
>        // address we are looking for.
> -      if (!get_first_exported_fn_address_from_DW_AT_ranges(function_die,
> -                                                          address))
> +      if (!get_first_exported_fn_address_from_DW_AT_ranges
> +         (const_cast<Dwarf_Die*>(function_die),
> +          address))
>         return false;
>
>      address = maybe_adjust_fn_sym_address(address);
> @@ -5611,11 +5657,12 @@ public:
>    ///
>    /// @return true if the variable address was found.
>    bool
> -  get_variable_address(Dwarf_Die*      variable_die,
> +  get_variable_address(const Dwarf_Die* variable_die,
>                        Dwarf_Addr&      address) const
>    {
>      bool is_tls_address = false;
> -    if (!die_location_address(variable_die, address, is_tls_address))
> +    if (!die_location_address(const_cast<Dwarf_Die*>(variable_die),
> +                             address, is_tls_address))
>        return false;
>      if (!is_tls_address)
>        address = maybe_adjust_var_sym_address(address);
> @@ -7155,6 +7202,40 @@ die_is_declaration_only(Dwarf_Die* die)
>    return false;
>  }
>
> +/// Test if a DIE is for a function decl.
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return true iff @p die represents a function decl.
> +static bool
> +die_is_function_decl(const Dwarf_Die *die)
> +{
> +  if (!die)
> +    return false;
> +
> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
> +  if (tag == DW_TAG_subprogram)
> +    return true;
> +  return false;
> +}
> +
> +/// Test if a DIE is for a variable decl.
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return true iff @p die represents a variable decl.
> +static bool
> +die_is_variable_decl(const Dwarf_Die *die)
> +{
> +    if (!die)
> +    return false;
> +
> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
> +  if (tag == DW_TAG_variable)
> +    return true;
> +  return false;
> +}
> +
>  /// Test if a DIE has size attribute.
>  ///
>  /// @param die the DIE to consider.
> @@ -12690,9 +12771,13 @@ build_translation_unit_and_add_to_ir(read_context&     ctxt,
>    result->set_is_constructed(false);
>
>    do
> -    build_ir_node_from_die(ctxt, &child,
> -                          die_is_public_decl(&child),
> -                          dwarf_dieoffset(&child));
> +    // Analyze all the DIEs we encounter unless we are asked to only
> +    // analyze exported interfaces and the types reachable from them.
> +    if (!ctxt.env()->analyze_exported_interfaces_only()
> +       || ctxt.is_decl_die_with_exported_symbol(&child))
> +      build_ir_node_from_die(ctxt, &child,
> +                            die_is_public_decl(&child),
> +                            dwarf_dieoffset(&child));
>    while (dwarf_siblingof(&child, &child) == 0);
>
>    if (!ctxt.var_decls_to_re_add_to_tree().empty())
> @@ -15699,6 +15784,16 @@ read_debug_info_into_corpus(read_context& ctxt)
>      origin |= corpus::LINUX_KERNEL_BINARY_ORIGIN;
>    ctxt.current_corpus()->set_origin(origin);
>
> +  if (origin & corpus::LINUX_KERNEL_BINARY_ORIGIN
> +      && !ctxt.env()->user_set_analyze_exported_interfaces_only())
> +    // So we are looking at the Linux Kernel and the user has not set
> +    // any particular option regarding the amount of types to analyse.
> +    // In that case, we need to only analyze types that are reachable
> +    // from exported interfaces otherwise we get such a massive amount
> +    // of type DIEs to look at that things are just too slow down the
> +    // road.
> +    ctxt.env()->analyze_exported_interfaces_only(true);
> +
>    ctxt.current_corpus()->set_soname(ctxt.dt_soname());
>    ctxt.current_corpus()->set_needed(ctxt.dt_needed());
>    ctxt.current_corpus()->set_architecture_name(ctxt.elf_architecture());
> diff --git a/src/abg-ir-priv.h b/src/abg-ir-priv.h
> index 45b711b7..21734b25 100644
> --- a/src/abg-ir-priv.h
> +++ b/src/abg-ir-priv.h
> @@ -26,6 +26,7 @@ namespace ir
>  {
>
>  using std::string;
> +using abg_compat::optional;
>
>  /// The result of structural comparison of type ABI artifacts.
>  enum comparison_result
> @@ -443,6 +444,7 @@ struct environment::priv
>    bool                                 decl_only_class_equals_definition_;
>    bool                                 use_enum_binary_only_equality_;
>    bool                                 allow_type_comparison_results_caching_;
> +  optional<bool>                       analyze_exported_interfaces_only_;
>  #ifdef WITH_DEBUG_SELF_COMPARISON
>    bool                                 self_comparison_debug_on_;
>  #endif
> diff --git a/src/abg-ir.cc b/src/abg-ir.cc
> index 91c8e99b..02d68e63 100644
> --- a/src/abg-ir.cc
> +++ b/src/abg-ir.cc
> @@ -3674,6 +3674,42 @@ const config&
>  environment::get_config() const
>  {return priv_->config_;}
>
> +/// Getter for a property that says if the user actually did set the
> +/// analyze_exported_interfaces_only() property.  If not, it means
> +/// the default behaviour prevails.
> +///
> +/// @return true iff the user did set the
> +/// analyze_exported_interfaces_only() property.
> +bool
> +environment::user_set_analyze_exported_interfaces_only() const
> +{return priv_->analyze_exported_interfaces_only_.has_value();}
> +
> +/// Setter for the property that controls if we are to restrict the
> +/// analysis to the types that are only reachable from the exported
> +/// interfaces only, or if the set of types should be more broad than
> +/// that.  Typically, we'd restrict the analysis to types reachable
> +/// from exported interfaces only (stricto sensu, that would really be
> +/// only the types that are part of the ABI of well designed
> +/// libraries) for performance reasons.
> +///
> +/// @param f the value of the flag.
> +void
> +environment::analyze_exported_interfaces_only(bool f)
> +{priv_->analyze_exported_interfaces_only_ = f;}
> +
> +/// Getter for the property that controls if we are to restrict the
> +/// analysis to the types that are only reachable from the exported
> +/// interfaces only, or if the set of types should be more broad than
> +/// that.  Typically, we'd restrict the analysis to types reachable
> +/// from exported interfaces only (stricto sensu, that would really be
> +/// only the types that are part of the ABI of well designed
> +/// libraries) for performance reasons.
> +///
> +/// @return the value of the flag.
> +bool
> +environment::analyze_exported_interfaces_only() const
> +{return priv_->analyze_exported_interfaces_only_.value_or(false);}
> +
>  #ifdef WITH_DEBUG_SELF_COMPARISON
>  /// Setter of the corpus of the input corpus of the self comparison
>  /// that takes place when doing "abidw --debug-abidiff <binary>".
> diff --git a/tools/abidiff.cc b/tools/abidiff.cc
> index 97b036cb..e0bb35ac 100644
> --- a/tools/abidiff.cc
> +++ b/tools/abidiff.cc
> @@ -29,6 +29,7 @@ using std::ostream;
>  using std::cout;
>  using std::cerr;
>  using std::shared_ptr;
> +using abg_compat::optional;
>  using abigail::ir::environment;
>  using abigail::ir::environment_sptr;
>  using abigail::translation_unit;
> @@ -74,6 +75,7 @@ struct options
>    vector<string>       headers_dirs2;
>    vector<string>        header_files2;
>    bool                 drop_private_types;
> +  optional<bool>       exported_interfaces_only;
>    bool                 linux_kernel_mode;
>    bool                 no_default_supprs;
>    bool                 no_arch;
> @@ -197,6 +199,9 @@ display_usage(const string& prog_name, ostream& out)
>      << " --header-file2|--hf2 <path>  the path to one header of file2\n"
>      << " --drop-private-types  drop private types from "
>      "internal representation\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>      << " --no-linux-kernel-mode  don't consider the input binaries as "
>         "linux kernel binaries\n"
>      << " --kmi-whitelist|-w  path to a "
> @@ -403,6 +408,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>         }
>        else if (!strcmp(argv[i], "--drop-private-types"))
>         opts.drop_private_types = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>        else if (!strcmp(argv[i], "--no-default-suppression"))
>         opts.no_default_supprs = true;
>        else if (!strcmp(argv[i], "--no-architecture"))
> @@ -1130,6 +1139,9 @@ main(int argc, char* argv[])
>        t2_type = guess_file_type(opts.file2);
>
>        environment_sptr env(new environment);
> +      if (opts.exported_interfaces_only.has_value())
> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>  #ifdef WITH_DEBUG_SELF_COMPARISON
>             if (opts.do_debug)
>               env->self_comparison_debug_is_on(true);
> diff --git a/tools/abidw.cc b/tools/abidw.cc
> index 9a27a029..f38d6048 100644
> --- a/tools/abidw.cc
> +++ b/tools/abidw.cc
> @@ -40,6 +40,7 @@ using std::ostream;
>  using std::ofstream;
>  using std::vector;
>  using std::shared_ptr;
> +using abg_compat::optional;
>  using abigail::tools_utils::emit_prefix;
>  using abigail::tools_utils::temp_file;
>  using abigail::tools_utils::temp_file_sptr;
> @@ -114,6 +115,7 @@ struct options
>    bool                 do_log;
>    bool                 drop_private_types;
>    bool                 drop_undefined_syms;
> +  optional<bool>       exported_interfaces_only;
>    type_id_style_kind   type_id_style;
>  #ifdef WITH_DEBUG_SELF_COMPARISON
>    string               type_id_file_path;
> @@ -187,6 +189,9 @@ display_usage(const string& prog_name, ostream& out)
>      << "  --short-locs  only print filenames rather than paths\n"
>      << "  --drop-private-types  drop private types from representation\n"
>      << "  --drop-undefined-syms  drop undefined symbols from representation\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>      << "  --no-comp-dir-path  do not show compilation path information\n"
>      << "  --no-elf-needed  do not show the DT_NEEDED information\n"
>      << "  --no-write-default-sizes  do not emit pointer size when it equals"
> @@ -368,6 +373,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>         opts.drop_private_types = true;
>        else if (!strcmp(argv[i], "--drop-undefined-syms"))
>         opts.drop_undefined_syms = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>        else if (!strcmp(argv[i], "--no-linux-kernel-mode"))
>         opts.linux_kernel_mode = false;
>        else if (!strcmp(argv[i], "--abidiff"))
> @@ -606,6 +615,9 @@ load_corpus_and_write_abixml(char* argv[],
>              }
>          }
>
> +      if (opts.exported_interfaces_only.has_value())
> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>        t.start();
>        corp = dwarf_reader::read_corpus_from_elf(ctxt, s);
>        t.stop();
> @@ -813,6 +825,9 @@ load_kernel_corpus_group_and_write_abixml(char* argv[],
>    timer t, global_timer;
>    suppressions_type supprs;
>
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>    if (opts.do_log)
>      emit_prefix(argv[0], cerr)
>        << "going to build ABI representation of the Linux Kernel ...\n";
> diff --git a/tools/abipkgdiff.cc b/tools/abipkgdiff.cc
> index 551080b9..656d5882 100644
> --- a/tools/abipkgdiff.cc
> +++ b/tools/abipkgdiff.cc
> @@ -106,6 +106,7 @@ using std::set;
>  using std::ostringstream;
>  using std::shared_ptr;
>  using std::dynamic_pointer_cast;
> +using abg_compat::optional;
>  using abigail::workers::task;
>  using abigail::workers::task_sptr;
>  using abigail::workers::queue;
> @@ -205,6 +206,7 @@ public:
>    bool         fail_if_no_debug_info;
>    bool         show_identical_binaries;
>    bool         self_check;
> +  optional<bool> exported_interfaces_only;
>  #ifdef WITH_CTF
>    bool         use_ctf;
>  #endif
> @@ -868,6 +870,9 @@ display_usage(const string& prog_name, ostream& out)
>      "full impact analysis report rather than the default leaf changes reports\n"
>      << " --non-reachable-types|-t  consider types non reachable"
>      " from public interfaces\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>      << " --no-linkage-name             do not display linkage names of "
>      "added/removed/changed\n"
>      << " --redundant                    display redundant changes\n"
> @@ -2076,6 +2081,10 @@ public:
>      abigail::elf_reader::status detailed_status =
>        abigail::elf_reader::STATUS_UNKNOWN;
>
> +    if (args->opts.exported_interfaces_only.has_value())
> +      env->analyze_exported_interfaces_only
> +       (*args->opts.exported_interfaces_only);
> +
>      status |= compare(args->elf1, args->debug_dir1, args->private_types_suppr1,
>                       args->elf2, args->debug_dir2, args->private_types_suppr2,
>                       args->opts, env, diff, ctxt, &detailed_status);
> @@ -2142,6 +2151,10 @@ public:
>      diff_context_sptr ctxt;
>      corpus_diff_sptr diff;
>
> +    if (args->opts.exported_interfaces_only.has_value())
> +      env->analyze_exported_interfaces_only
> +       (*args->opts.exported_interfaces_only);
> +
>      abigail::elf_reader::status detailed_status =
>        abigail::elf_reader::STATUS_UNKNOWN;
>
> @@ -3024,6 +3037,10 @@ compare_prepared_linux_kernel_packages(package& first_package,
>    string dist_root2 = second_package.extracted_dir_path();
>
>    abigail::ir::environment_sptr env(new abigail::ir::environment);
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only
> +      (*opts.exported_interfaces_only);
> +
>    suppressions_type supprs;
>    corpus_group_sptr corpus1, corpus2;
>    corpus1 = build_corpus_group_from_kernel_dist_under(dist_root1,
> @@ -3326,6 +3343,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>        else if (!strcmp(argv[i], "--full-impact")
>                ||!strcmp(argv[i], "-f"))
>         opts.show_full_impact_report = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>        else if (!strcmp(argv[i], "--no-linkage-name"))
>         opts.show_linkage_names = false;
>        else if (!strcmp(argv[i], "--redundant"))
> diff --git a/tools/kmidiff.cc b/tools/kmidiff.cc
> index 8fd3fed9..f3332765 100644
> --- a/tools/kmidiff.cc
> +++ b/tools/kmidiff.cc
> @@ -29,6 +29,7 @@ using std::vector;
>  using std::ostream;
>  using std::cout;
>  using std::cerr;
> +using abg_compat::optional;
>
>  using namespace abigail::tools_utils;
>  using namespace abigail::dwarf_reader;
> @@ -60,6 +61,7 @@ struct options
>    bool                 show_hexadecimal_values;
>    bool                 show_offsets_sizes_in_bits;
>    bool                 show_impacted_interfaces;
> +  optional<bool>       exported_interfaces_only;
>  #ifdef WITH_CTF
>    bool                 use_ctf;
>  #endif
> @@ -120,6 +122,9 @@ display_usage(const string& prog_name, ostream& out)
>      << " --impacted-interfaces|-i  show interfaces impacted by ABI changes\n"
>      << " --full-impact|-f  show the full impact of changes on top-most "
>          "interfaces\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>      << " --show-bytes  show size and offsets in bytes\n"
>      << " --show-bits  show size and offsets in bits\n"
>      << " --show-hex  show size and offset in hexadecimal\n"
> @@ -262,6 +267,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>        else if (!strcmp(argv[i], "--full-impact")
>                || !strcmp(argv[i], "-f"))
>         opts.leaf_changes_only = false;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>        else if (!strcmp(argv[i], "--show-bytes"))
>         opts.show_offsets_sizes_in_bits = false;
>        else if (!strcmp(argv[i], "--show-bits"))
> @@ -408,6 +417,9 @@ main(int argc, char* argv[])
>
>    environment_sptr env(new environment);
>
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>    corpus_group_sptr group1, group2;
>    string debug_info_root_dir;
>    corpus::origin origin =
> --
> 2.37.2
>
>
>
> --
>                 Dodji
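
The way the patch resolves the default (an unset option, switched on only for Linux Kernel binaries) can be sketched as follows. This is a rough illustration and not code from the patch: `analysis_settings` and `resolve_exported_only` are hypothetical names, and `std::optional` is assumed here in place of `abg_compat::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the environment property added by the patch.
struct analysis_settings
{
  // Unset means the user passed neither --exported-interfaces-only nor
  // --allow-non-exported-interfaces.
  std::optional<bool> exported_only;

  bool user_set() const { return exported_only.has_value(); }
  bool value() const { return exported_only.value_or(false); }
};

// Mirrors the decision made in read_debug_info_into_corpus: a Linux Kernel
// binary defaults to exported-only analysis unless the user chose otherwise.
inline bool
resolve_exported_only(analysis_settings s, bool is_linux_kernel)
{
  if (is_linux_kernel && !s.user_set())
    s.exported_only = true;
  return s.value();
}
```

With no option given, a kernel binary resolves to exported-only analysis and any other binary keeps the previous behaviour; an explicit option wins in both cases.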
  
Ben Woodard Sept. 9, 2022, 8:32 p.m. UTC | #2
> On Sep 9, 2022, at 6:03 AM, Giuliano Procida via Libabigail <libabigail@sourceware.org> wrote:
> 
> Hi Dodji.
> 
> Sorry for the late reply. I was down with Covid for a while.

And I’ve been on vacation.

> 
> I feel this commit deserves some feedback.
> 
Yeah, I’ve been thinking about this for quite some time too and have been trying to put my intuition into words.

> My understanding is that the intention here is to make the DWARF
> reader do less work (look at fewer type DIEs) than at present.
> 
> We are actually hoping that we may be able to make the DWARF reader
> look at more type DIEs so that it is more likely to pick up full
> definitions of types instead of declarations.

I think that reducing it to just exported interfaces and associated types may be too limited as well.

1) Exceptions - classes thrown as exceptions will not be included in the list of types related to exported functions or variables even though they can cross ABI boundaries. They need to be dug out of the LSDA.
Note: I believe that any type that makes use of RTTI can leak information across the ABI boundary but I haven’t thought carefully enough about it to come up with an example other than exceptions. 

2) Inline functions and the types that they use - when doing ABI compatibility testing, we should also include the prototypes for inline functions and the types that they use. These will not be exported functions. I think something that has been saving us has been the deep type evaluation including non-exported functions and types. Changes that would have been overlooked when considering only exported interfaces would still be evaluated in many cases. If we trim down to just exported functions, I think that some of these will be missed. To be able to find the list of inline functions which need to be considered you are going to have to go through the callsite data in DWARF. Unfortunately, this is something that clang doesn’t generate well. Nevertheless, these functions and their associated types need to be part of the considered ABI.
Note: We cannot evaluate whether two inline function calls in two different objects are in fact the same, because optimization can mutate the code and there can be semantic differences between different versions, but at the very least we should be able to verify that the function prototypes and the associated types are in fact the same.
Certainly any function which is inline in one TU and exported in another is an ABI break. Detecting this would also require parsing of callsite information.

3) If a library is dlopen'ed, then a function is dlsym’ed and cast to the appropriate type, that function and its associated types are part of the ABI. If all types within an executable are considered, we would detect a change to them, but if only exported types are considered, then the change could be overlooked. A heuristic that we could use is that if dlsym is called, we need to include all types, not just exported types. I do not know how to solve this one without binary analysis.

4) PIE - with more and more things being compiled as PIE, it can become increasingly hard to know what is a library vs. what is an executable. Even normal executables may provide functions and variables to their libraries. In abicompat we currently only consider external undefined symbols; this is insufficient because a library may call a function or reference a variable in the executable. Therefore, the function prototype and its related types, or the variable type, also need to be considered. I haven’t looked carefully at Dodji’s branch to only analyze exported symbols but I would hope that it includes not only undefined references but also defined exported symbols when considering executables.
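
To make point 1 concrete, here is a minimal sketch of a type that crosses the boundary without appearing in any exported signature. All names (`parse_error`, `parse_digit`, `line_of_failure`) are made up for illustration, and both sides live in one file here only so that the example is self-contained.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical library-side type: it appears in no function signature below.
struct parse_error : std::runtime_error
{
  int line; // layout detail that the caller ends up depending on

  parse_error(const std::string& what, int l)
    : std::runtime_error(what), line(l)
  {}
};

// Hypothetical exported entry point.  Its signature only mentions int and
// const char*, so a walk of the types reachable from it never meets
// parse_error.
int
parse_digit(const char* text)
{
  if (!text || text[0] < '0' || text[0] > '9')
    throw parse_error("not a digit", 1);
  return text[0] - '0';
}

// Caller side: catching by type and reading a member makes the layout of
// parse_error part of the contract between the two binaries.
int
line_of_failure(const char* text)
{
  try
    {
      parse_digit(text);
      return 0;
    }
  catch (const parse_error& e)
    {
      return e.line;
    }
}
```

If the library later inserts a member before `line`, the caller reads the wrong bytes even though no exported prototype changed.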

So I would say that my feedback is that "restricting analyzed decls to exported symbols" is too limited. It needs to also include:
1) any type which has a typeinfo structure in the LSDA for exceptions and RTTI
2) any inline function and associated types which has a callsite in the object being analyzed
3) if dlopen or dlmopen are used, then either
   a) every decl needs to be analyzed
   b) the type for the destination variable for every dlsym call needs to be included in the analysis. (more binary analysis than is easily accomplished with the current code base)
4) all decls which are exported and their associated types, not just defined ones.
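
As a sketch only, the four criteria above could be folded into a single predicate like the one below. None of these names exist in libabigail; the structs are hypothetical and assume the relevant facts have already been extracted from the binary. Item 4 is read here as "exported, whether defined or undefined", and the dlopen case takes option (a).

```cpp
#include <cassert>

// All names here are hypothetical; none of them exist in libabigail.
struct decl_facts
{
  bool has_exported_symbol;     // defined or undefined, but visible to the linker
  bool has_typeinfo_in_lsda;    // type that can be thrown or inspected via RTTI
  bool is_inline_with_callsite; // inline function with a call site in this object
};

struct binary_facts
{
  bool calls_dlopen; // the binary references dlopen or dlmopen
};

// Decide whether a decl (and the types reachable from it) should be analyzed.
inline bool
should_analyze(const decl_facts& d, const binary_facts& b)
{
  if (b.calls_dlopen)
    return true; // option 3a: analyze every decl
  return d.has_exported_symbol
    || d.has_typeinfo_in_lsda
    || d.is_inline_with_callsite;
}
```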
> 
> The rationale behind the change appears to be that DWARF processing is
> expensive, in particular for kernel ABIs. I would say "measure first".
> Here's roughly how I think about things:
> 

I’m not entirely sure your breakdown of what consumes time is correct. Dodji is much more familiar with that part of the code than I am. The vast majority of my contribution to the project has been on the theoretical side of ABI and testing.

> 1. building the IR is very cheap
> 
> A kernel ABI may end up with 40k IR elements. The cost of allocating
> memory and calling constructors should be negligible. Any improvements
> to this end of things is pointless.
> 
> 2. reading DWARF information is fairly cheap
> 
> We may have 100MB of DWARF but just reading the data (decoding
> attribute formats in particular) won't take that long.
> 
> Reducing the number of DIEs examined at the top-level by a factor of 2
> will speed up this part by a factor of 2, but in the grand scheme of
> things that may not be very important.
> 
> 3. chasing references is a bit more expensive
> 
> Cross-references in DWARF are pretty common and the lack of locality
> means that chasing cross-references is going to be a constant factor
> slower than iterating through the main DWARF tree.

I may not completely understand your description, but I perceive this as a more expensive operation than you seem to.
In my understanding of this, it includes the canonicalization of the types that appear in the various TUs. This seems to be a non-trivial task. It seems like the optimization which had to be undone for correctness was in the canonicalization of types.

> 
> 4. deciding whether a DIE needs to be turned into IR is currently very expensive
> 
> This is because it involves multiple look-ups and recursive comparison
> of DIEs which cannot be unconditionally memoised.

I do believe that this phase could be a good target for optimization.
> 
> Those are only my thoughts. Some profiling should give a more accurate picture.
> 
> I was curious, so I did an analysis of the connectivity of a kernel
> ABI (using the STG IR, not libabigail's - there are minor
> differences). Here are some fun facts.
> 
> The ABI has 34541 nodes.
> There are 25196 strongly-connected components.
> 25053 SCCs are just singleton nodes.
> The largest 3 SCCs have sizes: 4960, 784, 343.
> 1/7 of the ABI nodes are in one SCC!
> 
> Completely idle speculation: Perhaps the really huge SCC contributes
> significantly to comparison cost.
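
For readers who want to reproduce this kind of measurement on their own type graph, here is a minimal sketch of computing strongly-connected component sizes with Tarjan's algorithm. It assumes the graph is available as plain adjacency lists of node indices; it does not use the STG or libabigail IR, and the recursive form would need to be made iterative for very deep graphs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the size of every strongly-connected component of a directed
// graph given as adjacency lists, largest first (Tarjan's algorithm).
std::vector<std::size_t>
scc_sizes(const std::vector<std::vector<int>>& graph)
{
  const int n = static_cast<int>(graph.size());
  std::vector<int> index(n, -1), low(n, 0), stack;
  std::vector<bool> on_stack(n, false);
  std::vector<std::size_t> sizes;
  int next_index = 0;

  std::function<void(int)> visit = [&](int v)
  {
    index[v] = low[v] = next_index++;
    stack.push_back(v);
    on_stack[v] = true;

    for (int w : graph[v])
      {
        if (index[w] < 0)
          {
            // Tree edge: recurse, then inherit the lowest reachable index.
            visit(w);
            low[v] = std::min(low[v], low[w]);
          }
        else if (on_stack[w])
          // Back edge into the component currently being built.
          low[v] = std::min(low[v], index[w]);
      }

    if (low[v] == index[v])
      {
        // v is the root of a component: pop its members off the stack.
        std::size_t size = 0;
        int w;
        do
          {
            w = stack.back();
            stack.pop_back();
            on_stack[w] = false;
            ++size;
          }
        while (w != v);
        sizes.push_back(size);
      }
  };

  for (int v = 0; v < n; ++v)
    if (index[v] < 0)
      visit(v);

  std::sort(sizes.rbegin(), sizes.rend());
  return sizes;
}
```

Three types that point at each other in a cycle, plus one leaf type, give one component of size 3 and one singleton.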
> 
> Regards,
> Giuliano.
> 
> 
> On Tue, 6 Sept 2022 at 11:11, Dodji Seketeli <dodji@seketeli.org> wrote:
>> 
>> Hello,
>> 
>> Profiling showed that the DWARF reader scans too much data.
>> 
>> Basically, in build_translation_unit_and_add_to_ir,
>> build_ir_node_from_die is called on every single DIE that is seen, for
>> a given translation unit.
>> 
>> There are interfaces (function and variable decls) that are not
>> associated with exported ELF symbols and that are analyzed by
>> build_ir_node_from_die nonetheless.  For instance, interfaces that are
>> visible outside of their translation units are analyzed and the types
>> that are reachable from those interfaces are analyzed as well.
>> 
>> Once that is done, an ABI corpus is built with the subset of
>> interfaces that have exported ELF symbol (strictly those that are part
>> of the ABI), but types that are not necessarily reachable from those
>> ABI interfaces can also be put into the ABI corpus.
>> 
>> Some tools make use of this "loose" behaviour of libabigail.  For
>> instance, abicompat precisely wants to analyze interfaces with
>> undefined symbols.  For an application, those interfaces represent
>> the interfaces that the application expects to be provided by some
>> shared library.
>> 
>> When analyzing the exported interface of the Linux Kernel (or any
>> other huge application) however, analyzing more types than necessary
>> appears to incur a huge time penalty.
>> 
>> So, this patch introduces an optional behaviour whereby
>> build_translation_unit_and_add_to_ir is restricted to analyzing
>> interfaces that have exported ELF symbols only.  So only the types
>> reachable from those interfaces are analyzed.  This more than halves
>> the time spent by "abidw --noout vmlinux".
>> 
>> Strictly speaking, this new behaviour is triggered by a new option named
>> --exported-interfaces-only, supported by the tools abidw, abidiff,
>> abipkgdiff and kmidiff.
>> 
>> When looking at the Linux Kernel however, this option is enabled by
>> default.
>> 
>> Note that an option --allow-non-exported-interfaces is also introduced
>> to function under the previous model of operations.  This option is
>> enabled by default on all the tools when they are not looking at the
>> Linux Kernel.
>> 
>> With this enabled, analyzing the Linux Kernel is back to taking less
>> than a minute on a reasonable machine.
>> 
>>        * doc/manuals/tools-use-libabigail.txt: New doc text.
>>        * doc/manuals/Makefile.am: Add the new tools-use-libabigail.rst
>>        tool to the source distribution.
>>        * doc/manuals/abidiff.rst: Include the new
>>        tools-use-libabigail.rst.  Document the --exported-interfaces-only
>>        and --allow-non-exported-interfaces.
>>        * doc/manuals/abidw.rst: Likewise.
>>        * doc/manuals/abipkgdiff.rst: Likewise.
>>        * doc/manuals/kmidiff.rst: Likewise.
>>        * include/abg-ir.h
>>        (environment::{user_set_analyze_exported_interfaces_only,
>>        analyze_exported_interfaces_only}): Declare new accessors.
>>        * src/abg-ir.cc
>>        (environment::{user_set_analyze_exported_interfaces_only,
>>        analyze_exported_interfaces_only}): Define new accessors.
>>        * src/abg-dwarf-reader.cc (die_is_variable_decl)
>>        (die_is_function_decl): Define new static functions.
>>        (read_context::is_decl_die_with_exported_symbol): Define new
>>        member function.
>>        (read_context::get_{function,variable}_address): Const-ify the
>>        Dwarf_Die* parameter.
>>        (build_translation_unit_and_add_to_ir): If the user asks to
>>        analyze exported interfaces only, then analyze only interfaces
>>        that have exported ELF symbols.
>>        (read_debug_info_into_corpus): If we are looking at the Linux
>>        Kernel, then only analyze exported interfaces unless the user asks
>>        otherwise.
>>        * src/abg-ir-priv.h
>>        (environment::priv::analyze_exported_interfaces_only_): Define new
>>        data member.
>>        * tools/abidiff.cc (options::exported_interfaces_only): Define new
>>        data member.
>>        (display_usage): Add new help strings for
>>        --exported-interfaces-only and --allow-non-exported-interfaces.
>>        (parse_command_line): Parse the new options
>>        --exported-interfaces-only and --allow-non-exported-interfaces.
>>        (main): Pass the value of opts.exported_interfaces_only to the
>>        environment.
>>        * tools/abidw.cc (options::exported_interfaces_only): Define new
>>        data member.
>>        (display_usage): Add new help strings for
>>        --exported-interfaces-only and --allow-non-exported-interfaces.
>>        (parse_command_line): Parse the new options.
>>        (load_corpus_and_write_abixml)
>>        (load_kernel_corpus_group_and_write_abixml): Pass the value of
>>        opts.exported_interfaces_only onto the environment.
>>        * tools/abipkgdiff.cc (options::exported_interfaces_only): Define new
>>        data member.
>>        (display_usage): Add new help strings for
>>        --exported-interfaces-only and --allow-non-exported-interfaces.
>>        (parse_command_line): Parse the new options.
>>        (compare_task::perform, self_compare_task::perform): Pass the
>>        value of opts.exported_interfaces_only onto the environment.
>>        (compare_prepared_linux_kernel_packages): Likewise.
>>        * tools/kmidiff.cc (options::exported_interfaces_only): Define new
>>        data member.
>>        (display_usage): Add new help strings for
>>        --exported-interfaces-only and --allow-non-exported-interfaces.
>>        (parse_command_line): Parse the new options.
>>        (main): Pass the value of opts.exported_interfaces_only onto the
>>        environment.
>> 
>> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
>> ---
>> doc/manuals/Makefile.am              |   3 +-
>> doc/manuals/abidiff.rst              |  52 ++++++++++++
>> doc/manuals/abidw.rst                |  82 ++++++++++++++++---
>> doc/manuals/abipkgdiff.rst           |  51 ++++++++++++
>> doc/manuals/kmidiff.rst              |  52 +++++++++++-
>> doc/manuals/tools-use-libabigail.txt |  16 ++++
>> include/abg-ir.h                     |   9 +++
>> src/abg-dwarf-reader.cc              | 113 ++++++++++++++++++++++++---
>> src/abg-ir-priv.h                    |   2 +
>> src/abg-ir.cc                        |  36 +++++++++
>> tools/abidiff.cc                     |  12 +++
>> tools/abidw.cc                       |  15 ++++
>> tools/abipkgdiff.cc                  |  21 +++++
>> tools/kmidiff.cc                     |  12 +++
>> 14 files changed, 449 insertions(+), 27 deletions(-)
>> create mode 100644 doc/manuals/tools-use-libabigail.txt
>> 
>> diff --git a/doc/manuals/Makefile.am b/doc/manuals/Makefile.am
>> index 894b38f1..e2813785 100644
>> --- a/doc/manuals/Makefile.am
>> +++ b/doc/manuals/Makefile.am
>> @@ -14,7 +14,8 @@ libabigail-concepts.rst \
>> libabigail-overview.rst \
>> libabigail-tools.rst \
>> fedabipkgdiff.rst \
>> -kmidiff.rst
>> +kmidiff.rst \
>> +tools-use-libabigail.txt
>> 
>> # You can set these variables from the command line.
>> SPHINXOPTS    =
>> diff --git a/doc/manuals/abidiff.rst b/doc/manuals/abidiff.rst
>> index a15515be..0c711d9e 100644
>> --- a/doc/manuals/abidiff.rst
>> +++ b/doc/manuals/abidiff.rst
>> @@ -18,6 +18,8 @@ be accompanied with their debug information in `DWARF`_ format.
>> Otherwise, only `ELF`_ symbols that were added or removed are
>> reported.
>> 
>> +.. include:: tools-use-libabigail.txt
>> +
>> .. _abidiff_invocation_label:
>> 
>> Invocation
>> @@ -197,6 +199,56 @@ Options
>>     consumption of the tool on binaries with a lot of publicly defined
>>     and exported types.
>> 
>> +  * ``--exported-interfaces-only``
>> +
>> +    By default, when looking at the debug information accompanying a
>> +    binary, this tool analyzes the descriptions of the types reachable
>> +    by the interfaces (functions and variables) that are visible
>> +    outside of their translation unit.  Once that analysis is done, an
>> +    ABI corpus is constructed by only considering the subset of types
>> +    reachable from interfaces associated to `ELF`_ symbols that are
>> +    defined and exported by the binary.  It's those final ABI Corpora
>> +    that are compared by this tool.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    To mitigate that performance issue, this option allows libabigail
>> +    to only analyze types that are reachable from interfaces
>> +    associated with defined and exported `ELF`_ symbols.
>> +
>> +    Note that this option is turned on by default when analyzing the
>> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
>> +
>> +  * ``--allow-non-exported-interfaces``
>> +
>> +    When looking at the debug information accompanying a binary, this
>> +    tool analyzes the descriptions of the types reachable by the
>> +    interfaces (functions and variables) that are visible outside of
>> +    their translation unit.  Once that analysis is done, an ABI corpus
>> +    is constructed by only considering the subset of types reachable
>> +    from interfaces associated to `ELF`_ symbols that are defined and
>> +    exported by the binary.  It's those final ABI Corpora that are
>> +    compared by this tool.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    In the presence of an "average sized" binary however one can
>> +    afford having libabigail analyze all interfaces that are visible
>> +    outside of their translation unit, using this option.
>> +
>> +    Note that this option is turned on by default, unless we are in
>> +    the presence of the `Linux Kernel`_.
>> +
>>   * ``--stat``
>> 
>>     Rather than displaying the detailed ABI differences between
>> diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
>> index bdd6204d..a3055c7e 100644
>> --- a/doc/manuals/abidw.rst
>> +++ b/doc/manuals/abidw.rst
>> @@ -12,14 +12,19 @@ defined ELF symbols of the file.  The input shared library must
>> contain associated debug information in `DWARF`_ format.
>> 
>> When given the ``--linux-tree`` option, this program can also handle a
>> -Linux kernel tree.  That is, a directory tree that contains both the
>> -vmlinux binary and Linux kernel modules.  It analyses those Linux
>> -kernel binaries and emits an XML representation of the interface
>> -between the kernel and its module, to standard output.  In this case,
>> -we don't call it an ABI, but a KMI (Kernel Module Interface).  The
>> -emitted KMI includes all the globally defined functions and variables,
>> -along with a complete representation of their types.  The input
>> -binaries must contain associated debug information in `DWARF`_ format.
>> +`Linux kernel`_ tree.  That is, a directory tree that contains both
>> +the vmlinux binary and `Linux Kernel`_ modules.  It analyses those
>> +`Linux Kernel`_ binaries and emits an XML representation of the
>> +interface between the kernel and its module, to standard output.  In
>> +this case, we don't call it an ABI, but a KMI (Kernel Module
>> +Interface).  The emitted KMI includes all the globally defined
>> +functions and variables, along with a complete representation of their
>> +types.  The input binaries must contain associated debug information
>> +in `DWARF`_ format.
>> +
>> +.. include:: tools-use-libabigail.txt
>> +
>> +.. _abidiff_invocation_label:
>> 
>> Invocation
>> ==========
>> @@ -92,7 +97,7 @@ Options
>> 
>>   * ``--kmi-whitelist | -kaw`` <*path-to-whitelist*>
>> 
>> -    When analyzing a Linux kernel binary, this option points to the
>> +    When analyzing a `Linux Kernel`_ binary, this option points to the
>>     white list of names of ELF symbols of functions and variables
>>     which ABI must be written out.  That white list is called a "
>>     Kernel Module Interface white list".  This is because for the
>> @@ -105,7 +110,7 @@ Options
>> 
>>     If this option is not provided -- thus if no white list is
>>     provided -- then the entire KMI, that is, all publicly defined and
>> -    exported functions and global variables by the Linux Kernel
>> +    exported functions and global variables by the `Linux Kernel`_
>>     binaries is emitted.
>> 
>>   * ``--linux-tree | --lt``
>> @@ -115,9 +120,10 @@ Options
>>     In that case, this program emits the representation of the Kernel
>>     Module Interface (KMI) on the standard output.
>> 
>> -    Below is an example of usage of ``abidw`` on a Linux Kernel tree.
>> +    Below is an example of usage of ``abidw`` on a `Linux Kernel`_
>> +    tree.
>> 
>> -    First, checkout a Linux kernel source tree and build it.  Then
>> +    First, checkout a `Linux Kernel`_ source tree and build it.  Then
>>     install the kernel modules in a directory somewhere.  Copy the
>>     vmlinux binary into that directory too.  And then serialize the
>>     KMI of that kernel to disk, using ``abidw``: ::
>> @@ -171,6 +177,56 @@ Options
>>     representation build by Libabigail to represent the ABI and will
>>     not end up in the abi XML file.
>> 
>> +  * ``--exported-interfaces-only``
>> +
>> +    By default, when looking at the debug information accompanying a
>> +    binary, this tool analyzes the descriptions of the types reachable
>> +    by the interfaces (functions and variables) that are visible
>> +    outside of their translation unit.  Once that analysis is done, an
>> +    ABI corpus is constructed by only considering the subset of types
>> +    reachable from interfaces associated to `ELF`_ symbols that are
>> +    defined and exported by the binary.  It's that final ABI corpus
>> +    whose textual representation is saved as ``ABIXML``.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    To mitigate that performance issue, this option allows libabigail
>> +    to only analyze types that are reachable from interfaces
>> +    associated with defined and exported `ELF`_ symbols.
>> +
>> +    Note that this option is turned on by default when analyzing the
>> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
>> +
>> +  * ``--allow-non-exported-interfaces``
>> +
>> +    When looking at the debug information accompanying a binary, this
>> +    tool analyzes the descriptions of the types reachable by the
>> +    interfaces (functions and variables) that are visible outside of
>> +    their translation unit.  Once that analysis is done, an ABI corpus
>> +    is constructed by only considering the subset of types reachable
>> +    from interfaces associated to `ELF`_ symbols that are defined and
>> +    exported by the binary.  It's that final ABI corpus whose textual
>> +    representation is saved as ``ABIXML``.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    In the presence of an "average sized" binary however one can
>> +    afford having libabigail analyze all interfaces that are visible
>> +    outside of their translation unit, using this option.
>> +
>> +    Note that this option is turned on by default, unless we are in
>> +    the presence of the `Linux Kernel`_.
>> +
>>   * ``--no-linux-kernel-mode``
>> 
>>     Without this option, if abipkgiff detects that the binaries it is
>> @@ -308,4 +364,4 @@ standard `here
>> .. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
>> .. _DWARF: http://www.dwarfstd.org
>> .. _GNU: http://www.gnu.org
>> -
>> +.. _Linux Kernel: https://kernel.org/
>> diff --git a/doc/manuals/abipkgdiff.rst b/doc/manuals/abipkgdiff.rst
>> index 15ea9072..9114775a 100644
>> --- a/doc/manuals/abipkgdiff.rst
>> +++ b/doc/manuals/abipkgdiff.rst
>> @@ -19,6 +19,7 @@ information directly in a section of said binaries.  In those cases,
>> obviously, no separate debug information package is needed as the tool
>> will find the debug information inside the binaries.
>> 
>> +.. include:: tools-use-libabigail.txt
>> 
>> .. _abipkgdiff_invocation_label:
>> 
>> @@ -277,6 +278,56 @@ Options
>>     global functions and variables are analyzed, so the tool detects
>>     and reports changes on these reachable types only.
>> 
>> +  * ``--exported-interfaces-only``
>> +
>> +    By default, when looking at the debug information accompanying a
>> +    binary, this tool analyzes the descriptions of the types reachable
>> +    by the interfaces (functions and variables) that are visible
>> +    outside of their translation unit.  Once that analysis is done, an
>> +    ABI corpus is constructed by only considering the subset of types
>> +    reachable from interfaces associated to `ELF`_ symbols that are
>> +    defined and exported by the binary.  It's those final ABI Corpora
>> +    that are compared by this tool.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    To mitigate that performance issue, this option allows libabigail
>> +    to only analyze types that are reachable from interfaces
>> +    associated with defined and exported `ELF`_ symbols.
>> +
>> +    Note that this option is turned on by default when analyzing the
>> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
>> +
>> +  * ``--allow-non-exported-interfaces``
>> +
>> +    When looking at the debug information accompanying a binary, this
>> +    tool analyzes the descriptions of the types reachable by the
>> +    interfaces (functions and variables) that are visible outside of
>> +    their translation unit.  Once that analysis is done, an ABI corpus
>> +    is constructed by only considering the subset of types reachable
>> +    from interfaces associated to `ELF`_ symbols that are defined and
>> +    exported by the binary.  It's those final ABI Corpora that are
>> +    compared by this tool.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, especially when those binaries are
>> +    applications, as opposed to shared libraries.  One example of such
>> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
>> +    corpora like these can be extremely slow.
>> +
>> +    In the presence of an "average sized" binary however one can
>> +    afford having libabigail analyze all interfaces that are visible
>> +    outside of their translation unit, using this option.
>> +
>> +    Note that this option is turned on by default, unless we are in
>> +    the presence of the `Linux Kernel`_.
>> +
>>   *  ``--redundant``
>> 
>>     In the diff reports, do display redundant changes.  A redundant
>> diff --git a/doc/manuals/kmidiff.rst b/doc/manuals/kmidiff.rst
>> index ce8168ae..53010189 100644
>> --- a/doc/manuals/kmidiff.rst
>> +++ b/doc/manuals/kmidiff.rst
>> @@ -55,6 +55,10 @@ command line looks like: ::
>>                       linux/v4.5/build/modules \
>>                       linux/v4.6/build/modules
>> 
>> +
>> +.. include:: tools-use-libabigail.txt
>> +
>> +
>> Invocation
>> ==========
>> 
>> @@ -67,8 +71,8 @@ Environment
>> 
>> By default, ``kmidiff`` compares all the interfaces (exported
>> functions and variables) between the Kernel and its modules.  In
>> -practice, though, users want to compare a subset of the those
>> -interfaces.
>> +practice, though, some users might want to compare a subset of
>> +those interfaces.
>> 
>> Users can then define a "white list" of the interfaces to compare.
>> Such a white list is a just a file in the "INI" format that looks
>> @@ -91,8 +95,11 @@ function or variable.  Only those interfaces along with the types
>> reachable from their signatures are going to be compared by
>> ``kmidiff`` recursively.
>> 
>> -Note that kmidiff compares the interfaces exported by the ``vmlinux``
>> -binary and by the all of the compiled modules.
>> +Note that by default kmidiff analyzes the types reachable from the
>> +interfaces associated with `ELF`_ symbols that are defined and
>> +exported by the `Linux Kernel`_ as being the union of the ``vmlinux``
>> +binary and all its compiled modules.  It then compares those
>> +interfaces (along with their types).
>> 
>> Options
>> =======
>> @@ -180,6 +187,38 @@ Options
>>     exported interfaces.  This is the default kind of report emitted
>>     by tools like ``abidiff`` or ``abipkgdiff``.
>> 
>> +  * ``--exported-interfaces-only``
>> +
>> +    When using this option, this tool analyzes the descriptions of the
>> +    types reachable by the interfaces (functions and variables)
>> +    associated with `ELF`_ symbols that are defined and exported by
>> +    the `Linux Kernel`_.
>> +
>> +    Otherwise, the tool also has the ability to analyze the
>> +    descriptions of the types reachable by the interfaces associated
>> +    with `ELF`_ symbols that are visible outside their translation
>> +    unit.  This latter possibility is however much more resource
>> +    intensive and results in much slower operations.
>> +
>> +    That is why this option is enabled by default.
>> +
>> +
>> +  * ``--allow-non-exported-interfaces``
>> +
>> +    When using this option, this tool analyzes the descriptions of the
>> +    types reachable by the interfaces (functions and variables) that
>> +    are visible outside of their translation unit.  Once that analysis
>> +    is done, an ABI Corpus is constructed by only considering the
>> +    subset of types reachable from interfaces associated to `ELF`_
>> +    symbols that are defined and exported by the binary.  It's that
>> +    final ABI corpus which is compared against another one.
>> +
>> +    The problem with that approach however is that analyzing all the
>> +    interfaces that are visible from outside their translation unit
>> +    can amount to a lot of data, leading to very slow operations.
>> +
>> +    Note that this option is turned off by default.
>> +
>>   * ``--show-bytes``
>> 
>>     Show sizes and offsets in bytes, not bits.  This option is
>> @@ -198,3 +237,8 @@ Options
>>   * ``--show-dec``
>> 
>>     Show sizes and offsets in decimal base.
>> +
>> +
>> +.. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
>> +.. _ksymtab: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
>> +.. _Linux Kernel: https://kernel.org
>> diff --git a/doc/manuals/tools-use-libabigail.txt b/doc/manuals/tools-use-libabigail.txt
>> new file mode 100644
>> index 00000000..43edf296
>> --- /dev/null
>> +++ b/doc/manuals/tools-use-libabigail.txt
>> @@ -0,0 +1,16 @@
>> +This tool uses the libabigail library to analyze the binary as well as its
>> +associated debug information.  Here is its general mode of operation.
>> +
>> +When instructed to do so, a binary and its associated debug
>> +information are read and analyzed.  To that effect, libabigail analyzes
>> +by default the descriptions of the types reachable by the interfaces
>> +(functions and variables) that are visible outside of their
>> +translation unit.  Once that analysis is done, an Application Binary
>> +Interface Corpus is constructed by only considering the subset of
>> +types reachable from interfaces associated to `ELF`_ symbols that are
>> +defined and exported by the binary.  It's that final ABI corpus which
>> +libabigail considers as representing the ABI of the analyzed binary.
>> +
>> +Libabigail then has capabilities to generate textual representations
>> +of ABI Corpora, compare them, analyze their changes and report about
>> +them.
>> diff --git a/include/abg-ir.h b/include/abg-ir.h
>> index a857d041..61338edb 100644
>> --- a/include/abg-ir.h
>> +++ b/include/abg-ir.h
>> @@ -197,6 +197,15 @@ public:
>>   const config&
>>   get_config() const;
>> 
>> +  bool
>> +  user_set_analyze_exported_interfaces_only() const;
>> +
>> +  void
>> +  analyze_exported_interfaces_only(bool f);
>> +
>> +  bool
>> +  analyze_exported_interfaces_only() const;
>> +
>> #ifdef WITH_DEBUG_SELF_COMPARISON
>>   void
>>   set_self_comparison_debug_input(const corpus_sptr& corpus);
>> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>> index e41172c1..cba89664 100644
>> --- a/src/abg-dwarf-reader.cc
>> +++ b/src/abg-dwarf-reader.cc
>> @@ -402,6 +402,12 @@ die_is_decl(const Dwarf_Die* die);
>> static bool
>> die_is_declaration_only(Dwarf_Die* die);
>> 
>> +static bool
>> +die_is_variable_decl(const Dwarf_Die *die);
>> +
>> +static bool
>> +die_is_function_decl(const Dwarf_Die *die);
>> +
>> static bool
>> die_has_size_attribute(const Dwarf_Die *die);
>> 
>> @@ -5303,6 +5309,44 @@ public:
>>     return symbol;
>>   }
>> 
>> +  /// Test if a DIE represents a decl (function or variable) that has
>> +  /// a symbol that is exported, whatever that means.  This is
>> +  /// supposed to work for Linux Kernel binaries as well.
>> +  ///
>> +  /// This is useful to limit the amount of DIEs taken into account to
>> +  /// the strict limit of what an ABI actually means.  Limiting the
>> +  /// volume of DIEs analyzed this way is an important optimization to
>> +  /// keep big binaries "manageable" by libabigail.
>> +  ///
>> +  /// @param die the DIE to consider.
>> +  bool
>> +  is_decl_die_with_exported_symbol(const Dwarf_Die *die)
>> +  {
>> +    if (!die || !die_is_decl(die))
>> +      return false;
>> +
>> +    bool result = false, address_found = false, symbol_is_exported = false;
>> +    Dwarf_Addr decl_symbol_address = 0;
>> +
>> +    if (die_is_variable_decl(die))
>> +      {
>> +       if ((address_found = get_variable_address(die, decl_symbol_address)))
>> +         symbol_is_exported =
>> +           !!variable_symbol_is_exported(decl_symbol_address);
>> +      }
>> +    else if (die_is_function_decl(die))
>> +      {
>> +       if ((address_found = get_function_address(die, decl_symbol_address)))
>> +         symbol_is_exported =
>> +           !!function_symbol_is_exported(decl_symbol_address);
>> +      }
>> +
>> +    if (address_found)
>> +      result = symbol_is_exported;
>> +
>> +    return result;
>> +  }
>> +
>>   /// Getter for the symtab reader. Will load the symtab from the elf handle if
>>   /// not yet set.
>>   ///
>> @@ -5580,16 +5624,18 @@ public:
>>   ///
>>   /// @return true if the function address was found.
>>   bool
>> -  get_function_address(Dwarf_Die* function_die, Dwarf_Addr& address) const
>> +  get_function_address(const Dwarf_Die* function_die, Dwarf_Addr& address) const
>>   {
>> -    if (!die_address_attribute(function_die, DW_AT_low_pc, address))
>> +    if (!die_address_attribute(const_cast<Dwarf_Die*>(function_die),
>> +                              DW_AT_low_pc, address))
>>       // So no DW_AT_low_pc was found.  Let's see if the function DIE
>>       // has got a DW_AT_ranges attribute instead.  If it does, the
>>       // first address of the set of addresses represented by the
>>       // value of that DW_AT_ranges represents the function (symbol)
>>       // address we are looking for.
>> -      if (!get_first_exported_fn_address_from_DW_AT_ranges(function_die,
>> -                                                          address))
>> +      if (!get_first_exported_fn_address_from_DW_AT_ranges
>> +         (const_cast<Dwarf_Die*>(function_die),
>> +          address))
>>        return false;
>> 
>>     address = maybe_adjust_fn_sym_address(address);
>> @@ -5611,11 +5657,12 @@ public:
>>   ///
>>   /// @return true if the variable address was found.
>>   bool
>> -  get_variable_address(Dwarf_Die*      variable_die,
>> +  get_variable_address(const Dwarf_Die* variable_die,
>>                       Dwarf_Addr&      address) const
>>   {
>>     bool is_tls_address = false;
>> -    if (!die_location_address(variable_die, address, is_tls_address))
>> +    if (!die_location_address(const_cast<Dwarf_Die*>(variable_die),
>> +                             address, is_tls_address))
>>       return false;
>>     if (!is_tls_address)
>>       address = maybe_adjust_var_sym_address(address);
>> @@ -7155,6 +7202,40 @@ die_is_declaration_only(Dwarf_Die* die)
>>   return false;
>> }
>> 
>> +/// Test if a DIE is for a function decl.
>> +///
>> +/// @param die the DIE to consider.
>> +///
>> +/// @return true iff @p die represents a function decl.
>> +static bool
>> +die_is_function_decl(const Dwarf_Die *die)
>> +{
>> +  if (!die)
>> +    return false;
>> +
>> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
>> +  if (tag == DW_TAG_subprogram)
>> +    return true;
>> +  return false;
>> +}
>> +
>> +/// Test if a DIE is for a variable decl.
>> +///
>> +/// @param die the DIE to consider.
>> +///
>> +/// @return true iff @p die represents a variable decl.
>> +static bool
>> +die_is_variable_decl(const Dwarf_Die *die)
>> +{
>> +  if (!die)
>> +    return false;
>> +
>> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
>> +  if (tag == DW_TAG_variable)
>> +    return true;
>> +  return false;
>> +}
>> +
>> /// Test if a DIE has size attribute.
>> ///
>> /// @param die the DIE to consider.
>> @@ -12690,9 +12771,13 @@ build_translation_unit_and_add_to_ir(read_context&     ctxt,
>>   result->set_is_constructed(false);
>> 
>>   do
>> -    build_ir_node_from_die(ctxt, &child,
>> -                          die_is_public_decl(&child),
>> -                          dwarf_dieoffset(&child));
>> +    // Analyze all the DIEs we encounter unless we are asked to only
>> +    // analyze exported interfaces and the types reachable from them.
>> +    if (!ctxt.env()->analyze_exported_interfaces_only()
>> +       || ctxt.is_decl_die_with_exported_symbol(&child))
>> +      build_ir_node_from_die(ctxt, &child,
>> +                            die_is_public_decl(&child),
>> +                            dwarf_dieoffset(&child));
>>   while (dwarf_siblingof(&child, &child) == 0);
>> 
>>   if (!ctxt.var_decls_to_re_add_to_tree().empty())
>> @@ -15699,6 +15784,16 @@ read_debug_info_into_corpus(read_context& ctxt)
>>     origin |= corpus::LINUX_KERNEL_BINARY_ORIGIN;
>>   ctxt.current_corpus()->set_origin(origin);
>> 
>> +  if (origin & corpus::LINUX_KERNEL_BINARY_ORIGIN
>> +      && !ctxt.env()->user_set_analyze_exported_interfaces_only())
>> +    // So we are looking at the Linux Kernel and the user has not set
>> +    // any particular option regarding the amount of types to analyse.
>> +    // In that case, we need to only analyze types that are reachable
>> +    // from exported interfaces otherwise we get such a massive amount
>> +    // of type DIEs to look at that things are just too slow down the
>> +    // road.
>> +    ctxt.env()->analyze_exported_interfaces_only(true);
>> +
>>   ctxt.current_corpus()->set_soname(ctxt.dt_soname());
>>   ctxt.current_corpus()->set_needed(ctxt.dt_needed());
>>   ctxt.current_corpus()->set_architecture_name(ctxt.elf_architecture());
>> diff --git a/src/abg-ir-priv.h b/src/abg-ir-priv.h
>> index 45b711b7..21734b25 100644
>> --- a/src/abg-ir-priv.h
>> +++ b/src/abg-ir-priv.h
>> @@ -26,6 +26,7 @@ namespace ir
>> {
>> 
>> using std::string;
>> +using abg_compat::optional;
>> 
>> /// The result of structural comparison of type ABI artifacts.
>> enum comparison_result
>> @@ -443,6 +444,7 @@ struct environment::priv
>>   bool                                 decl_only_class_equals_definition_;
>>   bool                                 use_enum_binary_only_equality_;
>>   bool                                 allow_type_comparison_results_caching_;
>> +  optional<bool>                       analyze_exported_interfaces_only_;
>> #ifdef WITH_DEBUG_SELF_COMPARISON
>>   bool                                 self_comparison_debug_on_;
>> #endif
>> diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>> index 91c8e99b..02d68e63 100644
>> --- a/src/abg-ir.cc
>> +++ b/src/abg-ir.cc
>> @@ -3674,6 +3674,42 @@ const config&
>> environment::get_config() const
>> {return priv_->config_;}
>> 
>> +/// Getter for a property that says if the user actually did set the
>> +/// analyze_exported_interfaces_only() property.  If not, it means
>> +/// the default behaviour prevails.
>> +///
>> +/// @return true iff the user did set the
>> +/// analyze_exported_interfaces_only() property.
>> +bool
>> +environment::user_set_analyze_exported_interfaces_only() const
>> +{return priv_->analyze_exported_interfaces_only_.has_value();}
>> +
>> +/// Setter for the property that controls if we are to restrict the
>> +/// analysis to the types that are reachable from the exported
>> +/// interfaces only, or if the set of types should be more broad than
>> +/// that.  Typically, we'd restrict the analysis to types reachable
>> +/// from exported interfaces only (stricto sensu, that would really be
>> +/// only the types that are part of the ABI of well designed
>> +/// libraries) for performance reasons.
>> +///
>> +/// @param f the value of the flag.
>> +void
>> +environment::analyze_exported_interfaces_only(bool f)
>> +{priv_->analyze_exported_interfaces_only_ = f;}
>> +
>> +/// Getter for the property that controls if we are to restrict the
>> +/// analysis to the types that are reachable from the exported
>> +/// interfaces only, or if the set of types should be more broad than
>> +/// that.  Typically, we'd restrict the analysis to types reachable
>> +/// from exported interfaces only (stricto sensu, that would really be
>> +/// only the types that are part of the ABI of well designed
>> +/// libraries) for performance reasons.
>> +///
>> +/// @return the value of the flag.
>> +bool
>> +environment::analyze_exported_interfaces_only() const
>> +{return priv_->analyze_exported_interfaces_only_.value_or(false);}
>> +
>> #ifdef WITH_DEBUG_SELF_COMPARISON
>> /// Setter of the corpus of the input corpus of the self comparison
>> /// that takes place when doing "abidw --debug-abidiff <binary>".
>> diff --git a/tools/abidiff.cc b/tools/abidiff.cc
>> index 97b036cb..e0bb35ac 100644
>> --- a/tools/abidiff.cc
>> +++ b/tools/abidiff.cc
>> @@ -29,6 +29,7 @@ using std::ostream;
>> using std::cout;
>> using std::cerr;
>> using std::shared_ptr;
>> +using abg_compat::optional;
>> using abigail::ir::environment;
>> using abigail::ir::environment_sptr;
>> using abigail::translation_unit;
>> @@ -74,6 +75,7 @@ struct options
>>   vector<string>       headers_dirs2;
>>   vector<string>        header_files2;
>>   bool                 drop_private_types;
>> +  optional<bool>       exported_interfaces_only;
>>   bool                 linux_kernel_mode;
>>   bool                 no_default_supprs;
>>   bool                 no_arch;
>> @@ -197,6 +199,9 @@ display_usage(const string& prog_name, ostream& out)
>>     << " --header-file2|--hf2 <path>  the path to one header of file2\n"
>>     << " --drop-private-types  drop private types from "
>>     "internal representation\n"
>> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
>> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
>> +    "might not be exported\n"
>>     << " --no-linux-kernel-mode  don't consider the input binaries as "
>>        "linux kernel binaries\n"
>>     << " --kmi-whitelist|-w  path to a "
>> @@ -403,6 +408,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>>        }
>>       else if (!strcmp(argv[i], "--drop-private-types"))
>>        opts.drop_private_types = true;
>> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
>> +       opts.exported_interfaces_only = true;
>> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
>> +       opts.exported_interfaces_only = false;
>>       else if (!strcmp(argv[i], "--no-default-suppression"))
>>        opts.no_default_supprs = true;
>>       else if (!strcmp(argv[i], "--no-architecture"))
>> @@ -1130,6 +1139,9 @@ main(int argc, char* argv[])
>>       t2_type = guess_file_type(opts.file2);
>> 
>>       environment_sptr env(new environment);
>> +      if (opts.exported_interfaces_only.has_value())
>> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
>> +
>> #ifdef WITH_DEBUG_SELF_COMPARISON
>>            if (opts.do_debug)
>>              env->self_comparison_debug_is_on(true);
>> diff --git a/tools/abidw.cc b/tools/abidw.cc
>> index 9a27a029..f38d6048 100644
>> --- a/tools/abidw.cc
>> +++ b/tools/abidw.cc
>> @@ -40,6 +40,7 @@ using std::ostream;
>> using std::ofstream;
>> using std::vector;
>> using std::shared_ptr;
>> +using abg_compat::optional;
>> using abigail::tools_utils::emit_prefix;
>> using abigail::tools_utils::temp_file;
>> using abigail::tools_utils::temp_file_sptr;
>> @@ -114,6 +115,7 @@ struct options
>>   bool                 do_log;
>>   bool                 drop_private_types;
>>   bool                 drop_undefined_syms;
>> +  optional<bool>       exported_interfaces_only;
>>   type_id_style_kind   type_id_style;
>> #ifdef WITH_DEBUG_SELF_COMPARISON
>>   string               type_id_file_path;
>> @@ -187,6 +189,9 @@ display_usage(const string& prog_name, ostream& out)
>>     << "  --short-locs  only print filenames rather than paths\n"
>>     << "  --drop-private-types  drop private types from representation\n"
>>     << "  --drop-undefined-syms  drop undefined symbols from representation\n"
>> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
>> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
>> +    "might not be exported\n"
>>     << "  --no-comp-dir-path  do not show compilation path information\n"
>>     << "  --no-elf-needed  do not show the DT_NEEDED information\n"
>>     << "  --no-write-default-sizes  do not emit pointer size when it equals"
>> @@ -368,6 +373,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>>        opts.drop_private_types = true;
>>       else if (!strcmp(argv[i], "--drop-undefined-syms"))
>>        opts.drop_undefined_syms = true;
>> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
>> +       opts.exported_interfaces_only = true;
>> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
>> +       opts.exported_interfaces_only = false;
>>       else if (!strcmp(argv[i], "--no-linux-kernel-mode"))
>>        opts.linux_kernel_mode = false;
>>       else if (!strcmp(argv[i], "--abidiff"))
>> @@ -606,6 +615,9 @@ load_corpus_and_write_abixml(char* argv[],
>>             }
>>         }
>> 
>> +      if (opts.exported_interfaces_only.has_value())
>> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
>> +
>>       t.start();
>>       corp = dwarf_reader::read_corpus_from_elf(ctxt, s);
>>       t.stop();
>> @@ -813,6 +825,9 @@ load_kernel_corpus_group_and_write_abixml(char* argv[],
>>   timer t, global_timer;
>>   suppressions_type supprs;
>> 
>> +  if (opts.exported_interfaces_only.has_value())
>> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
>> +
>>   if (opts.do_log)
>>     emit_prefix(argv[0], cerr)
>>       << "going to build ABI representation of the Linux Kernel ...\n";
>> diff --git a/tools/abipkgdiff.cc b/tools/abipkgdiff.cc
>> index 551080b9..656d5882 100644
>> --- a/tools/abipkgdiff.cc
>> +++ b/tools/abipkgdiff.cc
>> @@ -106,6 +106,7 @@ using std::set;
>> using std::ostringstream;
>> using std::shared_ptr;
>> using std::dynamic_pointer_cast;
>> +using abg_compat::optional;
>> using abigail::workers::task;
>> using abigail::workers::task_sptr;
>> using abigail::workers::queue;
>> @@ -205,6 +206,7 @@ public:
>>   bool         fail_if_no_debug_info;
>>   bool         show_identical_binaries;
>>   bool         self_check;
>> +  optional<bool> exported_interfaces_only;
>> #ifdef WITH_CTF
>>   bool         use_ctf;
>> #endif
>> @@ -868,6 +870,9 @@ display_usage(const string& prog_name, ostream& out)
>>     "full impact analysis report rather than the default leaf changes reports\n"
>>     << " --non-reachable-types|-t  consider types non reachable"
>>     " from public interfaces\n"
>> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
>> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
>> +    "might not be exported\n"
>>     << " --no-linkage-name             do not display linkage names of "
>>     "added/removed/changed\n"
>>     << " --redundant                    display redundant changes\n"
>> @@ -2076,6 +2081,10 @@ public:
>>     abigail::elf_reader::status detailed_status =
>>       abigail::elf_reader::STATUS_UNKNOWN;
>> 
>> +    if (args->opts.exported_interfaces_only.has_value())
>> +      env->analyze_exported_interfaces_only
>> +       (*args->opts.exported_interfaces_only);
>> +
>>     status |= compare(args->elf1, args->debug_dir1, args->private_types_suppr1,
>>                      args->elf2, args->debug_dir2, args->private_types_suppr2,
>>                      args->opts, env, diff, ctxt, &detailed_status);
>> @@ -2142,6 +2151,10 @@ public:
>>     diff_context_sptr ctxt;
>>     corpus_diff_sptr diff;
>> 
>> +    if (args->opts.exported_interfaces_only.has_value())
>> +      env->analyze_exported_interfaces_only
>> +       (*args->opts.exported_interfaces_only);
>> +
>>     abigail::elf_reader::status detailed_status =
>>       abigail::elf_reader::STATUS_UNKNOWN;
>> 
>> @@ -3024,6 +3037,10 @@ compare_prepared_linux_kernel_packages(package& first_package,
>>   string dist_root2 = second_package.extracted_dir_path();
>> 
>>   abigail::ir::environment_sptr env(new abigail::ir::environment);
>> +  if (opts.exported_interfaces_only.has_value())
>> +    env->analyze_exported_interfaces_only
>> +      (*opts.exported_interfaces_only);
>> +
>>   suppressions_type supprs;
>>   corpus_group_sptr corpus1, corpus2;
>>   corpus1 = build_corpus_group_from_kernel_dist_under(dist_root1,
>> @@ -3326,6 +3343,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>>       else if (!strcmp(argv[i], "--full-impact")
>>               ||!strcmp(argv[i], "-f"))
>>        opts.show_full_impact_report = true;
>> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
>> +       opts.exported_interfaces_only = true;
>> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
>> +       opts.exported_interfaces_only = false;
>>       else if (!strcmp(argv[i], "--no-linkage-name"))
>>        opts.show_linkage_names = false;
>>       else if (!strcmp(argv[i], "--redundant"))
>> diff --git a/tools/kmidiff.cc b/tools/kmidiff.cc
>> index 8fd3fed9..f3332765 100644
>> --- a/tools/kmidiff.cc
>> +++ b/tools/kmidiff.cc
>> @@ -29,6 +29,7 @@ using std::vector;
>> using std::ostream;
>> using std::cout;
>> using std::cerr;
>> +using abg_compat::optional;
>> 
>> using namespace abigail::tools_utils;
>> using namespace abigail::dwarf_reader;
>> @@ -60,6 +61,7 @@ struct options
>>   bool                 show_hexadecimal_values;
>>   bool                 show_offsets_sizes_in_bits;
>>   bool                 show_impacted_interfaces;
>> +  optional<bool>       exported_interfaces_only;
>> #ifdef WITH_CTF
>>   bool                 use_ctf;
>> #endif
>> @@ -120,6 +122,9 @@ display_usage(const string& prog_name, ostream& out)
>>     << " --impacted-interfaces|-i  show interfaces impacted by ABI changes\n"
>>     << " --full-impact|-f  show the full impact of changes on top-most "
>>         "interfaces\n"
>> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
>> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
>> +    "might not be exported\n"
>>     << " --show-bytes  show size and offsets in bytes\n"
>>     << " --show-bits  show size and offsets in bits\n"
>>     << " --show-hex  show size and offset in hexadecimal\n"
>> @@ -262,6 +267,10 @@ parse_command_line(int argc, char* argv[], options& opts)
>>       else if (!strcmp(argv[i], "--full-impact")
>>               || !strcmp(argv[i], "-f"))
>>        opts.leaf_changes_only = false;
>> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
>> +       opts.exported_interfaces_only = true;
>> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
>> +       opts.exported_interfaces_only = false;
>>       else if (!strcmp(argv[i], "--show-bytes"))
>>        opts.show_offsets_sizes_in_bits = false;
>>       else if (!strcmp(argv[i], "--show-bits"))
>> @@ -408,6 +417,9 @@ main(int argc, char* argv[])
>> 
>>   environment_sptr env(new environment);
>> 
>> +  if (opts.exported_interfaces_only.has_value())
>> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
>> +
>>   corpus_group_sptr group1, group2;
>>   string debug_info_root_dir;
>>   corpus::origin origin =
>> --
>> 2.37.2
>> 
>> 
>> 
>> --
>>                Dodji
>
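
The accessors in the abg-ir.cc hunk above implement a tri-state setting: explicitly on, explicitly off, or unset. The sketch below illustrates that pattern with C++17 `std::optional`; the patch itself uses `abg_compat::optional`. The names `toy_environment` and `effective_exported_only` are hypothetical, and the default policy shown is only an assumption drawn from the commit message (on for the Linux Kernel unless the user says otherwise), not the actual code of read_debug_info_into_corpus.

```cpp
#include <optional>

// Hypothetical stand-in for the environment's private data; the real
// patch stores an abg_compat::optional<bool> in environment::priv.
struct toy_environment
{
  std::optional<bool> analyze_exported_interfaces_only_;

  // True iff the user explicitly chose a value.
  bool user_set() const
  {return analyze_exported_interfaces_only_.has_value();}

  void set(bool f)
  {analyze_exported_interfaces_only_ = f;}

  // An unset property reads as false, mirroring value_or(false).
  bool get() const
  {return analyze_exported_interfaces_only_.value_or(false);}
};

// Assumed policy: an explicit user choice always wins; otherwise the
// restriction is enabled only when looking at a Linux Kernel binary.
inline bool
effective_exported_only(const toy_environment& env, bool is_linux_kernel)
{
  if (env.user_set())
    return env.get();
  return is_linux_kernel;
}
```

Keeping "unset" distinct from "false" is what lets --allow-non-exported-interfaces override the kernel default, which a plain bool could not express.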
  
Giuliano Procida Sept. 10, 2022, 10:56 a.m. UTC | #3
Hi.

On Fri, 9 Sept 2022, 21:32 Ben Woodard, <woodard@redhat.com> wrote:

>
>
> On Sep 9, 2022, at 6:03 AM, Giuliano Procida via Libabigail <
> libabigail@sourceware.org> wrote:
>
> Hi Dodji.
>
> Sorry for the late reply. I was down with Covid for a while.
>
>
> And I’ve been on vacation.
>
>
> I feel this commit deserves some feedback.
>
> Yeah I’ve been thinking about this for quite some time too and have been
> trying to put my intuition into words
>
> My understanding is that the intention here is to make the DWARF
> reader do less work (look at fewer type DIEs) than at present.
>
> We are actually hoping that we may be able to make the DWARF reader
> look at more type DIEs so that it is more likely to pick up full
> definitions of types instead of declarations.
>
>
> I think that reducing it to just exported interfaces and associated types
> may be too limited as well.
>
> 1) Exceptions - classes thrown as exceptions will not be included in the
> list of types related to exported functions or variables even though they
> can cross ABI boundaries. They need to be dug out of the LSDA.
> Note: I believe that any type that makes use of RTTI can leak information
> across the ABI boundary but I haven’t thought carefully enough about it to
> come up with an example other than exceptions.
>
> 2) Inline functions and the types that they use - when doing ABI
> compatibility testing, we should also include the prototypes for inline
> functions and the types that they use. These will not be exported
> functions. I think something that has been saving us has been the deep type
> evaluation including non-exported functions and types. Changes that would
> have been overlooked when considering only exported would still be
> evaluated in many cases. If we trim down to just exported functions, I
> think that some of these will be missed. To be able to find the list of
> inline functions which need to be considered you are going to have to go
> through the callsite data in DWARF. Unfortunately, this is something that
> clang doesn’t generate well.  Nevertheless, these functions and their
> associated types need to be part of the considered ABI.
> Note: We cannot evaluate if two inline function calls in two different
> objects are in fact the same because of how optimization can mutate the
> code and there can be semantic differences between different versions but
> at the very least we should be able to verify that the function prototypes
> and the associated types are in fact the same.
> Certainly any function which is inline in one TU and exported in another
> is an ABI break. Detecting this would also require parsing of callsite
> information.
>
> 3) if a library is dlopen'ed, then a function is dlsym’ed, and then cast
> into the appropriate type, that function and its associated types are part
> of the ABI. If all types within an executable are considered, we would
> detect this change but if only exported types are considered, then a change
> could be overlooked. A heuristic that we could use is: if dlsym is called,
> we need to include all types, not just exported types. I do not know how to
> solve this one without binary analysis.
>
> 4) PIE - with more and more things being compiled as PIE, it can become
> increasingly hard to know what is a library vs. what is an executable. Even
> normal executables may provide functions and variables to their libraries.
> In abicompat we currently only consider external undefined symbols, this is
> insufficient because a library may call a function or reference a
> variable in the executable. Therefore, the function prototype and its
> related types or the variable type needs to also be considered. I haven’t
> looked carefully at Dodji’s branch to only analyze exported symbols but I
> would hope that it includes not only undefined references but also defined
> exported symbols when considering executables.
>
> So I would say that my feedback is "restricting analyzed decls to exported
> symbols" is too limited. It needs to also include:
> 1) any type which has a typeinfo structure in the LSDA for exceptions and
> RTTI
> 2) any inline function and associated types which has a callsite in the
> object being analyzed
> 3) if dlopen or dlmopen are used, then either
>    a) every decl needs to be analyzed
>    b) the type for the destination variable for every dlsym call needs to
> be included in the analysis. (more binary analysis than is easily
> accomplished with the current code base)
> 4) all decls which are exported and their associated types not just
> defined ones.
>
>
> The rationale behind the change appears to be that DWARF processing is
> expensive, in particular for kernel ABIs. I would say "measure first".
> Here's roughly how I think about things:
>
>
> I’m not entirely sure your breakdown of what consumes time is correct.
> Dodji is much more familiar with that part of the code than I am. The vast
> majority of my contribution to the project has been on the theoretical side
> of ABI and testing.
>
> 1. building the IR is very cheap
>
> A kernel ABI may end up with 40k IR elements. The cost of allocating
> memory and calling constructors should be negligible. Any improvements
> to this end of things are pointless.
>
> 2. reading DWARF information is fairly cheap
>
> We may have 100MB of DWARF but just reading the data (decoding
> attribute formats in particular) won't take that long.
>
> Reducing the number of DIEs examined at the top-level by a factor of 2
> will speed up this part by a factor of 2, but in the grand scheme of
> things that may not be very important.
>
> 3. chasing references is a bit more expensive
>
> Cross-references in DWARF are pretty common and the lack of locality
> means that chasing cross-references is going to be a constant factor
> slower than iterating through the main DWARF tree.
>
>
> I may not completely understand your description but perceive this as a
> more expensive operation than you seem to.
>

I simply meant the cost of decoding a reference attribute and making a
recursive call to process a DIE (that may or may not have already been
visited).
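
To make that cost concrete, here is a minimal sketch of reference chasing with a visited set. It is a toy model, not libabigail's or libdw's API: `die_offset`, `die_graph` and `visit_reachable` are hypothetical names, and a DIE is reduced to the list of offsets it references.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy model: a DIE is identified by its offset and holds
// the offsets of the DIEs it references (e.g. through DW_AT_type).
using die_offset = std::size_t;
using die_graph = std::unordered_map<die_offset, std::vector<die_offset>>;

// Visit every DIE reachable from `root` at most once and return the
// number of reference attributes that were followed.
inline std::size_t
visit_reachable(const die_graph& g, die_offset root,
                std::unordered_set<die_offset>& visited)
{
  if (!visited.insert(root).second)
    return 0; // already processed: the recursive call is cheap
  auto it = g.find(root);
  if (it == g.end())
    return 0; // a DIE with no outgoing references
  std::size_t edges = 0;
  for (die_offset target : it->second)
    {
      ++edges; // stands in for decoding one reference attribute
      edges += visit_reachable(g, target, visited);
    }
  return edges;
}
```

Each DIE is expanded once, so the work is linear in the number of reachable DIEs plus references; restricting the roots to exported interfaces shrinks the reachable set rather than the per-reference cost.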

> In my understanding of this, it includes the canonicalization of the types
> that appear in the various TUs. This seems to be a non-trivial task. It
> seems like the optimization which had to be undone for correctness was in
> the canonicalization of types.
>

I was including the canonicalisation cost in 4. If it turns out that the
code is creating a lot of IR nodes (of the order of the number of DIEs)
which are then mostly discarded, then the cost 1. of construction (and
destruction) of IR nodes becomes much more significant. And point 4. now
includes "deciding which bits of the IR to preserve".

>
> 4. deciding whether a DIE needs to be turned into IR is currently very
> expensive
>
> This is because it involves multiple look-ups and recursive comparison
> of DIEs which cannot be unconditionally memoised.
>
>
> I do believe that this phase could be a good target for optimization.
>

I am thinking about doing "this" (identification and elimination of
identical subgraphs) from first principles, so my mental model doesn't
necessarily match what libabigail does. In particular, libabigail can do
deduplication (canonicalisation) work at multiple different stages of DWARF
processing so I'm probably muddling up some things.
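
As a rough illustration of what I mean by eliminating identical subgraphs, here is a sketch of structural interning (hash-consing) restricted to acyclic graphs. It is an assumption-laden toy, neither libabigail's canonicalisation nor STG's: `node_interner` is a hypothetical name, node kinds are plain strings, and cycles are deliberately not handled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Interns nodes of an acyclic type graph so that structurally identical
// subgraphs share a single id.  Cycles are NOT handled by this sketch.
class node_interner
{
  using key = std::pair<std::string, std::vector<std::size_t>>;
  std::map<key, std::size_t> ids_;

public:
  // `kind` names the node (e.g. "pointer", "int"); `children` holds ids
  // previously returned by intern(), so children are interned first.
  std::size_t
  intern(const std::string& kind, const std::vector<std::size_t>& children)
  {
    // emplace() leaves an existing entry untouched, so a repeated key
    // yields the id assigned the first time it was seen.
    auto result = ids_.emplace(key(kind, children), ids_.size());
    return result.first->second;
  }

  // Number of distinct nodes seen so far.
  std::size_t size() const
  {return ids_.size();}
};
```

Because children are replaced by their ids before the parent is looked up, comparing two subgraphs costs one map lookup instead of a recursive walk. Mutually recursive types break the bottom-up ordering, which is exactly where the large SCCs become the hard part.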

>
> Those are only my thoughts. Some profiling should give a more accurate
> picture.
>
> I was curious, so I did an analysis of the connectivity of a kernel
> ABI (using the STG IR, not libabigail's - there are minor
> differences). Here are some fun facts.
>
> The ABI has 34541 nodes.
> There are 25196 strongly-connected components.
> 25053 SCCs are just singleton nodes.
> The largest 3 SCCs have sizes: 4960, 784, 343.
> 1/7 of the ABI nodes are in one SCC!
>
> Completely idle speculation: Perhaps the really huge SCC contributes
> significantly to comparison cost.
>
As a follow-up, I looked at the ABI for a large non-Android SDK C++
library. It has >800k nodes of which ~¾ are in one SCC. Think what this
means in terms of ABI equivalence checking if just one node somewhere is
tweaked! I think this library has around 1000 symbols.

One analysis I haven't done is look at the longest paths and cycles
(without revisiting a node) that exist. These will give limits to depth of
recursion.
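
For anyone wanting to reproduce this kind of connectivity analysis, the SCC sizes can be computed with Tarjan's algorithm. The sketch below is not the code I used: the function name `scc_sizes` and the adjacency-list input are assumptions, and the recursive formulation is itself subject to the recursion-depth concern just mentioned.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Return the sizes of the strongly-connected components of a directed
// graph given as adjacency lists, largest first (Tarjan's algorithm).
inline std::vector<std::size_t>
scc_sizes(const std::vector<std::vector<std::size_t>>& adj)
{
  const std::size_t n = adj.size();
  const std::size_t unvisited = n; // sentinel: valid indices are < n
  std::vector<std::size_t> index(n, unvisited), low(n, 0);
  std::vector<bool> on_stack(n, false);
  std::vector<std::size_t> stack, sizes;
  std::size_t next_index = 0;

  std::function<void(std::size_t)> visit = [&](std::size_t v)
  {
    index[v] = low[v] = next_index++;
    stack.push_back(v);
    on_stack[v] = true;
    for (std::size_t w : adj[v])
      {
        if (index[w] == unvisited)
          {
            visit(w);
            low[v] = std::min(low[v], low[w]);
          }
        else if (on_stack[w])
          low[v] = std::min(low[v], index[w]);
      }
    if (low[v] == index[v])
      {
        // v is the root of a component: pop the component off the stack.
        std::size_t size = 0;
        std::size_t w;
        do
          {
            w = stack.back();
            stack.pop_back();
            on_stack[w] = false;
            ++size;
          }
        while (w != v);
        sizes.push_back(size);
      }
  };

  for (std::size_t v = 0; v < n; ++v)
    if (index[v] == unvisited)
      visit(v);

  std::sort(sizes.begin(), sizes.end(), std::greater<std::size_t>());
  return sizes;
}
```

On a graph with a component of several thousand nodes, an explicit stack would be a safer choice than recursion.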

>
> Regards,
> Giuliano.
>
>
> On Tue, 6 Sept 2022 at 11:11, Dodji Seketeli <dodji@seketeli.org> wrote:
>
>
> Hello,
>
> Profiling showed that the DWARF reader scans too much data.
>
> Basically, in build_translation_unit_and_add_to_ir,
> build_ir_node_from_die is called on every single DIE that is seen, for
> a given translation unit.
>
> There are interfaces (function and variable decls) that are not
> associated with exported ELF symbols and that are analyzed by
> build_ir_node_from_die nonetheless.  For instance, interfaces that are
> visible outside of their translation units are analyzed and the types
> that are reachable from those interfaces are analyzed as well.
>
> Once that is done, an ABI corpus is built with the subset of
> interfaces that have exported ELF symbol (strictly those that are part
> of the ABI), but types that are not necessarily reachable from those
> ABI interfaces can also be put into the ABI corpus.
>
> Some tools make use of this "loose" behaviour of libabigail.  For
> instance, abicompat precisely wants to analyze interfaces with
> undefined symbols.  For an application, those interfaces represent
> the interfaces that the application expects to be provided by some
> shared library.
>
> When analyzing the exported interface of the Linux Kernel (or any
> other huge application) however, analyzing more types than necessary
> appears to incur a huge time penalty.
>
> So, this patch introduces an optional behaviour whereby
> build_translation_unit_and_add_to_ir is restricted to analyzing
> interfaces that have exported ELF symbols only.  So only the types
> reachable from those interfaces are analyzed.  This more than halves
> the time spent by "abidw --noout vmlinux".
>
> Strictly speaking, this new behaviour is triggered by a new option named
> --exported-interfaces-only, supported by the tools abidw, abidiff,
> abipkgdiff and kmidiff.
>
> When looking at the Linux Kernel however, this option is enabled by
> default.
>
> Note that an option --allow-non-exported-interfaces is also introduced
> to function under the previous model of operations.  This option is
> enabled by default on all the tools when they are not looking at the
> Linux Kernel.
>
> With this enabled, analyzing the Linux Kernel is back to taking less
> than a minute on a reasonable machine.
>
>        * doc/manuals/tools-use-libabigail.txt: New doc text.
>        * doc/manuals/Makefile.am: Add the new tools-use-libabigail.rst
>        tool to the source distribution.
>        * doc/manuals/abidiff.rst: Include the new
>        tools-use-libabigail.rst.  Document the --exported-interfaces-only
>        and --allow-non-exported-interfaces.
>        * doc/manuals/abidw.rst: Likewise.
>        * doc/manuals/abipkgdiff.rst: Likewise.
>        * doc/manuals/kmidiff.rst: Likewise.
>        * include/abg-ir.h
>        (environment::{user_set_analyze_exported_interfaces_only,
>        analyze_exported_interfaces_only}): Declare new accessors.
>        * src/abg-ir.cc
>        (environment::{user_set_analyze_exported_interfaces_only,
>        analyze_exported_interfaces_only}): Define new accessors.
>        * src/abg-dwarf-reader.cc (die_is_variable_decl)
>        (die_is_function_decl): Define new static functions.
>        (read_context::is_decl_die_with_exported_symbol): Define new
>        member function.
>        (read_context::get_{function,variable}_address): Const-ify the
>        Dwarf_Die* parameter.
>        (build_translation_unit_and_add_to_ir): If the user asks to
>        analyze exported interfaces only, then analyze only interfaces
>        that have exported ELF symbols.
>        (read_debug_info_into_corpus): If we are looking at the Linux
>        Kernel, then only analyze exported interfaces unless the user asks
>        otherwise.
>        * src/abg-ir-priv.h
>        (environment::priv::analyze_exported_interfaces_only_): Define new
>        data member.
>        * tools/abidiff.cc (options::exported_interfaces_only): Define new
>        data member.
>        (display_usage): Add new help strings for
>        --exported-interfaces-only and --allow-non-exported-interfaces.
>        (parse_command_line): Parse the new options
>        --exported-interfaces-only and --allow-non-exported-interfaces.
>        (main): Pass the value of opts.exported_interfaces_only to the
>        environment.
>        * tools/abidw.cc (options::exported_interfaces_only): Define new
>        data member.
>        (display_usage): Add new help strings for
>        --exported-interfaces-only and --allow-non-exported-interfaces.
>        (parse_command_line): Parse the new options
>        (load_corpus_and_write_abixml)
>        (load_kernel_corpus_group_and_write_abixml): Pass the value of
>        opts.exported_interfaces_only onto the environment.
>        * tools/abipkgdiff.cc (options::exported_interfaces_only): Define
>        new data member.
>        (display_usage): Add new help strings for
>        --exported-interfaces-only and --allow-non-exported-interfaces.
>        (parse_command_line): Parse the new options
>        (compare_task::perform, self_compare_task::perform): Pass the
>        value of opts.exported_interfaces_only onto the environment.
>        (compare_prepared_linux_kernel_packages): Likewise.
>        * tools/kmidiff.cc (options::exported_interfaces_only): Define new
>        data member.
>        (display_usage): Add new help strings for
>        --exported-interfaces-only and --allow-non-exported-interfaces.
>        (parse_command_line): Parse the new options
>        (main): Pass the value of opts.exported_interfaces_only onto the
>        environment.
>
> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
> ---
> doc/manuals/Makefile.am              |   3 +-
> doc/manuals/abidiff.rst              |  52 ++++++++++++
> doc/manuals/abidw.rst                |  82 ++++++++++++++++---
> doc/manuals/abipkgdiff.rst           |  51 ++++++++++++
> doc/manuals/kmidiff.rst              |  52 +++++++++++-
> doc/manuals/tools-use-libabigail.txt |  16 ++++
> include/abg-ir.h                     |   9 +++
> src/abg-dwarf-reader.cc              | 113 ++++++++++++++++++++++++---
> src/abg-ir-priv.h                    |   2 +
> src/abg-ir.cc                        |  36 +++++++++
> tools/abidiff.cc                     |  12 +++
> tools/abidw.cc                       |  15 ++++
> tools/abipkgdiff.cc                  |  21 +++++
> tools/kmidiff.cc                     |  12 +++
> 14 files changed, 449 insertions(+), 27 deletions(-)
> create mode 100644 doc/manuals/tools-use-libabigail.txt
>
> diff --git a/doc/manuals/Makefile.am b/doc/manuals/Makefile.am
> index 894b38f1..e2813785 100644
> --- a/doc/manuals/Makefile.am
> +++ b/doc/manuals/Makefile.am
> @@ -14,7 +14,8 @@ libabigail-concepts.rst \
> libabigail-overview.rst \
> libabigail-tools.rst \
> fedabipkgdiff.rst \
> -kmidiff.rst
> +kmidiff.rst \
> +tools-use-libabigail.txt
>
> # You can set these variables from the command line.
> SPHINXOPTS    =
> diff --git a/doc/manuals/abidiff.rst b/doc/manuals/abidiff.rst
> index a15515be..0c711d9e 100644
> --- a/doc/manuals/abidiff.rst
> +++ b/doc/manuals/abidiff.rst
> @@ -18,6 +18,8 @@ be accompanied with their debug information in `DWARF`_ format.
> Otherwise, only `ELF`_ symbols that were added or removed are
> reported.
>
> +.. include:: tools-use-libabigail.txt
> +
> .. _abidiff_invocation_label:
>
> Invocation
> @@ -197,6 +199,56 @@ Options
>     consumption of the tool on binaries with a lot of publicly defined
>     and exported types.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's those final ABI Corpora
> +    that are compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's those final ABI Corpora that are
> +    compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    Corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>   * ``--stat``
>
>     Rather than displaying the detailed ABI differences between
> diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
> index bdd6204d..a3055c7e 100644
> --- a/doc/manuals/abidw.rst
> +++ b/doc/manuals/abidw.rst
> @@ -12,14 +12,19 @@ defined ELF symbols of the file.  The input shared library must
> contain associated debug information in `DWARF`_ format.
>
> When given the ``--linux-tree`` option, this program can also handle a
> -Linux kernel tree.  That is, a directory tree that contains both the
> -vmlinux binary and Linux kernel modules.  It analyses those Linux
> -kernel binaries and emits an XML representation of the interface
> -between the kernel and its module, to standard output.  In this case,
> -we don't call it an ABI, but a KMI (Kernel Module Interface).  The
> -emitted KMI includes all the globally defined functions and variables,
> -along with a complete representation of their types.  The input
> -binaries must contain associated debug information in `DWARF`_ format.
> +`Linux kernel`_ tree.  That is, a directory tree that contains both
> +the vmlinux binary and `Linux Kernel`_ modules.  It analyses those
> +`Linux Kernel`_ binaries and emits an XML representation of the
> +interface between the kernel and its module, to standard output.  In
> +this case, we don't call it an ABI, but a KMI (Kernel Module
> +Interface).  The emitted KMI includes all the globally defined
> +functions and variables, along with a complete representation of their
> +types.  The input binaries must contain associated debug information
> +in `DWARF`_ format.
> +
> +.. include:: tools-use-libabigail.txt
> +
> +.. _abidiff_invocation_label:
>
> Invocation
> ==========
> @@ -92,7 +97,7 @@ Options
>
>   * ``--kmi-whitelist | -kaw`` <*path-to-whitelist*>
>
> -    When analyzing a Linux kernel binary, this option points to the
> +    When analyzing a `Linux Kernel`_ binary, this option points to the
>     white list of names of ELF symbols of functions and variables
>     which ABI must be written out.  That white list is called a "
>     Kernel Module Interface white list".  This is because for the
> @@ -105,7 +110,7 @@ Options
>
>     If this option is not provided -- thus if no white list is
>     provided -- then the entire KMI, that is, all publicly defined and
> -    exported functions and global variables by the Linux Kernel
> +    exported functions and global variables by the `Linux Kernel`_
>     binaries is emitted.
>
>   * ``--linux-tree | --lt``
> @@ -115,9 +120,10 @@ Options
>     In that case, this program emits the representation of the Kernel
>     Module Interface (KMI) on the standard output.
>
> -    Below is an example of usage of ``abidw`` on a Linux Kernel tree.
> +    Below is an example of usage of ``abidw`` on a `Linux Kernel`_
> +    tree.
>
> -    First, checkout a Linux kernel source tree and build it.  Then
> +    First, checkout a `Linux Kernel`_ source tree and build it.  Then
>     install the kernel modules in a directory somewhere.  Copy the
>     vmlinux binary into that directory too.  And then serialize the
>     KMI of that kernel to disk, using ``abidw``: ::
> @@ -171,6 +177,56 @@ Options
>     representation build by Libabigail to represent the ABI and will
>     not end up in the abi XML file.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's that final ABI corpus
> +    which textual representation is saved as ``ABIXML``.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's that final ABI corpus which textual
> +    representation is saved as ``ABIXML``.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>   * ``--no-linux-kernel-mode``
>
>     Without this option, if abipkgiff detects that the binaries it is
> @@ -308,4 +364,4 @@ standard `here
> .. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> .. _DWARF: http://www.dwarfstd.org
> .. _GNU: http://www.gnu.org
> -
> +.. _Linux Kernel: https://kernel.org/
> diff --git a/doc/manuals/abipkgdiff.rst b/doc/manuals/abipkgdiff.rst
> index 15ea9072..9114775a 100644
> --- a/doc/manuals/abipkgdiff.rst
> +++ b/doc/manuals/abipkgdiff.rst
> @@ -19,6 +19,7 @@ information directly in a section of said binaries.  In
> those cases,
> obviously, no separate debug information package is needed as the tool
> will find the debug information inside the binaries.
>
> +.. include:: tools-use-libabigail.txt
>
> .. _abipkgdiff_invocation_label:
>
> @@ -277,6 +278,56 @@ Options
>     global functions and variables are analyzed, so the tool detects
>     and reports changes on these reachable types only.
>
> +  * ``--exported-interfaces-only``
> +
> +    By default, when looking at the debug information accompanying a
> +    binary, this tool analyzes the descriptions of the types reachable
> +    by the interfaces (functions and variables) that are visible
> +    outside of their translation unit.  Once that analysis is done, an
> +    ABI corpus is constructed by only considering the subset of types
> +    reachable from interfaces associated to `ELF`_ symbols that are
> +    defined and exported by the binary.  It's those final ABI Corpora
> +    that are compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    corpora like these can be extremely slow.
> +
> +    To mitigate that performance issue, this option allows libabigail
> +    to only analyze types that are reachable from interfaces
> +    associated with defined and exported `ELF`_ symbols.
> +
> +    Note that this option is turned on by default when analyzing the
> +    `Linux Kernel`_.  Otherwise, it's turned off by default.
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When looking at the debug information accompanying a binary, this
> +    tool analyzes the descriptions of the types reachable by the
> +    interfaces (functions and variables) that are visible outside of
> +    their translation unit.  Once that analysis is done, an ABI corpus
> +    is constructed by only considering the subset of types reachable
> +    from interfaces associated to `ELF`_ symbols that are defined and
> +    exported by the binary.  It's those final ABI Corpora that are
> +    compared by this tool.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, especially when those binaries are
> +    applications, as opposed to shared libraries.  One example of such
> +    applications is the `Linux Kernel`_.  Analyzing massive ABI
> +    Corpora like these can be extremely slow.
> +
> +    In the presence of an "average sized" binary however one can
> +    afford having libabigail analyze all interfaces that are visible
> +    outside of their translation unit, using this option.
> +
> +    Note that this option is turned on by default, unless we are in
> +    the presence of the `Linux Kernel`_.
> +
>   *  ``--redundant``
>
>     In the diff reports, do display redundant changes.  A redundant
> diff --git a/doc/manuals/kmidiff.rst b/doc/manuals/kmidiff.rst
> index ce8168ae..53010189 100644
> --- a/doc/manuals/kmidiff.rst
> +++ b/doc/manuals/kmidiff.rst
> @@ -55,6 +55,10 @@ command line looks like: ::
>                       linux/v4.5/build/modules \
>                       linux/v4.6/build/modules
>
> +
> +.. include:: tools-use-libabigail.txt
> +
> +
> Invocation
> ==========
>
> @@ -67,8 +71,8 @@ Environment
>
> By default, ``kmidiff`` compares all the interfaces (exported
> functions and variables) between the Kernel and its modules.  In
> -practice, though, users want to compare a subset of the those
> -interfaces.
> +practice, though, some users might want to compare a subset of
> +those interfaces.
>
> Users can then define a "white list" of the interfaces to compare.
> Such a white list is a just a file in the "INI" format that looks
> @@ -91,8 +95,11 @@ function or variable.  Only those interfaces along with
> the types
> reachable from their signatures are going to be compared by
> ``kmidiff`` recursively.
>
> -Note that kmidiff compares the interfaces exported by the ``vmlinux``
> -binary and by the all of the compiled modules.
> +Note that by default kmidiff analyzes the types reachable from the
> +interfaces associated with `ELF`_ symbols that are defined and
> +exported by the `Linux Kernel`_ as being the union of the ``vmlinux``
> +binary and all its compiled modules.  It then compares those
> +interfaces (along with their types).
>
> Options
> =======
> @@ -180,6 +187,38 @@ Options
>     exported interfaces.  This is the default kind of report emitted
>     by tools like ``abidiff`` or ``abipkgdiff``.
>
> +  * ``--exported-interfaces-only``
> +
> +    When using this option, this tool analyzes the descriptions of the
> +    types reachable by the interfaces (functions and variables)
> +    associated with `ELF`_ symbols that are defined and exported by
> +    the `Linux Kernel`_.
> +
> +    Otherwise, the tool also has the ability to analyze the
> +    descriptions of the types reachable by the interfaces associated
> +    with `ELF`_ symbols that are visible outside their translation
> +    unit.  This latter possibility is however much more resource
> +    intensive and results in much slower operations.
> +
> +    That is why this option is enabled by default.
> +
> +
> +  * ``--allow-non-exported-interfaces``
> +
> +    When using this option, this tool analyzes the descriptions of the
> +    types reachable by the interfaces (functions and variables) that
> +    are visible outside of their translation unit.  Once that analysis
> +    is done, an ABI Corpus is constructed by only considering the
> +    subset of types reachable from interfaces associated to `ELF`_
> +    symbols that are defined and exported by the binary.  It's that
> +    final ABI corpus which is compared against another one.
> +
> +    The problem with that approach however is that analyzing all the
> +    interfaces that are visible from outside their translation unit
> +    can amount to a lot of data, leading to very slow operations.
> +
> +    Note that this option is turned off by default.
> +
>   * ``--show-bytes``
>
>     Show sizes and offsets in bytes, not bits.  This option is
> @@ -198,3 +237,8 @@ Options
>   * ``--show-dec``
>
>     Show sizes and offsets in decimal base.
> +
> +
> +.. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> +.. _ksymtab: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
> +.. _Linux Kernel: https://kernel.org
> diff --git a/doc/manuals/tools-use-libabigail.txt b/doc/manuals/tools-use-libabigail.txt
> new file mode 100644
> index 00000000..43edf296
> --- /dev/null
> +++ b/doc/manuals/tools-use-libabigail.txt
> @@ -0,0 +1,16 @@
> +This tool uses the libabigail library to analyze the binary as well as its
> +associated debug information.  Here is its general mode of operation.
> +
> +When instructed to do so, a binary and its associated debug
> +information are read and analyzed.  To that effect, libabigail analyzes
> +by default the descriptions of the types reachable by the interfaces
> +(functions and variables) that are visible outside of their
> +translation unit.  Once that analysis is done, an Application Binary
> +Interface Corpus is constructed by only considering the subset of
> +types reachable from interfaces associated to `ELF`_ symbols that are
> +defined and exported by the binary.  It's that final ABI corpus which
> +libabigail considers as representing the ABI of the analyzed binary.
> +
> +Libabigail then has capabilities to generate textual representations
> +of ABI Corpora, compare them, analyze their changes and report about
> +them.
> diff --git a/include/abg-ir.h b/include/abg-ir.h
> index a857d041..61338edb 100644
> --- a/include/abg-ir.h
> +++ b/include/abg-ir.h
> @@ -197,6 +197,15 @@ public:
>   const config&
>   get_config() const;
>
> +  bool
> +  user_set_analyze_exported_interfaces_only() const;
> +
> +  void
> +  analyze_exported_interfaces_only(bool f);
> +
> +  bool
> +  analyze_exported_interfaces_only() const;
> +
> #ifdef WITH_DEBUG_SELF_COMPARISON
>   void
>   set_self_comparison_debug_input(const corpus_sptr& corpus);
> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> index e41172c1..cba89664 100644
> --- a/src/abg-dwarf-reader.cc
> +++ b/src/abg-dwarf-reader.cc
> @@ -402,6 +402,12 @@ die_is_decl(const Dwarf_Die* die);
> static bool
> die_is_declaration_only(Dwarf_Die* die);
>
> +static bool
> +die_is_variable_decl(const Dwarf_Die *die);
> +
> +static bool
> +die_is_function_decl(const Dwarf_Die *die);
> +
> static bool
> die_has_size_attribute(const Dwarf_Die *die);
>
> @@ -5303,6 +5309,44 @@ public:
>     return symbol;
>   }
>
> +  /// Test if a DIE represents a decl (function or variable) that has
> +  /// a symbol that is exported, whatever that means.  This is
> +  /// supposed to work for Linux Kernel binaries as well.
> +  ///
> +  /// This is useful to limit the amount of DIEs taken into account to
> +  /// the strict limit of what an ABI actually means.  Limiting the
> +  /// volume of DIEs analyzed this way is an important optimization to
> +  /// keep big binaries "manageable" by libabigail.
> +  ///
> +  /// @param die the DIE to consider.
> +  bool
> +  is_decl_die_with_exported_symbol(const Dwarf_Die *die)
> +  {
> +    if (!die || !die_is_decl(die))
> +      return false;
> +
> +    bool result = false, address_found = false, symbol_is_exported = false;
> +    Dwarf_Addr decl_symbol_address = 0;
> +
> +    if (die_is_variable_decl(die))
> +      {
> +       if ((address_found = get_variable_address(die, decl_symbol_address)))
> +         symbol_is_exported =
> +           !!variable_symbol_is_exported(decl_symbol_address);
> +      }
> +    else if (die_is_function_decl(die))
> +      {
> +       if ((address_found = get_function_address(die, decl_symbol_address)))
> +         symbol_is_exported =
> +           !!function_symbol_is_exported(decl_symbol_address);
> +      }
> +
> +    if (address_found)
> +      result = symbol_is_exported;
> +
> +    return result;
> +  }
> +
>   /// Getter for the symtab reader. Will load the symtab from the elf
> handle if
>   /// not yet set.
>   ///
> @@ -5580,16 +5624,18 @@ public:
>   ///
>   /// @return true if the function address was found.
>   bool
> -  get_function_address(Dwarf_Die* function_die, Dwarf_Addr& address) const
> +  get_function_address(const Dwarf_Die* function_die, Dwarf_Addr& address) const
>   {
> -    if (!die_address_attribute(function_die, DW_AT_low_pc, address))
> +    if (!die_address_attribute(const_cast<Dwarf_Die*>(function_die),
> +                              DW_AT_low_pc, address))
>       // So no DW_AT_low_pc was found.  Let's see if the function DIE
>       // has got a DW_AT_ranges attribute instead.  If it does, the
>       // first address of the set of addresses represented by the
>       // value of that DW_AT_ranges represents the function (symbol)
>       // address we are looking for.
> -      if (!get_first_exported_fn_address_from_DW_AT_ranges(function_die,
> -                                                          address))
> +      if (!get_first_exported_fn_address_from_DW_AT_ranges
> +         (const_cast<Dwarf_Die*>(function_die),
> +          address))
>        return false;
>
>     address = maybe_adjust_fn_sym_address(address);
> @@ -5611,11 +5657,12 @@ public:
>   ///
>   /// @return true if the variable address was found.
>   bool
> -  get_variable_address(Dwarf_Die*      variable_die,
> +  get_variable_address(const Dwarf_Die* variable_die,
>                       Dwarf_Addr&      address) const
>   {
>     bool is_tls_address = false;
> -    if (!die_location_address(variable_die, address, is_tls_address))
> +    if (!die_location_address(const_cast<Dwarf_Die*>(variable_die),
> +                             address, is_tls_address))
>       return false;
>     if (!is_tls_address)
>       address = maybe_adjust_var_sym_address(address);
> @@ -7155,6 +7202,40 @@ die_is_declaration_only(Dwarf_Die* die)
>   return false;
> }
>
> +/// Test if a DIE is for a function decl.
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return true iff @p die represents a function decl.
> +static bool
> +die_is_function_decl(const Dwarf_Die *die)
> +{
> +  if (!die)
> +    return false;
> +
> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
> +  if (tag == DW_TAG_subprogram)
> +    return true;
> +  return false;
> +}
> +
> +/// Test if a DIE is for a variable decl.
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return true iff @p die represents a variable decl.
> +static bool
> +die_is_variable_decl(const Dwarf_Die *die)
> +{
> +  if (!die)
> +    return false;
> +
> +  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
> +  if (tag == DW_TAG_variable)
> +    return true;
> +  return false;
> +}
> +
> /// Test if a DIE has size attribute.
> ///
> /// @param die the DIE to consider.
> @@ -12690,9 +12771,13 @@
> build_translation_unit_and_add_to_ir(read_context&     ctxt,
>   result->set_is_constructed(false);
>
>   do
> -    build_ir_node_from_die(ctxt, &child,
> -                          die_is_public_decl(&child),
> -                          dwarf_dieoffset(&child));
> +    // Analyze all the DIEs we encounter unless we are asked to only
> +    // analyze exported interfaces and the types reachable from them.
> +    if (!ctxt.env()->analyze_exported_interfaces_only()
> +       || ctxt.is_decl_die_with_exported_symbol(&child))
> +      build_ir_node_from_die(ctxt, &child,
> +                            die_is_public_decl(&child),
> +                            dwarf_dieoffset(&child));
>   while (dwarf_siblingof(&child, &child) == 0);
>
>   if (!ctxt.var_decls_to_re_add_to_tree().empty())
> @@ -15699,6 +15784,16 @@ read_debug_info_into_corpus(read_context& ctxt)
>     origin |= corpus::LINUX_KERNEL_BINARY_ORIGIN;
>   ctxt.current_corpus()->set_origin(origin);
>
> +  if (origin & corpus::LINUX_KERNEL_BINARY_ORIGIN
> +      && !ctxt.env()->user_set_analyze_exported_interfaces_only())
> +    // So we are looking at the Linux Kernel and the user has not set
> +    // any particular option regarding the amount of types to analyse.
> +    // In that case, we need to only analyze types that are reachable
> +    // from exported interfaces otherwise we get such a massive amount
> +    // of type DIEs to look at that things are just too slow down the
> +    // road.
> +    ctxt.env()->analyze_exported_interfaces_only(true);
> +
>   ctxt.current_corpus()->set_soname(ctxt.dt_soname());
>   ctxt.current_corpus()->set_needed(ctxt.dt_needed());
>   ctxt.current_corpus()->set_architecture_name(ctxt.elf_architecture());
> diff --git a/src/abg-ir-priv.h b/src/abg-ir-priv.h
> index 45b711b7..21734b25 100644
> --- a/src/abg-ir-priv.h
> +++ b/src/abg-ir-priv.h
> @@ -26,6 +26,7 @@ namespace ir
> {
>
> using std::string;
> +using abg_compat::optional;
>
> /// The result of structural comparison of type ABI artifacts.
> enum comparison_result
> @@ -443,6 +444,7 @@ struct environment::priv
>   bool                                 decl_only_class_equals_definition_;
>   bool                                 use_enum_binary_only_equality_;
>   bool                                 allow_type_comparison_results_caching_;
> +  optional<bool>                       analyze_exported_interfaces_only_;
> #ifdef WITH_DEBUG_SELF_COMPARISON
>   bool                                 self_comparison_debug_on_;
> #endif
> diff --git a/src/abg-ir.cc b/src/abg-ir.cc
> index 91c8e99b..02d68e63 100644
> --- a/src/abg-ir.cc
> +++ b/src/abg-ir.cc
> @@ -3674,6 +3674,42 @@ const config&
> environment::get_config() const
> {return priv_->config_;}
>
> +/// Getter for a property that says if the user actually did set the
> +/// analyze_exported_interfaces_only() property.  If not, it means
> +/// the default behaviour prevails.
> +///
> +/// @return true iff the user did set the
> +/// analyze_exported_interfaces_only() property.
> +bool
> +environment::user_set_analyze_exported_interfaces_only() const
> +{return priv_->analyze_exported_interfaces_only_.has_value();}
> +
> +/// Setter for the property that controls if we are to restrict the
> +/// analysis to the types that are only reachable from the exported
> +/// interfaces only, or if the set of types should be more broad than
> +/// that.  Typically, we'd restrict the analysis to types reachable
> +/// from exported interfaces only (stricto sensu, that would really be
> +/// only the types that are part of the ABI of well designed
> +/// libraries) for performance reasons.
> +///
> +/// @param f the value of the flag.
> +void
> +environment::analyze_exported_interfaces_only(bool f)
> +{priv_->analyze_exported_interfaces_only_ = f;}
> +
> +/// Getter for the property that controls if we are to restrict the
> +/// analysis to the types that are only reachable from the exported
> +/// interfaces only, or if the set of types should be more broad than
> +/// that.  Typically, we'd restrict the analysis to types reachable
> +/// from exported interfaces only (stricto sensu, that would really be
> +/// only the types that are part of the ABI of well designed
> +/// libraries) for performance reasons.
> +///
> +/// @return the value of the flag.
> +bool
> +environment::analyze_exported_interfaces_only() const
> +{return priv_->analyze_exported_interfaces_only_.value_or(false);}
> +
> #ifdef WITH_DEBUG_SELF_COMPARISON
> /// Setter of the corpus of the input corpus of the self comparison
> /// that takes place when doing "abidw --debug-abidiff <binary>".
> diff --git a/tools/abidiff.cc b/tools/abidiff.cc
> index 97b036cb..e0bb35ac 100644
> --- a/tools/abidiff.cc
> +++ b/tools/abidiff.cc
> @@ -29,6 +29,7 @@ using std::ostream;
> using std::cout;
> using std::cerr;
> using std::shared_ptr;
> +using abg_compat::optional;
> using abigail::ir::environment;
> using abigail::ir::environment_sptr;
> using abigail::translation_unit;
> @@ -74,6 +75,7 @@ struct options
>   vector<string>       headers_dirs2;
>   vector<string>        header_files2;
>   bool                 drop_private_types;
> +  optional<bool>       exported_interfaces_only;
>   bool                 linux_kernel_mode;
>   bool                 no_default_supprs;
>   bool                 no_arch;
> @@ -197,6 +199,9 @@ display_usage(const string& prog_name, ostream& out)
>     << " --header-file2|--hf2 <path>  the path to one header of file2\n"
>     << " --drop-private-types  drop private types from "
>     "internal representation\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>     << " --no-linux-kernel-mode  don't consider the input binaries as "
>        "linux kernel binaries\n"
>     << " --kmi-whitelist|-w  path to a "
> @@ -403,6 +408,10 @@ parse_command_line(int argc, char* argv[], options&
> opts)
>        }
>       else if (!strcmp(argv[i], "--drop-private-types"))
>        opts.drop_private_types = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>       else if (!strcmp(argv[i], "--no-default-suppression"))
>        opts.no_default_supprs = true;
>       else if (!strcmp(argv[i], "--no-architecture"))
> @@ -1130,6 +1139,9 @@ main(int argc, char* argv[])
>       t2_type = guess_file_type(opts.file2);
>
>       environment_sptr env(new environment);
> +      if (opts.exported_interfaces_only.has_value())
> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
> #ifdef WITH_DEBUG_SELF_COMPARISON
>            if (opts.do_debug)
>              env->self_comparison_debug_is_on(true);
> diff --git a/tools/abidw.cc b/tools/abidw.cc
> index 9a27a029..f38d6048 100644
> --- a/tools/abidw.cc
> +++ b/tools/abidw.cc
> @@ -40,6 +40,7 @@ using std::ostream;
> using std::ofstream;
> using std::vector;
> using std::shared_ptr;
> +using abg_compat::optional;
> using abigail::tools_utils::emit_prefix;
> using abigail::tools_utils::temp_file;
> using abigail::tools_utils::temp_file_sptr;
> @@ -114,6 +115,7 @@ struct options
>   bool                 do_log;
>   bool                 drop_private_types;
>   bool                 drop_undefined_syms;
> +  optional<bool>       exported_interfaces_only;
>   type_id_style_kind   type_id_style;
> #ifdef WITH_DEBUG_SELF_COMPARISON
>   string               type_id_file_path;
> @@ -187,6 +189,9 @@ display_usage(const string& prog_name, ostream& out)
>     << "  --short-locs  only print filenames rather than paths\n"
>     << "  --drop-private-types  drop private types from representation\n"
>     << "  --drop-undefined-syms  drop undefined symbols from representation\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>     << "  --no-comp-dir-path  do not show compilation path information\n"
>     << "  --no-elf-needed  do not show the DT_NEEDED information\n"
>     << "  --no-write-default-sizes  do not emit pointer size when it
> equals"
> @@ -368,6 +373,10 @@ parse_command_line(int argc, char* argv[], options&
> opts)
>        opts.drop_private_types = true;
>       else if (!strcmp(argv[i], "--drop-undefined-syms"))
>        opts.drop_undefined_syms = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>       else if (!strcmp(argv[i], "--no-linux-kernel-mode"))
>        opts.linux_kernel_mode = false;
>       else if (!strcmp(argv[i], "--abidiff"))
> @@ -606,6 +615,9 @@ load_corpus_and_write_abixml(char* argv[],
>             }
>         }
>
> +      if (opts.exported_interfaces_only.has_value())
> +       env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>       t.start();
>       corp = dwarf_reader::read_corpus_from_elf(ctxt, s);
>       t.stop();
> @@ -813,6 +825,9 @@ load_kernel_corpus_group_and_write_abixml(char* argv[],
>   timer t, global_timer;
>   suppressions_type supprs;
>
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>   if (opts.do_log)
>     emit_prefix(argv[0], cerr)
>       << "going to build ABI representation of the Linux Kernel ...\n";
> diff --git a/tools/abipkgdiff.cc b/tools/abipkgdiff.cc
> index 551080b9..656d5882 100644
> --- a/tools/abipkgdiff.cc
> +++ b/tools/abipkgdiff.cc
> @@ -106,6 +106,7 @@ using std::set;
> using std::ostringstream;
> using std::shared_ptr;
> using std::dynamic_pointer_cast;
> +using abg_compat::optional;
> using abigail::workers::task;
> using abigail::workers::task_sptr;
> using abigail::workers::queue;
> @@ -205,6 +206,7 @@ public:
>   bool         fail_if_no_debug_info;
>   bool         show_identical_binaries;
>   bool         self_check;
> +  optional<bool> exported_interfaces_only;
> #ifdef WITH_CTF
>   bool         use_ctf;
> #endif
> @@ -868,6 +870,9 @@ display_usage(const string& prog_name, ostream& out)
>     "full impact analysis report rather than the default leaf changes
> reports\n"
>     << " --non-reachable-types|-t  consider types non reachable"
>     " from public interfaces\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>     << " --no-linkage-name             do not display linkage names of "
>     "added/removed/changed\n"
>     << " --redundant                    display redundant changes\n"
> @@ -2076,6 +2081,10 @@ public:
>     abigail::elf_reader::status detailed_status =
>       abigail::elf_reader::STATUS_UNKNOWN;
>
> +    if (args->opts.exported_interfaces_only.has_value())
> +      env->analyze_exported_interfaces_only
> +       (*args->opts.exported_interfaces_only);
> +
>     status |= compare(args->elf1, args->debug_dir1, args->private_types_suppr1,
>                      args->elf2, args->debug_dir2, args->private_types_suppr2,
>                      args->opts, env, diff, ctxt, &detailed_status);
> @@ -2142,6 +2151,10 @@ public:
>     diff_context_sptr ctxt;
>     corpus_diff_sptr diff;
>
> +    if (args->opts.exported_interfaces_only.has_value())
> +      env->analyze_exported_interfaces_only
> +       (*args->opts.exported_interfaces_only);
> +
>     abigail::elf_reader::status detailed_status =
>       abigail::elf_reader::STATUS_UNKNOWN;
>
> @@ -3024,6 +3037,10 @@ compare_prepared_linux_kernel_packages(package&
> first_package,
>   string dist_root2 = second_package.extracted_dir_path();
>
>   abigail::ir::environment_sptr env(new abigail::ir::environment);
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only
> +      (*opts.exported_interfaces_only);
> +
>   suppressions_type supprs;
>   corpus_group_sptr corpus1, corpus2;
>   corpus1 = build_corpus_group_from_kernel_dist_under(dist_root1,
> @@ -3326,6 +3343,10 @@ parse_command_line(int argc, char* argv[], options&
> opts)
>       else if (!strcmp(argv[i], "--full-impact")
>               ||!strcmp(argv[i], "-f"))
>        opts.show_full_impact_report = true;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>       else if (!strcmp(argv[i], "--no-linkage-name"))
>        opts.show_linkage_names = false;
>       else if (!strcmp(argv[i], "--redundant"))
> diff --git a/tools/kmidiff.cc b/tools/kmidiff.cc
> index 8fd3fed9..f3332765 100644
> --- a/tools/kmidiff.cc
> +++ b/tools/kmidiff.cc
> @@ -29,6 +29,7 @@ using std::vector;
> using std::ostream;
> using std::cout;
> using std::cerr;
> +using abg_compat::optional;
>
> using namespace abigail::tools_utils;
> using namespace abigail::dwarf_reader;
> @@ -60,6 +61,7 @@ struct options
>   bool                 show_hexadecimal_values;
>   bool                 show_offsets_sizes_in_bits;
>   bool                 show_impacted_interfaces;
> +  optional<bool>       exported_interfaces_only;
> #ifdef WITH_CTF
>   bool                 use_ctf;
> #endif
> @@ -120,6 +122,9 @@ display_usage(const string& prog_name, ostream& out)
>     << " --impacted-interfaces|-i  show interfaces impacted by ABI
> changes\n"
>     << " --full-impact|-f  show the full impact of changes on top-most "
>         "interfaces\n"
> +    << "  --exported-interfaces-only  analyze exported interfaces only\n"
> +    << "  --allow-non-exported-interfaces  analyze interfaces that "
> +    "might not be exported\n"
>     << " --show-bytes  show size and offsets in bytes\n"
>     << " --show-bits  show size and offsets in bits\n"
>     << " --show-hex  show size and offset in hexadecimal\n"
> @@ -262,6 +267,10 @@ parse_command_line(int argc, char* argv[], options&
> opts)
>       else if (!strcmp(argv[i], "--full-impact")
>               || !strcmp(argv[i], "-f"))
>        opts.leaf_changes_only = false;
> +      else if (!strcmp(argv[i], "--exported-interfaces-only"))
> +       opts.exported_interfaces_only = true;
> +      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
> +       opts.exported_interfaces_only = false;
>       else if (!strcmp(argv[i], "--show-bytes"))
>        opts.show_offsets_sizes_in_bits = false;
>       else if (!strcmp(argv[i], "--show-bits"))
> @@ -408,6 +417,9 @@ main(int argc, char* argv[])
>
>   environment_sptr env(new environment);
>
> +  if (opts.exported_interfaces_only.has_value())
> +    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
> +
>   corpus_group_sptr group1, group2;
>   string debug_info_root_dir;
>   corpus::origin origin =
> --
> 2.37.2
>
>
>
> --
>                Dodji
>
>
>
>
  
Dodji Seketeli Sept. 19, 2022, 9:34 a.m. UTC | #4
Hello,

Giuliano Procida <gprocida@google.com> wrote:

> Hi Dodji.
>
> Sorry for the late reply. I was down with Covid for a while.

No problem.  I hope you are doing fine now.

[...]


> My understanding is that the intention here is to make the DWARF
> reader do less work (look at fewer type DIEs) than at present.

I'd rather say, to make it do less unnecessary work by default.

It first looks at *interface* DIEs.  Then it walks the graph of nodes
to reach the type nodes that are connected to them.  If it first
encounters a type DIE, it ignores it.  It will reach that DIE only if
it's connected to an interface.
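
To make the traversal order described above concrete, here is a minimal sketch on a toy model.  The names toy_die, analyze_reachable and analyze_translation_unit are hypothetical and are not libabigail's API; the only part taken from the patch is the shape of the filter in build_translation_unit_and_add_to_ir (analyze a top-level DIE unless we are restricted to exported interfaces and the DIE is not an exported decl).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a DWARF DIE; not libabigail's types.
enum class die_kind { function_decl, variable_decl, type };

struct toy_die
{
  die_kind kind;
  bool has_exported_symbol;      // only meaningful for decls
  std::vector<std::size_t> refs; // indices of the DIEs this one refers to
};

// Mark 'start' and every DIE reachable from it as analyzed.
void
analyze_reachable(const std::vector<toy_die>& dies, std::size_t start,
                  std::set<std::size_t>& analyzed)
{
  assert(start < dies.size());
  std::vector<std::size_t> worklist{start};
  while (!worklist.empty())
    {
      std::size_t i = worklist.back();
      worklist.pop_back();
      if (!analyzed.insert(i).second)
        continue; // already analyzed
      for (std::size_t r : dies[i].refs)
        worklist.push_back(r);
    }
}

// Walk the top-level DIEs of a translation unit.  When restricted,
// top-level type DIEs and non-exported decls are skipped; a type is
// then analyzed only if an exported interface reaches it.
std::set<std::size_t>
analyze_translation_unit(const std::vector<toy_die>& dies,
                         const std::vector<std::size_t>& top_level,
                         bool exported_interfaces_only)
{
  std::set<std::size_t> analyzed;
  for (std::size_t i : top_level)
    {
      const toy_die& d = dies[i];
      bool is_exported_decl =
        d.kind != die_kind::type && d.has_exported_symbol;
      if (!exported_interfaces_only || is_exported_decl)
        analyze_reachable(dies, i, analyzed);
    }
  return analyzed;
}
```

In this model, a type that is only referenced by a non-exported function is never visited in restricted mode, which is where the reduction in DIEs to de-duplicate would come from.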

> We are actually hoping that we may be able to make the DWARF reader
> look at more type DIEs so that it is more likely to pick up full
> definitions of types instead of declarations.

If the definitions are not "used" by the interfaces, then by default,
why bother, as far as ABI is concerned (not necessarily APIs)?  But in
any case, if you really want to look at type definitions, even those
that are not strictly connected to interfaces, you still can.  Just
don't use the new option, I'd say.

Put another way, I don't think this change will inherently look at
fewer of the type definitions that are actually used.  If it does,
then it's a bug and it ought to be fixed.
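
As an illustration of how the option stays opt-in outside of the kernel case, here is a minimal sketch of the tri-state logic.  It assumes std::optional (C++17) as a stand-in for abg_compat::optional, and toy_environment and apply_default are hypothetical names; only the has_value()/value_or(false) behaviour and the kernel default mirror the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the relevant part of the environment.
class toy_environment
{
  std::optional<bool> exported_only_; // unset until a value is chosen

public:
  // True iff a value was set explicitly.
  bool user_set_exported_only() const {return exported_only_.has_value();}

  void exported_only(bool f) {exported_only_ = f;}

  // An unset property reads as "not restricted".
  bool exported_only() const {return exported_only_.value_or(false);}
};

// Turn the restriction on for Linux Kernel binaries, but only when
// nothing was set explicitly beforehand.
void
apply_default(toy_environment& env, bool is_linux_kernel_binary)
{
  if (is_linux_kernel_binary && !env.user_set_exported_only())
    env.exported_only(true);
}
```

With this shape, passing --allow-non-exported-interfaces on a kernel binary keeps the broader analysis, and a regular binary is unaffected unless --exported-interfaces-only is given.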

>
> The rationale behind the change appears to be that DWARF processing is
> expensive, in particular for kernel ABIs. I would say "measure first".

I've profiled the code and what I think I am seeing is that we are
just looking at too many DIEs, which incurs a lot of comparisons at
de-duplication (canonicalization) time.

But I believe the heuristics I am using to speed up comparison can be
improved.  They are just taking me time.  So I figured being able to
avoid a significant number of comparisons to begin with was somewhat
lower-hanging fruit, at least for me to be able to release 2.1.

I'll keep on looking at ways to ameliorate the DIE canonicalization even
more in the future, after 2.1.

> Here's roughly how I think about things:
>
> 1. building the IR is very cheap

Well, that's not what I see when we have lots of DIEs that are built
'unnecessarily' and that then have to be compared to be de-duplicated.
The reason why I am de-duplicating some types (not all DIE kinds) at
the DIE level is because I have seen over time that it significantly
reduces the size and time of de-duplication at the IR level.  To see that
for yourself, it's quite easy, I believe, to disable DIE de-duplication
and run, say, "abidw --noout" on a sizeable binary.

> A kernel ABI may end up with 40k IR elements. The cost of allocating
> memory and calling constructors should be negligible. Any improvements
> to this end of things is pointless.

Again, you can just disable DIE canonicalization in the code and run
abidw --noout on vmlinux to see for yourself how things degrade in
practice.

> 2. reading DWARF information is fairly cheap
>
> We may have 100MB of DWARF but just reading the data (decoding
> attribute formats in particular) won't take that long.
>
> Reducing the number of DIEs examined at the top-level by a factor of 2
> will speed up this part by a factor of 2, but in the grand scheme of
> things that may not be very important.

Indeed.  But I guess the gain might depend on the kind of nodes that
got dropped.  If the nodes are for types that participate in quadratic
comparisons, then the gain might be higher.

But I agree that all in all, what I am seeing is indeed a linear gain in
general.

It might not be very important in the grand scheme, but in practice,
I'll take it :-)

I understand though, that I still need to keep working on this to find
ways to come up with better de-duplication schemes.  That would
definitely be for after 2.1, which I really want to see out now.  It's
overdue.

> 3. chasing references is a bit more expensive
>
> Cross-references in DWARF are pretty common and the lack of locality
> means that chasing cross-references is going to be a constant factor
> slower than iterating through the main DWARF tree.
>
> 4. deciding whether a DIE needs to be turned into IR is currently very expensive
>
> This is because it involves multiple look-ups and recursive comparison
> of DIEs which cannot be unconditionally memoised.

A.k.a de-duplication / canonicalization.

> Those are only my thoughts. Some profiling should give a more accurate picture.
>
> I was curious, so I did an analysis of the connectivity of a kernel
> ABI (using the STG IR, not libabigail's - there are minor
> differences). Here are some fun facts.
>
> The ABI has 34541 nodes.
> There are 25196 strongly-connected components.
> 25053 SCCs are just singleton nodes.
> The largest 3 SCCs have sizes: 4960, 784, 343.
> 1/7 of the ABI nodes are in one SCC!
>
> Completely idle speculation: Perhaps the really huge SCC contributes
> significantly to comparison cost.

I don't think we are saying different things.
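
For reference, component sizes like the ones quoted above can be computed with Tarjan's algorithm.  Below is a minimal sketch on a plain adjacency-list graph; scc_sizes is a hypothetical helper, not STG or libabigail code, and the graph representation is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Return the sizes of the strongly-connected components of a directed
// graph given as adjacency lists, largest first (Tarjan's algorithm).
// Recursive for brevity; a very deep graph would need an iterative
// version to avoid exhausting the call stack.
std::vector<std::size_t>
scc_sizes(const std::vector<std::vector<std::size_t>>& graph)
{
  const std::size_t n = graph.size();
  const std::size_t unvisited = n; // sentinel: valid indices are < n
  std::vector<std::size_t> index(n, unvisited), low(n, 0);
  std::vector<bool> on_stack(n, false);
  std::vector<std::size_t> stack, sizes;
  std::size_t next_index = 0;

  std::function<void(std::size_t)> visit = [&](std::size_t v)
  {
    index[v] = low[v] = next_index++;
    stack.push_back(v);
    on_stack[v] = true;
    for (std::size_t w : graph[v])
      {
        assert(w < n);
        if (index[w] == unvisited)
          {
            visit(w);
            low[v] = std::min(low[v], low[w]);
          }
        else if (on_stack[w])
          low[v] = std::min(low[v], index[w]);
      }
    if (low[v] == index[v])
      {
        // v is the root of a component: pop its members off the stack.
        std::size_t size = 0;
        std::size_t w;
        do
          {
            w = stack.back();
            stack.pop_back();
            on_stack[w] = false;
            ++size;
          }
        while (w != v);
        sizes.push_back(size);
      }
  };

  for (std::size_t v = 0; v < n; ++v)
    if (index[v] == unvisited)
      visit(v);

  std::sort(sizes.begin(), sizes.end(), std::greater<std::size_t>());
  return sizes;
}
```

A node that sits on no cycle forms a singleton component, which matches the large number of singleton SCCs reported above.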

Thanks for the comments.

[...]

Cheers,
  

Patch

diff --git a/doc/manuals/Makefile.am b/doc/manuals/Makefile.am
index 894b38f1..e2813785 100644
--- a/doc/manuals/Makefile.am
+++ b/doc/manuals/Makefile.am
@@ -14,7 +14,8 @@  libabigail-concepts.rst \
 libabigail-overview.rst \
 libabigail-tools.rst \
 fedabipkgdiff.rst \
-kmidiff.rst
+kmidiff.rst \
+tools-use-libabigail.txt
 
 # You can set these variables from the command line.
 SPHINXOPTS    =
diff --git a/doc/manuals/abidiff.rst b/doc/manuals/abidiff.rst
index a15515be..0c711d9e 100644
--- a/doc/manuals/abidiff.rst
+++ b/doc/manuals/abidiff.rst
@@ -18,6 +18,8 @@  be accompanied with their debug information in `DWARF`_ format.
 Otherwise, only `ELF`_ symbols that were added or removed are
 reported.
 
+.. include:: tools-use-libabigail.txt
+
 .. _abidiff_invocation_label:
 
 Invocation
@@ -197,6 +199,56 @@  Options
     consumption of the tool on binaries with a lot of publicly defined
     and exported types.
 
+  * ``--exported-interfaces-only``
+
+    By default, when looking at the debug information accompanying a
+    binary, this tool analyzes the descriptions of the types reachable
+    by the interfaces (functions and variables) that are visible
+    outside of their translation unit.  Once that analysis is done, an
+    ABI corpus is constructed by only considering the subset of types
+    reachable from interfaces associated to `ELF`_ symbols that are
+    defined and exported by the binary.  It's those final ABI Corpora
+    that are compared by this tool.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    corpora like these can be extremely slow.
+
+    To mitigate that performance issue, this option allows libabigail
+    to only analyze types that are reachable from interfaces
+    associated with defined and exported `ELF`_ symbols.
+
+    Note that this option is turned on by default when analyzing the
+    `Linux Kernel`_.  Otherwise, it's turned off by default.
+
+  * ``--allow-non-exported-interfaces``
+
+    When looking at the debug information accompanying a binary, this
+    tool analyzes the descriptions of the types reachable by the
+    interfaces (functions and variables) that are visible outside of
+    their translation unit.  Once that analysis is done, an ABI corpus
+    is constructed by only considering the subset of types reachable
+    from interfaces associated to `ELF`_ symbols that are defined and
+    exported by the binary.  It's those final ABI Corpora that are
+    compared by this tool.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    Corpora like these can be extremely slow.
+
+    In the presence of an "average sized" binary however one can
+    afford having libabigail analyze all interfaces that are visible
+    outside of their translation unit, using this option.
+
+    Note that this option is turned on by default, unless we are in
+    the presence of the `Linux Kernel`_.
+
   * ``--stat``
 
     Rather than displaying the detailed ABI differences between
diff --git a/doc/manuals/abidw.rst b/doc/manuals/abidw.rst
index bdd6204d..a3055c7e 100644
--- a/doc/manuals/abidw.rst
+++ b/doc/manuals/abidw.rst
@@ -12,14 +12,19 @@  defined ELF symbols of the file.  The input shared library must
 contain associated debug information in `DWARF`_ format.
 
 When given the ``--linux-tree`` option, this program can also handle a
-Linux kernel tree.  That is, a directory tree that contains both the
-vmlinux binary and Linux kernel modules.  It analyses those Linux
-kernel binaries and emits an XML representation of the interface
-between the kernel and its module, to standard output.  In this case,
-we don't call it an ABI, but a KMI (Kernel Module Interface).  The
-emitted KMI includes all the globally defined functions and variables,
-along with a complete representation of their types.  The input
-binaries must contain associated debug information in `DWARF`_ format.
+`Linux kernel`_ tree.  That is, a directory tree that contains both
+the vmlinux binary and `Linux Kernel`_ modules.  It analyses those
+`Linux Kernel`_ binaries and emits an XML representation of the
+interface between the kernel and its module, to standard output.  In
+this case, we don't call it an ABI, but a KMI (Kernel Module
+Interface).  The emitted KMI includes all the globally defined
+functions and variables, along with a complete representation of their
+types.  The input binaries must contain associated debug information
+in `DWARF`_ format.
+
+.. include:: tools-use-libabigail.txt
+
+.. _abidiff_invocation_label:
 
 Invocation
 ==========
@@ -92,7 +97,7 @@  Options
 
   * ``--kmi-whitelist | -kaw`` <*path-to-whitelist*>
 
-    When analyzing a Linux kernel binary, this option points to the
+    When analyzing a `Linux Kernel`_ binary, this option points to the
     white list of names of ELF symbols of functions and variables
     which ABI must be written out.  That white list is called a "
     Kernel Module Interface white list".  This is because for the
@@ -105,7 +110,7 @@  Options
 
     If this option is not provided -- thus if no white list is
     provided -- then the entire KMI, that is, all publicly defined and
-    exported functions and global variables by the Linux Kernel
+    exported functions and global variables by the `Linux Kernel`_
     binaries is emitted.
     
   * ``--linux-tree | --lt``
@@ -115,9 +120,10 @@  Options
     In that case, this program emits the representation of the Kernel
     Module Interface (KMI) on the standard output.
 
-    Below is an example of usage of ``abidw`` on a Linux Kernel tree.
+    Below is an example of usage of ``abidw`` on a `Linux Kernel`_
+    tree.
 
-    First, checkout a Linux kernel source tree and build it.  Then
+    First, checkout a `Linux Kernel`_ source tree and build it.  Then
     install the kernel modules in a directory somewhere.  Copy the
     vmlinux binary into that directory too.  And then serialize the
     KMI of that kernel to disk, using ``abidw``: ::
@@ -171,6 +177,56 @@  Options
     representation build by Libabigail to represent the ABI and will
     not end up in the abi XML file.
 
+  * ``--exported-interfaces-only``
+
+    By default, when looking at the debug information accompanying a
+    binary, this tool analyzes the descriptions of the types reachable
+    by the interfaces (functions and variables) that are visible
+    outside of their translation unit.  Once that analysis is done, an
+    ABI corpus is constructed by only considering the subset of types
+    reachable from interfaces associated to `ELF`_ symbols that are
+    defined and exported by the binary.  It's that final ABI corpus
+    whose textual representation is saved as ``ABIXML``.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    corpora like these can be extremely slow.
+
+    To mitigate that performance issue, this option allows libabigail
+    to only analyze types that are reachable from interfaces
+    associated with defined and exported `ELF`_ symbols.
+
+    Note that this option is turned on by default when analyzing the
+    `Linux Kernel`_.  Otherwise, it's turned off by default.
+
+  * ``--allow-non-exported-interfaces``
+
+    When looking at the debug information accompanying a binary, this
+    tool analyzes the descriptions of the types reachable by the
+    interfaces (functions and variables) that are visible outside of
+    their translation unit.  Once that analysis is done, an ABI corpus
+    is constructed by only considering the subset of types reachable
+    from interfaces associated to `ELF`_ symbols that are defined and
+    exported by the binary.  It's that final ABI corpus whose textual
+    representation is saved as ``ABIXML``.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    corpora like these can be extremely slow.
+
+    In the presence of an "average sized" binary however one can
+    afford having libabigail analyze all interfaces that are visible
+    outside of their translation unit, using this option.
+
+    Note that this option is turned on by default, unless we are in
+    the presence of the `Linux Kernel`_.
+
   * ``--no-linux-kernel-mode``
 
     Without this option, if abipkgiff detects that the binaries it is
@@ -308,4 +364,4 @@  standard `here
 .. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
 .. _DWARF: http://www.dwarfstd.org
 .. _GNU: http://www.gnu.org
-
+.. _Linux Kernel: https://kernel.org/
diff --git a/doc/manuals/abipkgdiff.rst b/doc/manuals/abipkgdiff.rst
index 15ea9072..9114775a 100644
--- a/doc/manuals/abipkgdiff.rst
+++ b/doc/manuals/abipkgdiff.rst
@@ -19,6 +19,7 @@  information directly in a section of said binaries.  In those cases,
 obviously, no separate debug information package is needed as the tool
 will find the debug information inside the binaries.
 
+.. include:: tools-use-libabigail.txt
 
 .. _abipkgdiff_invocation_label:
 
@@ -277,6 +278,56 @@  Options
     global functions and variables are analyzed, so the tool detects
     and reports changes on these reachable types only.
 
+  * ``--exported-interfaces-only``
+
+    By default, when looking at the debug information accompanying a
+    binary, this tool analyzes the descriptions of the types reachable
+    by the interfaces (functions and variables) that are visible
+    outside of their translation unit.  Once that analysis is done, an
+    ABI corpus is constructed by only considering the subset of types
+    reachable from interfaces associated to `ELF`_ symbols that are
+    defined and exported by the binary.  It's those final ABI Corpora
+    that are compared by this tool.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    corpora like these can be extremely slow.
+
+    To mitigate that performance issue, this option allows libabigail
+    to only analyze types that are reachable from interfaces
+    associated with defined and exported `ELF`_ symbols.
+
+    Note that this option is turned on by default when analyzing the
+    `Linux Kernel`_.  Otherwise, it's turned off by default.
+
+  * ``--allow-non-exported-interfaces``
+
+    When looking at the debug information accompanying a binary, this
+    tool analyzes the descriptions of the types reachable by the
+    interfaces (functions and variables) that are visible outside of
+    their translation unit.  Once that analysis is done, an ABI corpus
+    is constructed by only considering the subset of types reachable
+    from interfaces associated to `ELF`_ symbols that are defined and
+    exported by the binary.  It's those final ABI Corpora that are
+    compared by this tool.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, especially when those binaries are
+    applications, as opposed to shared libraries.  One example of such
+    applications is the `Linux Kernel`_.  Analyzing massive ABI
+    Corpora like these can be extremely slow.
+
+    In the presence of an "average sized" binary however one can
+    afford having libabigail analyze all interfaces that are visible
+    outside of their translation unit, using this option.
+
+    Note that this option is turned on by default, unless we are in
+    the presence of the `Linux Kernel`_.
+
   *  ``--redundant``
 
     In the diff reports, do display redundant changes.  A redundant
diff --git a/doc/manuals/kmidiff.rst b/doc/manuals/kmidiff.rst
index ce8168ae..53010189 100644
--- a/doc/manuals/kmidiff.rst
+++ b/doc/manuals/kmidiff.rst
@@ -55,6 +55,10 @@  command line looks like: ::
 		       linux/v4.5/build/modules \
 		       linux/v4.6/build/modules
 
+
+.. include:: tools-use-libabigail.txt
+
+
 Invocation
 ==========
 
@@ -67,8 +71,8 @@  Environment
 
 By default, ``kmidiff`` compares all the interfaces (exported
 functions and variables) between the Kernel and its modules.  In
-practice, though, users want to compare a subset of the those
-interfaces.
+practice, though, some users might want to compare a subset of
+those interfaces.
 
 Users can then define a "white list" of the interfaces to compare.
 Such a white list is a just a file in the "INI" format that looks
@@ -91,8 +95,11 @@  function or variable.  Only those interfaces along with the types
 reachable from their signatures are going to be compared by
 ``kmidiff`` recursively.
 
-Note that kmidiff compares the interfaces exported by the ``vmlinux``
-binary and by the all of the compiled modules.
+Note that by default kmidiff analyzes the types reachable from the
+interfaces associated with `ELF`_ symbols that are defined and
+exported by the `Linux Kernel`_ as being the union of the ``vmlinux``
+binary and all its compiled modules.  It then compares those
+interfaces (along with their types).
 
 Options
 =======
@@ -180,6 +187,38 @@  Options
     exported interfaces.  This is the default kind of report emitted
     by tools like ``abidiff`` or ``abipkgdiff``.
 
+  * ``--exported-interfaces-only``
+
+    When using this option, this tool analyzes the descriptions of the
+    types reachable by the interfaces (functions and variables)
+    associated with `ELF`_ symbols that are defined and exported by
+    the `Linux Kernel`_.
+
+    Otherwise, the tool also has the ability to analyze the
+    descriptions of the types reachable by the interfaces associated
+    with `ELF`_ symbols that are visible outside their translation
+    unit.  This latter possibility is however much more resource
+    intensive and results in much slower operations.
+
+    That is why this option is enabled by default.
+
+
+  * ``--allow-non-exported-interfaces``
+
+    When using this option, this tool analyzes the descriptions of the
+    types reachable by the interfaces (functions and variables) that
+    are visible outside of their translation unit.  Once that analysis
+    is done, an ABI Corpus is constructed by only considering the
+    subset of types reachable from interfaces associated to `ELF`_
+    symbols that are defined and exported by the binary.  It's that
+    final ABI corpus which is compared against another one.
+
+    The problem with that approach however is that analyzing all the
+    interfaces that are visible from outside their translation unit
+    can amount to a lot of data, leading to very slow operations.
+
+    Note that this option is turned off by default.
+
   * ``--show-bytes``
 
     Show sizes and offsets in bytes, not bits.  This option is
@@ -198,3 +237,8 @@  Options
   * ``--show-dec``
 
     Show sizes and offsets in decimal base.
+
+
+.. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
+.. _ksymtab: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
+.. _Linux Kernel: https://kernel.org
diff --git a/doc/manuals/tools-use-libabigail.txt b/doc/manuals/tools-use-libabigail.txt
new file mode 100644
index 00000000..43edf296
--- /dev/null
+++ b/doc/manuals/tools-use-libabigail.txt
@@ -0,0 +1,16 @@ 
+This tool uses the libabigail library to analyze the binary as well as its
+associated debug information.  Here is its general mode of operation.
+
+When instructed to do so, a binary and its associated debug
+information is read and analyzed.  To that effect, libabigail analyzes
+by default the descriptions of the types reachable by the interfaces
+(functions and variables) that are visible outside of their
+translation unit.  Once that analysis is done, an Application Binary
+Interface Corpus is constructed by only considering the subset of
+types reachable from interfaces associated to `ELF`_ symbols that are
+defined and exported by the binary.  It's that final ABI corpus which
+libabigail considers as representing the ABI of the analyzed binary.
+
+Libabigail then has capabilities to generate textual representations
+of ABI Corpora, compare them, analyze their changes and report about
+them.
diff --git a/include/abg-ir.h b/include/abg-ir.h
index a857d041..61338edb 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -197,6 +197,15 @@  public:
   const config&
   get_config() const;
 
+  bool
+  user_set_analyze_exported_interfaces_only() const;
+
+  void
+  analyze_exported_interfaces_only(bool f);
+
+  bool
+  analyze_exported_interfaces_only() const;
+
 #ifdef WITH_DEBUG_SELF_COMPARISON
   void
   set_self_comparison_debug_input(const corpus_sptr& corpus);
diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index e41172c1..cba89664 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -402,6 +402,12 @@  die_is_decl(const Dwarf_Die* die);
 static bool
 die_is_declaration_only(Dwarf_Die* die);
 
+static bool
+die_is_variable_decl(const Dwarf_Die *die);
+
+static bool
+die_is_function_decl(const Dwarf_Die *die);
+
 static bool
 die_has_size_attribute(const Dwarf_Die *die);
 
@@ -5303,6 +5309,44 @@  public:
     return symbol;
   }
 
+  /// Test if a DIE represents a decl (function or variable) that has
+  /// a symbol that is exported, whatever that means.  This is
+  /// supposed to work for Linux Kernel binaries as well.
+  ///
+  /// This is useful to limit the amount of DIEs taken into account to
+  /// the strict limit of what an ABI actually means.  Limiting the
+  /// volume of DIEs analyzed this way is an important optimization to
+  /// keep big binaries "manageable" by libabigail.
+  ///
+  /// @param die the DIE to consider.
+  bool
+  is_decl_die_with_exported_symbol(const Dwarf_Die *die)
+  {
+    if (!die || !die_is_decl(die))
+      return false;
+
+    bool result = false, address_found = false, symbol_is_exported = false;
+    Dwarf_Addr decl_symbol_address = 0;
+
+    if (die_is_variable_decl(die))
+      {
+	if ((address_found = get_variable_address(die, decl_symbol_address)))
+	  symbol_is_exported =
+	    !!variable_symbol_is_exported(decl_symbol_address);
+      }
+    else if (die_is_function_decl(die))
+      {
+	if ((address_found = get_function_address(die, decl_symbol_address)))
+	  symbol_is_exported =
+	    !!function_symbol_is_exported(decl_symbol_address);
+      }
+
+    if (address_found)
+      result = symbol_is_exported;
+
+    return result;
+  }
+
   /// Getter for the symtab reader. Will load the symtab from the elf handle if
   /// not yet set.
   ///
@@ -5580,16 +5624,18 @@  public:
   ///
   /// @return true if the function address was found.
   bool
-  get_function_address(Dwarf_Die* function_die, Dwarf_Addr& address) const
+  get_function_address(const Dwarf_Die* function_die, Dwarf_Addr& address) const
   {
-    if (!die_address_attribute(function_die, DW_AT_low_pc, address))
+    if (!die_address_attribute(const_cast<Dwarf_Die*>(function_die),
+			       DW_AT_low_pc, address))
       // So no DW_AT_low_pc was found.  Let's see if the function DIE
       // has got a DW_AT_ranges attribute instead.  If it does, the
       // first address of the set of addresses represented by the
       // value of that DW_AT_ranges represents the function (symbol)
       // address we are looking for.
-      if (!get_first_exported_fn_address_from_DW_AT_ranges(function_die,
-							   address))
+      if (!get_first_exported_fn_address_from_DW_AT_ranges
+	  (const_cast<Dwarf_Die*>(function_die),
+	   address))
 	return false;
 
     address = maybe_adjust_fn_sym_address(address);
@@ -5611,11 +5657,12 @@  public:
   ///
   /// @return true if the variable address was found.
   bool
-  get_variable_address(Dwarf_Die*	variable_die,
+  get_variable_address(const Dwarf_Die* variable_die,
 		       Dwarf_Addr&	address) const
   {
     bool is_tls_address = false;
-    if (!die_location_address(variable_die, address, is_tls_address))
+    if (!die_location_address(const_cast<Dwarf_Die*>(variable_die),
+			      address, is_tls_address))
       return false;
     if (!is_tls_address)
       address = maybe_adjust_var_sym_address(address);
@@ -7155,6 +7202,40 @@  die_is_declaration_only(Dwarf_Die* die)
   return false;
 }
 
+/// Test if a DIE is for a function decl.
+///
+/// @param die the DIE to consider.
+///
+/// @return true iff @p die represents a function decl.
+static bool
+die_is_function_decl(const Dwarf_Die *die)
+{
+  if (!die)
+    return false;
+
+  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
+  if (tag == DW_TAG_subprogram)
+    return true;
+  return false;
+}
+
+/// Test if a DIE is for a variable decl.
+///
+/// @param die the DIE to consider.
+///
+/// @return true iff @p die represents a variable decl.
+static bool
+die_is_variable_decl(const Dwarf_Die *die)
+{
+  if (!die)
+    return false;
+
+  int tag = dwarf_tag(const_cast<Dwarf_Die*>(die));
+  if (tag == DW_TAG_variable)
+    return true;
+  return false;
+}
+
 /// Test if a DIE has size attribute.
 ///
 /// @param die the DIE to consider.
@@ -12690,9 +12771,13 @@  build_translation_unit_and_add_to_ir(read_context&	ctxt,
   result->set_is_constructed(false);
 
   do
-    build_ir_node_from_die(ctxt, &child,
-			   die_is_public_decl(&child),
-			   dwarf_dieoffset(&child));
+    // Analyze all the DIEs we encounter unless we are asked to only
+    // analyze exported interfaces and the types reachable from them.
+    if (!ctxt.env()->analyze_exported_interfaces_only()
+	|| ctxt.is_decl_die_with_exported_symbol(&child))
+      build_ir_node_from_die(ctxt, &child,
+			     die_is_public_decl(&child),
+			     dwarf_dieoffset(&child));
   while (dwarf_siblingof(&child, &child) == 0);
 
   if (!ctxt.var_decls_to_re_add_to_tree().empty())
@@ -15699,6 +15784,16 @@  read_debug_info_into_corpus(read_context& ctxt)
     origin |= corpus::LINUX_KERNEL_BINARY_ORIGIN;
   ctxt.current_corpus()->set_origin(origin);
 
+  if (origin & corpus::LINUX_KERNEL_BINARY_ORIGIN
+      && !ctxt.env()->user_set_analyze_exported_interfaces_only())
+    // So we are looking at the Linux Kernel and the user has not set
+    // any particular option regarding the amount of types to analyse.
+    // In that case, we need to only analyze types that are reachable
+    // from exported interfaces otherwise we get such a massive amount
+    // of type DIEs to look at that things are just too slow down the
+    // road.
+    ctxt.env()->analyze_exported_interfaces_only(true);
+
   ctxt.current_corpus()->set_soname(ctxt.dt_soname());
   ctxt.current_corpus()->set_needed(ctxt.dt_needed());
   ctxt.current_corpus()->set_architecture_name(ctxt.elf_architecture());
diff --git a/src/abg-ir-priv.h b/src/abg-ir-priv.h
index 45b711b7..21734b25 100644
--- a/src/abg-ir-priv.h
+++ b/src/abg-ir-priv.h
@@ -26,6 +26,7 @@  namespace ir
 {
 
 using std::string;
+using abg_compat::optional;
 
 /// The result of structural comparison of type ABI artifacts.
 enum comparison_result
@@ -443,6 +444,7 @@  struct environment::priv
   bool					decl_only_class_equals_definition_;
   bool					use_enum_binary_only_equality_;
   bool					allow_type_comparison_results_caching_;
+  optional<bool>			analyze_exported_interfaces_only_;
 #ifdef WITH_DEBUG_SELF_COMPARISON
   bool					self_comparison_debug_on_;
 #endif
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 91c8e99b..02d68e63 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -3674,6 +3674,42 @@  const config&
 environment::get_config() const
 {return priv_->config_;}
 
+/// Getter for a property that says if the user actually did set the
+/// analyze_exported_interfaces_only() property.  If not, it means
+/// the default behaviour prevails.
+///
+/// @return true iff the user did set the
+/// analyze_exported_interfaces_only() property.
+bool
+environment::user_set_analyze_exported_interfaces_only() const
+{return priv_->analyze_exported_interfaces_only_.has_value();}
+
+/// Setter for the property that controls if we are to restrict the
+/// analysis to the types that are only reachable from the exported
+/// interfaces only, or if the set of types should be more broad than
+/// that.  Typically, we'd restrict the analysis to types reachable
+/// from exported interfaces only (stricto sensu, that would really be
+/// only the types that are part of the ABI of well designed
+/// libraries) for performance reasons.
+///
+/// @param f the value of the flag.
+void
+environment::analyze_exported_interfaces_only(bool f)
+{priv_->analyze_exported_interfaces_only_ = f;}
+
+/// Getter for the property that controls if we are to restrict the
+/// analysis to the types that are only reachable from the exported
+/// interfaces only, or if the set of types should be more broad than
+/// that.  Typically, we'd restrict the analysis to types reachable
+/// from exported interfaces only (stricto sensu, that would really be
+/// only the types that are part of the ABI of well designed
+/// libraries) for performance reasons.
+///
+/// @return the value of the flag.
+bool
+environment::analyze_exported_interfaces_only() const
+{return priv_->analyze_exported_interfaces_only_.value_or(false);}
+
 #ifdef WITH_DEBUG_SELF_COMPARISON
 /// Setter of the corpus of the input corpus of the self comparison
 /// that takes place when doing "abidw --debug-abidiff <binary>".
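
The three accessors added above give the flag three states: unset, explicitly true, and explicitly false. The following is a minimal sketch of how a caller might use that, assuming C++17 `std::optional` in place of `abg_compat::optional`. `toy_environment` and `effective_exported_interfaces_only` are hypothetical names for illustration only; they are not part of libabigail, and the kernel-specific default is an assumption drawn from the commit message, not from code shown in this patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for abigail::ir::environment; only the
// tri-state flag is modelled here.
class toy_environment
{
  std::optional<bool> analyze_exported_interfaces_only_;

public:
  // True iff the flag was set explicitly by the user.
  bool
  user_set_analyze_exported_interfaces_only() const
  {return analyze_exported_interfaces_only_.has_value();}

  void
  analyze_exported_interfaces_only(bool f)
  {analyze_exported_interfaces_only_ = f;}

  // Falls back to false when the flag was never set.
  bool
  analyze_exported_interfaces_only() const
  {return analyze_exported_interfaces_only_.value_or(false);}
};

// Hypothetical helper (an assumption, not libabigail code): honour an
// explicit user choice, otherwise default to 'true' only for Linux
// Kernel binaries.
inline bool
effective_exported_interfaces_only(const toy_environment& env,
				   bool is_linux_kernel)
{
  if (env.user_set_analyze_exported_interfaces_only())
    return env.analyze_exported_interfaces_only();
  return is_linux_kernel;
}

// Small usage example exercising the three states.
inline void
example_usage()
{
  toy_environment env;
  assert(!env.user_set_analyze_exported_interfaces_only());
  assert(effective_exported_interfaces_only(env, true));
  assert(!effective_exported_interfaces_only(env, false));

  // An explicit 'false' overrides the kernel default.
  env.analyze_exported_interfaces_only(false);
  assert(env.user_set_analyze_exported_interfaces_only());
  assert(!effective_exported_interfaces_only(env, true));
}
```

A plain `bool` member could not tell "never set" apart from "set to false", which is presumably why the tools below store the option as `optional<bool>` and only forward it to the environment when `has_value()` is true.
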
diff --git a/tools/abidiff.cc b/tools/abidiff.cc
index 97b036cb..e0bb35ac 100644
--- a/tools/abidiff.cc
+++ b/tools/abidiff.cc
@@ -29,6 +29,7 @@  using std::ostream;
 using std::cout;
 using std::cerr;
 using std::shared_ptr;
+using abg_compat::optional;
 using abigail::ir::environment;
 using abigail::ir::environment_sptr;
 using abigail::translation_unit;
@@ -74,6 +75,7 @@  struct options
   vector<string>	headers_dirs2;
   vector<string>        header_files2;
   bool			drop_private_types;
+  optional<bool>	exported_interfaces_only;
   bool			linux_kernel_mode;
   bool			no_default_supprs;
   bool			no_arch;
@@ -197,6 +199,9 @@  display_usage(const string& prog_name, ostream& out)
     << " --header-file2|--hf2 <path>  the path to one header of file2\n"
     << " --drop-private-types  drop private types from "
     "internal representation\n"
+    << " --exported-interfaces-only  analyze exported interfaces only\n"
+    << " --allow-non-exported-interfaces  analyze interfaces that "
+    "might not be exported\n"
     << " --no-linux-kernel-mode  don't consider the input binaries as "
        "linux kernel binaries\n"
     << " --kmi-whitelist|-w  path to a "
@@ -403,6 +408,10 @@  parse_command_line(int argc, char* argv[], options& opts)
 	}
       else if (!strcmp(argv[i], "--drop-private-types"))
 	opts.drop_private_types = true;
+      else if (!strcmp(argv[i], "--exported-interfaces-only"))
+	opts.exported_interfaces_only = true;
+      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
+	opts.exported_interfaces_only = false;
       else if (!strcmp(argv[i], "--no-default-suppression"))
 	opts.no_default_supprs = true;
       else if (!strcmp(argv[i], "--no-architecture"))
@@ -1130,6 +1139,9 @@  main(int argc, char* argv[])
       t2_type = guess_file_type(opts.file2);
 
       environment_sptr env(new environment);
+      if (opts.exported_interfaces_only.has_value())
+	env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
+
 #ifdef WITH_DEBUG_SELF_COMPARISON
 	    if (opts.do_debug)
 	      env->self_comparison_debug_is_on(true);
diff --git a/tools/abidw.cc b/tools/abidw.cc
index 9a27a029..f38d6048 100644
--- a/tools/abidw.cc
+++ b/tools/abidw.cc
@@ -40,6 +40,7 @@  using std::ostream;
 using std::ofstream;
 using std::vector;
 using std::shared_ptr;
+using abg_compat::optional;
 using abigail::tools_utils::emit_prefix;
 using abigail::tools_utils::temp_file;
 using abigail::tools_utils::temp_file_sptr;
@@ -114,6 +115,7 @@  struct options
   bool			do_log;
   bool			drop_private_types;
   bool			drop_undefined_syms;
+  optional<bool>	exported_interfaces_only;
   type_id_style_kind	type_id_style;
 #ifdef WITH_DEBUG_SELF_COMPARISON
   string		type_id_file_path;
@@ -187,6 +189,9 @@  display_usage(const string& prog_name, ostream& out)
     << "  --short-locs  only print filenames rather than paths\n"
     << "  --drop-private-types  drop private types from representation\n"
     << "  --drop-undefined-syms  drop undefined symbols from representation\n"
+    << "  --exported-interfaces-only  analyze exported interfaces only\n"
+    << "  --allow-non-exported-interfaces  analyze interfaces that "
+    "might not be exported\n"
     << "  --no-comp-dir-path  do not show compilation path information\n"
     << "  --no-elf-needed  do not show the DT_NEEDED information\n"
     << "  --no-write-default-sizes  do not emit pointer size when it equals"
@@ -368,6 +373,10 @@  parse_command_line(int argc, char* argv[], options& opts)
 	opts.drop_private_types = true;
       else if (!strcmp(argv[i], "--drop-undefined-syms"))
 	opts.drop_undefined_syms = true;
+      else if (!strcmp(argv[i], "--exported-interfaces-only"))
+	opts.exported_interfaces_only = true;
+      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
+	opts.exported_interfaces_only = false;
       else if (!strcmp(argv[i], "--no-linux-kernel-mode"))
 	opts.linux_kernel_mode = false;
       else if (!strcmp(argv[i], "--abidiff"))
@@ -606,6 +615,9 @@  load_corpus_and_write_abixml(char* argv[],
             }
         }
 
+      if (opts.exported_interfaces_only.has_value())
+	env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
+
       t.start();
       corp = dwarf_reader::read_corpus_from_elf(ctxt, s);
       t.stop();
@@ -813,6 +825,9 @@  load_kernel_corpus_group_and_write_abixml(char* argv[],
   timer t, global_timer;
   suppressions_type supprs;
 
+  if (opts.exported_interfaces_only.has_value())
+    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
+
   if (opts.do_log)
     emit_prefix(argv[0], cerr)
       << "going to build ABI representation of the Linux Kernel ...\n";
diff --git a/tools/abipkgdiff.cc b/tools/abipkgdiff.cc
index 551080b9..656d5882 100644
--- a/tools/abipkgdiff.cc
+++ b/tools/abipkgdiff.cc
@@ -106,6 +106,7 @@  using std::set;
 using std::ostringstream;
 using std::shared_ptr;
 using std::dynamic_pointer_cast;
+using abg_compat::optional;
 using abigail::workers::task;
 using abigail::workers::task_sptr;
 using abigail::workers::queue;
@@ -205,6 +206,7 @@  public:
   bool		fail_if_no_debug_info;
   bool		show_identical_binaries;
   bool		self_check;
+  optional<bool> exported_interfaces_only;
 #ifdef WITH_CTF
   bool		use_ctf;
 #endif
@@ -868,6 +870,9 @@  display_usage(const string& prog_name, ostream& out)
     "full impact analysis report rather than the default leaf changes reports\n"
     << " --non-reachable-types|-t  consider types non reachable"
     " from public interfaces\n"
+    << " --exported-interfaces-only  analyze exported interfaces only\n"
+    << " --allow-non-exported-interfaces  analyze interfaces that "
+    "might not be exported\n"
     << " --no-linkage-name		do not display linkage names of "
     "added/removed/changed\n"
     << " --redundant                    display redundant changes\n"
@@ -2076,6 +2081,10 @@  public:
     abigail::elf_reader::status detailed_status =
       abigail::elf_reader::STATUS_UNKNOWN;
 
+    if (args->opts.exported_interfaces_only.has_value())
+      env->analyze_exported_interfaces_only
+	(*args->opts.exported_interfaces_only);
+
     status |= compare(args->elf1, args->debug_dir1, args->private_types_suppr1,
 		      args->elf2, args->debug_dir2, args->private_types_suppr2,
 		      args->opts, env, diff, ctxt, &detailed_status);
@@ -2142,6 +2151,10 @@  public:
     diff_context_sptr ctxt;
     corpus_diff_sptr diff;
 
+    if (args->opts.exported_interfaces_only.has_value())
+      env->analyze_exported_interfaces_only
+	(*args->opts.exported_interfaces_only);
+
     abigail::elf_reader::status detailed_status =
       abigail::elf_reader::STATUS_UNKNOWN;
 
@@ -3024,6 +3037,10 @@  compare_prepared_linux_kernel_packages(package& first_package,
   string dist_root2 = second_package.extracted_dir_path();
 
   abigail::ir::environment_sptr env(new abigail::ir::environment);
+  if (opts.exported_interfaces_only.has_value())
+    env->analyze_exported_interfaces_only
+      (*opts.exported_interfaces_only);
+
   suppressions_type supprs;
   corpus_group_sptr corpus1, corpus2;
   corpus1 = build_corpus_group_from_kernel_dist_under(dist_root1,
@@ -3326,6 +3343,10 @@  parse_command_line(int argc, char* argv[], options& opts)
       else if (!strcmp(argv[i], "--full-impact")
 	       ||!strcmp(argv[i], "-f"))
 	opts.show_full_impact_report = true;
+      else if (!strcmp(argv[i], "--exported-interfaces-only"))
+	opts.exported_interfaces_only = true;
+      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
+	opts.exported_interfaces_only = false;
       else if (!strcmp(argv[i], "--no-linkage-name"))
 	opts.show_linkage_names = false;
       else if (!strcmp(argv[i], "--redundant"))
diff --git a/tools/kmidiff.cc b/tools/kmidiff.cc
index 8fd3fed9..f3332765 100644
--- a/tools/kmidiff.cc
+++ b/tools/kmidiff.cc
@@ -29,6 +29,7 @@  using std::vector;
 using std::ostream;
 using std::cout;
 using std::cerr;
+using abg_compat::optional;
 
 using namespace abigail::tools_utils;
 using namespace abigail::dwarf_reader;
@@ -60,6 +61,7 @@  struct options
   bool			show_hexadecimal_values;
   bool			show_offsets_sizes_in_bits;
   bool			show_impacted_interfaces;
+  optional<bool>	exported_interfaces_only;
 #ifdef WITH_CTF
   bool			use_ctf;
 #endif
@@ -120,6 +122,9 @@  display_usage(const string& prog_name, ostream& out)
     << " --impacted-interfaces|-i  show interfaces impacted by ABI changes\n"
     << " --full-impact|-f  show the full impact of changes on top-most "
 	 "interfaces\n"
+    << " --exported-interfaces-only  analyze exported interfaces only\n"
+    << " --allow-non-exported-interfaces  analyze interfaces that "
+    "might not be exported\n"
     << " --show-bytes  show size and offsets in bytes\n"
     << " --show-bits  show size and offsets in bits\n"
     << " --show-hex  show size and offset in hexadecimal\n"
@@ -262,6 +267,10 @@  parse_command_line(int argc, char* argv[], options& opts)
       else if (!strcmp(argv[i], "--full-impact")
 	       || !strcmp(argv[i], "-f"))
 	opts.leaf_changes_only = false;
+      else if (!strcmp(argv[i], "--exported-interfaces-only"))
+	opts.exported_interfaces_only = true;
+      else if (!strcmp(argv[i], "--allow-non-exported-interfaces"))
+	opts.exported_interfaces_only = false;
       else if (!strcmp(argv[i], "--show-bytes"))
 	opts.show_offsets_sizes_in_bits = false;
       else if (!strcmp(argv[i], "--show-bits"))
@@ -408,6 +417,9 @@  main(int argc, char* argv[])
 
   environment_sptr env(new environment);
 
+  if (opts.exported_interfaces_only.has_value())
+    env->analyze_exported_interfaces_only(*opts.exported_interfaces_only);
+
   corpus_group_sptr group1, group2;
   string debug_info_root_dir;
   corpus::origin origin =