From patchwork Tue Mar 16 16:55:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Giuliano Procida X-Patchwork-Id: 42590 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 171C53854804; Tue, 16 Mar 2021 16:55:20 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 171C53854804 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1615913720; bh=iKAfZ/KAcgIE3XE7yoFO9Oh/+b4uWfttMA+FdqSvJ4I=; h=Date:In-Reply-To:References:Subject:To:List-Id:List-Unsubscribe: List-Archive:List-Help:List-Subscribe:From:Reply-To:Cc:From; b=mzI6vOjVnjLpbbiRXTUGES+JfGmLfb5FfFbuBAhDXZ96emh2U+kvWxa3n29ry/mFZ awTcbks1QSzjUXNtmN39uygeUEvCh/ydADqrToZcjEdWQJsDgV4ZcoRevs7kLc6U1m eatH+zGt23f8j+jXHosHikFgj77pQUDJH3h0+qc0= X-Original-To: libabigail@sourceware.org Delivered-To: libabigail@sourceware.org Received: from mail-ed1-x54a.google.com (mail-ed1-x54a.google.com [IPv6:2a00:1450:4864:20::54a]) by sourceware.org (Postfix) with ESMTPS id 583613854804 for ; Tue, 16 Mar 2021 16:55:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 583613854804 Received: by mail-ed1-x54a.google.com with SMTP id y10so5622060edr.20 for ; Tue, 16 Mar 2021 09:55:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=iKAfZ/KAcgIE3XE7yoFO9Oh/+b4uWfttMA+FdqSvJ4I=; b=BtGtK4461zKZcQtHIVg5OE5KzdgmMxLIIkDpaZrcI7z2zyqr0zlOd627mD2x7lSS7x FiquqGJvDftSELklis092Fo55cfSk1D8YoDXqkiQGXvQ/TeYyQUIT5Hf3gxdSsP0APn7 Y57NINCnHjVO274YCDLXWqT01Dn+BEvmRoRA1+SwFLHKo/bi9fODvPnqrdLsMeh424/L YFcWAEJHx5CX2T4PC8PZdgAl8Mcy84Q9K3GKxF/Ru0deE70Wq4kFezaNtswQIdD8v0Oh GkTsD4q8cvvrBeNjJl3OyFWkfcDmQLpSR5NZgRQetAkOsjaELIclfF47P7gj1Uqt2CgV ExmA== X-Gm-Message-State: AOAM5331KI7onpCXtDEOTU7rTeNGcFmZDhKAWoQ3HQtI40anSy0VShgQ RY4ug1CO26wOokL8wgo4uH8ZnavqB5MjWmwGF9u/5nw/H+Sl525nkMqvi3skOeD3Wvpx2XJYpjP BRXO6xIDuof9Py8NdUkXkmzufvzUr2ursvLZP4/hAqE9wFbXS7GdBrJPmU2bptXnYTEGlEt4= X-Google-Smtp-Source: ABdhPJzL/T0PNl3yAz1HLA4odDxcefhmkeXQCjOU+dHm/3kMZg4nQ4cPmj+85NBkwNHCdWOe99hLBYutJhGZMg== X-Received: from tef.lon.corp.google.com ([2a00:79e0:d:110:b1f7:c9f5:8445:262f]) (user=gprocida job=sendgmr) by 2002:a17:907:d8a:: with SMTP id go10mr31109291ejc.46.1615913715125; Tue, 16 Mar 2021 09:55:15 -0700 (PDT) Date: Tue, 16 Mar 2021 16:55:09 +0000 In-Reply-To: <20210312165958.3575755-1-gprocida@google.com> Message-Id: <20210316165509.2658452-1-gprocida@google.com> Mime-Version: 1.0 References: <20210312165958.3575755-1-gprocida@google.com> X-Mailer: git-send-email 2.31.0.rc2.261.g7f71774620-goog Subject: [RFC PATCH v3] Add an experimental ABI pruning utility. To: libabigail@sourceware.org X-Spam-Status: No, score=-22.5 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-Patchwork-Original-From: Giuliano Procida via Libabigail From: Giuliano Procida Reply-To: Giuliano Procida Cc: maennich@google.com, kernel-team@android.com Errors-To: libabigail-bounces@sourceware.org Sender: "Libabigail" In response to internal reports about very large ABIs for libraires implemented in C++ but exposing only a C API, I wrote a script as a means of investigating the issue. The script aims to remove parts of the XML that are irrelevant to abidiff. It has been run on current libabigail test data ABI files as well as some locally-generated ones. The script by default considers all exported ELF symbols as the roots of interest (but this list can be overridden). On most XML files, this results in a modest saving (5-10%). However, for the library that sparked the invesigaton, the resulting XML was more than 2300 times smaller. The script may have some merit on its own but perhaps should be implemented differently (say in C++). Question: How hard would it be to implement this functionality (under flag control) within libabigail itself? A couple of tangential things worth mentioning (see the comments at the very beginning and very end of the script): - abidiff rejects XML files with an XML declaration at the top - abidiff rejects completely empty files Resolving the first would make the second moot. * scripts/abiprune.pl: Add script to prune ABI XML or restrict it to a subset of symbols. Signed-off-by: Giuliano Procida --- scripts/abiprune.pl | 356 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 356 insertions(+) create mode 100755 scripts/abiprune.pl diff --git a/scripts/abiprune.pl b/scripts/abiprune.pl new file mode 100755 index 00000000..e26d3115 --- /dev/null +++ b/scripts/abiprune.pl @@ -0,0 +1,356 @@ +#!/usr/bin/perl + +# This script is intended to consume libabigail ABI XML as generated +# by abidw and produce a possibly smaller representation that captures +# the same ABI. In particular, the output should be such that abidiff +# --harmless reports no differences (or is empty). +# +# It does this by modelling symbols and types dependencies as a +# directed graph, marking nodes reachable from the ELF symbols and +# dropping unreachable nodes. + +use strict; +use warnings; +no warnings 'recursion'; + +use autodie; + +use Getopt::Long; +use IO::File; +use XML::LibXML; + +# Making a graph from ABI XML. +# +# The following are the types of "node" we care about. The "edges" are +# obtained from a few XML attributes as well as via XML element +# containment. +# +# ELF (exported) symbols +# +# elf-symbol (has a name; the code here currently knows nothing +# about aliases) +# +# Declarations (that mention a symbol) +# +# These live in a scope. In C++ scopes can be nested and include +# namespaces and class types. +# +# var-decl (also used for member variables) +# elf-symbol linked to symbol via mangled-name +# type-id links to a type +# function-decl (also used for member functions) +# elf-symbol linked to symbol via mangled-name +# parameter and return type-ids link to types +# (alas, not just a link to a function type) +# +# Types +# +# These occupy pretty much all the other elements, besides those +# that act as simple containers. +# +# XML elements and their roles in more detail +# +# ELF +# +# elf-needed - container +# dependency - names a library +# elf-variable-symbols - contains a list of symbols +# elf-function-symbols - contains a list of symbols +# elf-symbol - describes an ELF variable or function +# +# Grouping and scoping +# +# abi-corpus-group +# abi-corpus +# abi-instr - compilation unit containers +# namespace-decl - pure container, possibly named +# +# Types (some introduce scopes, only in C++) +# +# type-decl - defines a primitive type +# typedef-decl - defines a type, links to a type +# qualified-type-def - defines a type, links to a type +# pointer-type-def - defines a type, links to a type +# reference-type-def - defines a type, links to a type +# array-type-def - defines a (multidimensional array) type, refers to element type, contains subranges +# subrange - contains array length, refers to element type; defines types (never referred to; duplicated) +# function-type - defines a type +# parameter - belongs to function-type and -decl, links to a type +# return - belongs to function-type and -decl, links to a type +# enum-decl - defines a type, names it, contains a list of enumerators and an underlying-type +# underlying-type - belongs to enum-decl +# enumerator - belongs to enum-decl +# union-decl - defines and names a type, contains member elements linked to other things +# class-decl - defines and names a type, contains base type, member elements linking to other things +# base-class - belongs to class-decl +# data-member - container for a member; holds access level +# member-function - container for a member; holds access level +# member-type - container for a type declaration; holds access level +# member-template - container for a (function?) template declaration; holds access level +# +# Higher order Things +# +# class-template-decl - defines a type (function), but without instantiation this isn't usable +# function-template-decl - defines a type (function), but without instantiation this isn't usable +# template-type-parameter - defines a type (variable), perhaps one which should be excluded from the real type graph +# template-non-type-parameter - names a template parameter, links to a type +# template-parameter-type-composition - container? +# +# Values +# +# function-decl - names a function, can link to a symbol +# has same children as function-type, rather than linking to a type +# var-decl - names a variable, can link to a symbol, links to a type + +my $symbols_file; +GetOptions ("symbols=s" => \$symbols_file) + && scalar @ARGV == 0 + or die "usage: $0 [-s symbols_file]\n"; + +my %wanted_symbols; +if ($symbols_file) { + my $fh = new IO::File $symbols_file, '<'; + while (<$fh>) { + chomp; + $wanted_symbols{$_} = undef; + } + close $fh; +} + +# Graph nodes. +my %symbols; +my %exported_symbols; +my %types; +# Graph edges. +my %type_type_edges; +my %symbol_type_edges; +my %type_symbol_edges; + +# Keep track of the types being defined. +my @id_stack; +# Keep track of the current (value) declaration. +my @symbol_stack; +# Traverse the whole XML DOM. +sub process($); +sub process($) { + my ($node) = @_; + + # The XML attributes we care about. + my $name; + my $id; + my $type_id; + my $symbol; + my $naming_typedef_id; + + # Not every node we encounter is an XML element. + if ($node->nodeType == XML_ELEMENT_NODE) { + $name = $node->getAttribute('name'); + $id = $node->getAttribute('id'); + $type_id = $node->getAttribute('type-id'); + $symbol = $node->getAttribute('mangled-name'); + $naming_typedef_id = $node->getAttribute('naming-typedef-id'); + die if defined $id && defined $symbol; + } + + if (defined $name && $node->getName() eq 'elf-symbol') { + $exported_symbols{$name} = undef; + # Early return is safe, but not necessary. + return; + } + + if (defined $id) { + # This element defines a type (but there may be more than one + # defining the same type - we cannot rely on uniqueness). + $types{$id} = undef; + if (defined $naming_typedef_id) { + # This is an odd one, there can be a backwards link from an + # anonymous type to the typedef that refers to it, so we need to + # pull in the typedef, even if nothing else refers to it. + $type_type_edges{$id}{$naming_typedef_id}{naming} = undef; + } + if (@id_stack) { + # Type parent<->child dependencies; record dependencies both + # ways to avoid making holes in the XML types. + $type_type_edges{$id_stack[-1]}{$id}{contains} = undef; + $type_type_edges{$id}{$id_stack[-1]}{in} = undef; + } + push @id_stack, $id; + } + + if (defined $symbol) { + # This element is a declaration linked to a symbol (whether or not + # exported). + $symbols{$symbol} = undef; + if (@id_stack) { + # The enclosing scope may include a C++ class, in which case we + # need to record the dependency. + $type_symbol_edges{$id_stack[-1]}{$symbol} = undef; + } + # The symbol depends on the types mentioned in this element, so + # record it. + push @symbol_stack, $symbol; + die "nested symbols: @symbol_stack\n" if scalar @symbol_stack > 1; + } + + if (defined $type_id) { + if (@id_stack) { + # The enclosing type refers to this type, record the dependency. + $type_type_edges{$id_stack[-1]}{$type_id}{depends} = undef; + } + if (@symbol_stack) { + # The enclosing declaration (linked to a symbol) refers to this + # type, record the dependency. + $symbol_type_edges{$symbol_stack[-1]}{$type_id} = undef; + } + } + + for my $child ($node->nonBlankChildNodes()) { + process($child); + } + + if (defined $symbol) { + pop @symbol_stack; + } + if (defined $id) { + pop @id_stack; + } +} + +# Load the XML. +my $dom = XML::LibXML->load_xml(IO => \*STDIN); +# Build a graph. +process($dom); +die if @id_stack or @symbol_stack; + +%wanted_symbols = %exported_symbols unless defined $symbols_file; + +# DFS visited state. +my %seen_types; +my %seen_symbols; +sub dfs_type($); +sub dfs_symbol($); + +sub dfs_type($) { + my ($type) = @_; + if (not exists $types{$type}) { + warn "edge to unknown type $type\n"; + return; + } + return if exists $seen_types{$type}; + $seen_types{$type} = undef; + + my $types = $type_type_edges{$type}; + if (defined $types) { + for my $type (keys %$types) { + for my $link (keys %{$types->{$type}}) { + if (exists $types->{$type}{$link}) { + dfs_type($type); + } + } + } + } + my $symbols = $type_symbol_edges{$type}; + if (defined $symbols) { + for my $symbol (keys %$symbols) { + dfs_symbol($symbol); + } + } +} + +sub dfs_symbol($) { + my ($symbol) = @_; + return if exists $seen_symbols{$symbol}; + $seen_symbols{$symbol} = undef; + + my $types = $symbol_type_edges{$symbol}; + if (defined $types) { + for my $type (keys %$types) { + dfs_type($type); + } + } else { + warn "no declaration found for exported symbol $symbol\n"; + } +} + +# Traverse the graph, starting from the wanted symbols. +for my $symbol (keys %wanted_symbols) { + dfs_symbol($symbol); +} + +# Useful counts. +sub print_report() { + my $count_symbols = scalar keys %symbols; + my $count_exported_symbols = scalar keys %exported_symbols; + my $count_types = scalar keys %types; + my $count_reachable_types = scalar keys %seen_types; + + print STDERR qq{symbols = $count_symbols +exported symbols = $count_exported_symbols +types = $count_types +reachable types = $count_reachable_types +}; +} + +my %drop_if_empty = map { $_ => undef } qw( + elf-variable-symbols + elf-function-symbols + namespace-decl + abi-instr + abi-corpus + abi-corpus-group +); + +# This is another XML DOM traversal. +sub prune($); +sub prune($) { + my ($node) = @_; + + my $node_name = $node->getName(); + my $name; + my $id; + my $symbol; + + if ($node->nodeType == XML_ELEMENT_NODE) { + $name = $node->getAttribute('name'); + $id = $node->getAttribute('id'); + $symbol = $node->getAttribute('mangled-name'); + die if defined $id && defined $symbol; + } + + # Keep / prune wanted / unwanted symbols + if (defined $name && $node_name eq 'elf-symbol') { + return !exists $seen_symbols{$name}; + } + + # Return if we know that this is a type or declaration to keep or + # drop in its entirety. + if (defined $id) { + return !exists $seen_types{$id}; + } + if ($node_name eq 'var-decl' || $node_name eq 'function-decl') { + return !defined $symbol || !exists $seen_symbols{$symbol}; + } + + # Otherwise, this is not a type, declaration or part thereof, so + # process child elements. + my $empty = 1; + for my $child ($node->nonBlankChildNodes()) { + if (prune($child)) { + $node->removeChild($child); + } else { + $empty = 0; + } + } + + # Empty container elements can be dropped. + return $empty && exists $drop_if_empty{$node_name}; +} + +# Drop unreachable nodes and any container nodes that end up empty. +prune($dom); +my $out = $dom->toString(); +# Remove blank line debris. +$out =~ s;^ *\n;;mg; +# Remove the XML declaration as abidiff doesn't like it. +$out =~ s;^<\?xml .*\?>\n;;m; +print $out;