From patchwork Thu Mar 11 11:53:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Giuliano Procida X-Patchwork-Id: 42451 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 43B773836C40; Thu, 11 Mar 2021 11:53:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 43B773836C40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1615463637; bh=L0Q0gG/dR4Da8CfzBiWScDiJ/0npsT34FYpSFnNU4Ko=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Help: List-Subscribe:From:Reply-To:Cc:From; b=ixsPvKuLQN1RppOwFv5pSBUdM7spF+GFPLNyATmGObeJbg1H1G0CaUG2GPQucW0iE Fz4yRoq4mENCUd8/ljnwNkOft1c0BnjI5PS4k75RJqf1LP/D28TmfwxbqSWuQYJM+6 bzbbSj0Kf1PHKPV/SO/742l/YbwUMm9qsM+R2wMc= X-Original-To: libabigail@sourceware.org Delivered-To: libabigail@sourceware.org Received: from mail-wm1-x349.google.com (mail-wm1-x349.google.com [IPv6:2a00:1450:4864:20::349]) by sourceware.org (Postfix) with ESMTPS id DF7EF386F824 for ; Thu, 11 Mar 2021 11:53:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org DF7EF386F824 Received: by mail-wm1-x349.google.com with SMTP id w10so2225654wmk.1 for ; Thu, 11 Mar 2021 03:53:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc; bh=L0Q0gG/dR4Da8CfzBiWScDiJ/0npsT34FYpSFnNU4Ko=; b=LXHnNLlBauUKIuU/GePIoY54gq+7K4GtFwljU9d6p5JlUSQR4LCfZ2UgzhNhGjFR3R VdQI4s9eFcrNe3CpAZL5HOMcyhq3Cd8kOWyQR4FC02ELVR77iAjXPyubNItFrUgrWjhN wXiHRK5p9utUis9Z8yMSyeq233Ah3GxPcpdOabRpG8/LRIYqB2MHDYQ9d+XXhBWEp9j0 hUNk5Fm0C7ZRsuSCNkTnWye2ezT6I1/AKDwghQVpnrW1xxcU65GjbIQwir2Ejo0e5QOe BHzUSxv8CY6Aaq6IVQ9UocaPgVvIB7SXFVFoxuvTztWDxrsglrPADYJPjRbCc7HuN5Ja nABw== X-Gm-Message-State: AOAM5314l46YwLpO6piZyfMlk2DEpsE9QvPfBmw8MkvRNVXYRNMhXk1I 2Nyn9wgkRbTK8WUuIa1gnNsSx6FAw2NI2XPkOdLn14/WLbqwDzZrxrIDFsCjcz9DTjZAuCc69lw CgZKsC53p+z0xL7rNQbtkBka77bpYkOf65D8TLIedTzeSBtNqA08Xef+DBzjlII28X+YuUCQ= X-Google-Smtp-Source: ABdhPJzPk1XrTmD8+b5Mb6krklgAIiKRLH6u2rUnoOGsIiYgXYwnh7jmXVXdTaWFN93ID190rk/4jhmq9X7f+w== X-Received: from tef.lon.corp.google.com ([2a00:79e0:d:110:8d25:b9e0:9366:2c14]) (user=gprocida job=sendgmr) by 2002:a05:600c:198c:: with SMTP id t12mr7679127wmq.183.1615463632740; Thu, 11 Mar 2021 03:53:52 -0800 (PST) Date: Thu, 11 Mar 2021 11:53:40 +0000 Message-Id: <20210311115340.3033687-1-gprocida@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.31.0.rc2.261.g7f71774620-goog Subject: [RFC PATCH] Add an experimental ABI pruning utility. To: libabigail@sourceware.org X-Spam-Status: No, score=-23.1 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-Patchwork-Original-From: Giuliano Procida via Libabigail From: Giuliano Procida Reply-To: Giuliano Procida Cc: maennich@google.com, kernel-team@android.com Errors-To: libabigail-bounces@sourceware.org Sender: "Libabigail" In response to internal reports about very large ABIs for libraires implemented in C++ but exposing only a C API, I wrote a script as a means of investigating the issue. The script aims to remove parts of the XML that are irrelevant to abidiff. It has been run on the current libabigail tests as well as some locally-generated ABI files. On most XML files, this results in a modest saving (5-10%). However, for the library that sparked the invesigaton, the resulting XML was more than 2300 times smaller. The script may have some merit on its own but perhaps should be implemented differently. Question: How hard would it be to implement this functionality (under flag control) within libabigail itself? A couple of tangential things worth mentioning (see the comments at the very beginning and very end of the script): - abidiff rejects XML files with an XML declaration at the top - abidiff rejects completely empty files Resolving the first would make the second moot. * scripts/abiprune.pl: Add script to prune ABI XML. Signed-off-by: Giuliano Procida --- scripts/abiprune.pl | 323 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 323 insertions(+) create mode 100755 scripts/abiprune.pl diff --git a/scripts/abiprune.pl b/scripts/abiprune.pl new file mode 100755 index 00000000..8e04a273 --- /dev/null +++ b/scripts/abiprune.pl @@ -0,0 +1,323 @@ +#!/usr/bin/perl + +# This script is intended to consume libabigail ABI XML as generated +# by abidw and produce a possibly smaller representation that captures +# the same ABI. In particular, the output should be such that abidiff +# --harmless reports no differences (or is empty). +# +# It does this by modelling symbols and types dependencies as a +# directed graph, marking nodes reachable from the ELF symbols and +# dropping unreachable nodes. + +use strict; +use warnings; +no warnings 'recursion'; + +use autodie; + +use XML::LibXML; + +# Making a graph from ABI XML. +# +# The following are the types of "node" we care about. The "edges" are +# obtained from a few XML attributes as well as via XML element +# containment. +# +# ELF (exported) symbols +# +# elf-symbol (has a name; the code here currently knows nothing +# about aliases) +# +# Declarations (that mention a symbol) +# +# These live in a scope. In C++ scopes can be nested and include +# namespaces and class types. +# +# var-decl (also used for member variables) +# elf-symbol linked to symbol via mangled-name +# type-id links to a type +# function-decl (also used for member functions) +# elf-symbol linked to symbol via mangled-name +# parameter and return type-ids link to types +# (alas, not just a link to a function type) +# +# Types +# +# These occupy pretty much all the other elements, besides those +# that act as simple containers. +# +# XML elements and their roles in more detail +# +# ELF +# +# elf-needed - container +# dependency - names a library +# elf-function-symbols - contains a list of symbols +# elf-variable-symbols - contains a list of symbols +# elf-symbol - describes an ELF variable or function +# +# Grouping and scoping +# +# abi-corpus-group +# abi-corpus +# abi-instr - compilation unit containers +# namespace-decl - pure container, possibly named +# +# Types (some introduce scopes, only in C++) +# +# type-decl - defines a primitive type +# typedef-decl - defines a type, links to a type +# qualified-type-def - defines a type, links to a type +# pointer-type-def - defines a type, links to a type +# reference-type-def - defines a type, links to a type +# array-type-def - defines a (multidimensional array) type, refers to element type, contains subranges +# subrange - contains array length, refers to element type; defines types (never referred to; duplicated) +# function-type - defines a type +# parameter - belongs to function-type and -decl, links to a type +# return - belongs to function-type and -decl, links to a type +# enum-decl - defines a type, names it, contains a list of enumerators and an underlying-type +# underlying-type - belongs to enum-decl +# enumerator - belongs to enum-decl +# union-decl - defines and names a type, contains member elements linked to other things +# class-decl - defines and names a type, contains base type, member elements linking to other things +# base-class - belongs to class-decl +# data-member - container for a member; holds access level +# member-function - container for a member; holds access level +# member-type - container for a type declaration; holds access level +# member-template - container for a (function?) template declaration; holds access level +# +# Higher order Things +# +# class-template-decl - defines a "type", but without instantiation this isn't usable +# function-template-decl - defines a "type", but without instantiation this isn't usable +# template-type-parameter - defines a type (formal variable), perhaps one which should be excluded from the real type graph +# template-non-type-parameter - names a template parameter, links to a type +# template-parameter-type-composition - container? +# +# Values +# +# function-decl - names a function, can link to a symbol +# has same children as function-type, rather than linking to a type +# var-decl - names a variable, can link to a symbol, links to a type + +# Graph nodes. +my %symbols; +my %exported_symbols; +my %types; +# Graph edges. +my %type_type_edges; +my %symbol_type_edges; +my %type_symbol_edges; + +# Keep track of the types being defined. +my @id_stack; +# Keep track of the current (value) declaration. +my @symbol_stack; +sub process($); +sub process($) { + my ($node) = @_; + + # The XML attributes we care about. + my $name; + my $id; + my $type_id; + my $symbol; + my $naming_typedef_id; + + # Not every node we encounter is an XML element. + if ($node->nodeType == XML_ELEMENT_NODE) { + $name = $node->getAttribute('name'); + $id = $node->getAttribute('id'); + $type_id = $node->getAttribute('type-id'); + $symbol = $node->getAttribute('mangled-name'); + $naming_typedef_id = $node->getAttribute('naming-typedef-id'); + die if defined $id && defined $symbol; + } + + if (defined $name && $node->getName() eq 'elf-symbol') { + $exported_symbols{$name} = undef; + # Early return is safe, but not necessary. + return; + } + + if (defined $id) { + # This element defines a type (but there may be more than one + # defining the same type - we cannot rely on uniqueness). + $types{$id} = undef; + if (defined $naming_typedef_id) { + # This is an odd one, there can be a backwards link from an + # anonymous type to the typedef that refers to it, so we need to + # pull in the typedef, even if nothing else refers to it. + $type_type_edges{$id}{$naming_typedef_id}{naming} = undef; + } + if (@id_stack) { + # Type parent<->child dependencies; record dependencies both + # ways to avoid making holes in the XML types. + $type_type_edges{$id_stack[-1]}{$id}{contains} = undef; + $type_type_edges{$id}{$id_stack[-1]}{in} = undef; + } + push @id_stack, $id; + } + + if (defined $symbol) { + # This element is a declaration linked to a symbol (whether or not + # exported). + $symbols{$symbol} = undef; + if (@id_stack) { + # The enclosing scope may include a C++ class, in which case we + # need to record the dependency. + $type_symbol_edges{$id_stack[-1]}{$symbol} = undef; + } + # The symbol depends on the types mentioned in this element, so + # record it. + push @symbol_stack, $symbol; + die "nested symbols: @symbol_stack\n" if scalar @symbol_stack > 1; + } + + if (defined $type_id) { + if (@id_stack) { + # The enclosing type refers to this type, record the dependency. + $type_type_edges{$id_stack[-1]}{$type_id}{depends} = undef; + } + if (@symbol_stack) { + # The enclosing declaration (linked to a symbol) refers to this + # type, record the dependency. + $symbol_type_edges{$symbol_stack[-1]}{$type_id} = undef; + } + } + + for my $child ($node->nonBlankChildNodes()) { + process($child); + } + + if (defined $symbol) { + pop @symbol_stack; + } + if (defined $id) { + pop @id_stack; + } +} + +# Load the XML. +my $dom = XML::LibXML->load_xml(IO => \*STDIN); +# Build a graph. +process($dom); +die if @id_stack or @symbol_stack; + +# DFS visited state. +my %seen_types; +my %seen_symbols; +sub dfs_type($); +sub dfs_symbol($); + +sub dfs_type($) { + my ($type) = @_; + return if exists $seen_types{$type}; + $seen_types{$type} = undef; + + my $types = $type_type_edges{$type}; + if (defined $types) { + for my $type (keys %$types) { + for my $link (keys %{$types->{$type}}) { + if (exists $types->{$type}{$link}) { + dfs_type($type); + } + } + } + } + my $symbols = $type_symbol_edges{$type}; + if (defined $symbols) { + for my $symbol (keys %$symbols) { + dfs_symbol($symbol); + } + } +} + +sub dfs_symbol($) { + my ($symbol) = @_; + return if exists $seen_symbols{$symbol}; + $seen_symbols{$symbol} = undef; + + my $types = $symbol_type_edges{$symbol}; + if (defined $types) { + for my $type (keys %$types) { + dfs_type($type); + } + } else { + warn "no declaration found for exported symbol $symbol\n"; + } +} + +# Traverse the graph, using the exported symbols as the roots. +for my $symbol (keys %exported_symbols) { + dfs_symbol($symbol); +} + +# Useful counts. +sub print_report() { + my $count_symbols = scalar keys %symbols; + my $count_exported_symbols = scalar keys %exported_symbols; + my $count_types = scalar keys %types; + my $count_reachable_types = scalar keys %seen_types; + + print STDERR qq{symbols = $count_symbols +exported symbols = $count_exported_symbols +types = $count_types +reachable types = $count_reachable_types +}; +} + +my %drop_if_empty = map { $_ => undef } qw( + namespace-decl + abi-instr + abi-corpus + abi-corpus-group +); + +# This is another DFS traversal of the XML. +sub prune($); +sub prune($) { + my ($node) = @_; + + my $name = $node->getName(); + my $id; + my $symbol; + + if ($node->nodeType == XML_ELEMENT_NODE) { + $id = $node->getAttribute('id'); + $symbol = $node->getAttribute('mangled-name'); + die if defined $id && defined $symbol; + } + + # Return if we know that this is a type or declaration to keep or + # drop in its entirety. + if (defined $id) { + return !exists $seen_types{$id}; + } + if ($name eq 'var-decl' || $name eq 'function-decl') { + return !defined $symbol || !exists $exported_symbols{$symbol}; + } + + # Otherwise, this is not a type, declaration or part thereof, so + # process child elements. + my $empty = 1; + for my $child ($node->nonBlankChildNodes()) { + if (prune($child)) { + $node->removeChild($child); + } else { + $empty = 0; + } + } + + # Empty container elements can be dropped. + return $empty && exists $drop_if_empty{$name}; +} + +# Drop unreachable nodes and any container nodes that end up empty. +prune($dom); +my $out = $dom->toString(); +# Remove blank line debris. +$out =~ s;^ *\n;;mg; +# Remove the XML declaration as abidiff doesn't like it. +$out =~ s;^<\?xml .*\?>\n;;m; +print $out;