From patchwork Thu Mar 25 21:51:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Giuliano Procida
X-Patchwork-Id: 42774
Date: Thu, 25 Mar 2021 21:51:38 +0000
In-Reply-To: <20210325215146.3597963-1-gprocida@google.com>
Message-Id: <20210325215146.3597963-2-gprocida@google.com>
References: <20210316165509.2658452-1-gprocida@google.com> <20210325215146.3597963-1-gprocida@google.com>
X-Mailer: git-send-email 2.31.0.291.g576ba9dcdaf-goog
Subject: [RFC PATCH 1/9] Add ABI tidying utility
To: libabigail@sourceware.org
From: Giuliano Procida
Reply-To: Giuliano Procida
Cc: maennich@google.com, kernel-team@android.com

This initial version:

- reads XML into a DOM
- strips all text (whitespace) from elements
- reindents, assuming empty elements are emitted as a single tag
- writes out XML, excluding the XML declaration

Removing text nodes makes other manipulation of the XML DOM
easier.

This should be a semantics-preserving transformation, but is not. See
https://sourceware.org/bugzilla/show_bug.cgi?id=27616.

	* scripts/abitidy.pl (strip_text): New function to remove text
	nodes from a DOM.
	(indent): New function to add whitespace nodes to reindent a
	DOM.
	(...): The rest of the script consists of top-level comments,
	option handling and DOM read / write.

Signed-off-by: Giuliano Procida
---
 scripts/abitidy.pl | 146 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100755 scripts/abitidy.pl

diff --git a/scripts/abitidy.pl b/scripts/abitidy.pl
new file mode 100755
index 00000000..66d636d7
--- /dev/null
+++ b/scripts/abitidy.pl
@@ -0,0 +1,146 @@
+#!/usr/bin/perl
+
+# This script is intended to consume libabigail ABI XML as generated
+# by abidw and produce a possibly smaller representation that captures
+# the same ABI. In particular, the output should be such that abidiff
+# --harmless reports no differences (or is empty).
+
+use v5.32.0;
+use strict;
+use warnings;
+use experimental 'signatures';
+
+use autodie;
+
+use Data::Dumper;
+use Getopt::Long;
+use IO::File;
+use XML::LibXML;
+
+# Overview of ABI XML elements and their roles
+#
+# ELF
+#
+# elf-needed - container
+# dependency - names a library
+# elf-variable-symbols - contains a list of symbols
+# elf-function-symbols - contains a list of symbols
+# elf-symbol - describes an ELF variable or function
+#
+# Grouping and scoping
+#
+# abi-corpus-group
+# abi-corpus
+# abi-instr - compilation unit containers
+# namespace-decl - pure container, possibly named
+#
+# Types (some introduce scopes, only in C++)
+#
+# type-decl - defines a primitive type
+# typedef-decl - defines a type, links to a type
+# qualified-type-def - defines a type, links to a type
+# pointer-type-def - defines a type, links to a type
+# reference-type-def - defines a type, links to a type
+# array-type-def - defines a (multidimensional array) type, refers to element type, contains subranges
+# subrange - contains array length, refers to element type; defines types (never referred to; duplicated)
+# function-type - defines a type
+# parameter - belongs to function-type and -decl, links to a type
+# return - belongs to function-type and -decl, links to a type
+# enum-decl - defines a type, names it, contains a list of enumerators and an underlying-type
+# underlying-type - belongs to enum-decl
+# enumerator - belongs to enum-decl
+# union-decl - defines and names a type, contains member elements linked to other things
+# class-decl - defines and names a type, contains base type, member elements linking to other things
+# base-class - belongs to class-decl
+# data-member - container for a member; holds access level
+# member-function - container for a member; holds access level
+# member-type - container for a type declaration; holds access level
+# member-template - container for a (function?) template declaration; holds access level
+#
+# Higher order Things
+#
+# class-template-decl - defines a type (function), but without instantiation this isn't usable
+# function-template-decl - defines a type (function), but without instantiation this isn't usable
+# template-type-parameter - defines a type (variable), perhaps one which should be excluded from the real type graph
+# template-non-type-parameter - names a template parameter, links to a type
+# template-parameter-type-composition - container?
+#
+# Values
+#
+# var-decl - names a variable, can link to a symbol, links to a type
+# function-decl - names a function, can link to a symbol
+# has same children as function-type, rather than linking to a type
+
+# Remove all text nodes.
+sub strip_text($dom) {
+  for my $node ($dom->findnodes('//text()')) {
+    $node->unbindNode();
+  }
+}
+
+# Make XML nicely indented. We could make the code a bit less inside
+# out by passing the parent node as an extra argument. Efforts in this
+# direction ran into trouble.
+sub indent;
+sub indent($indent, $node) {
+  if ($node->nodeType == XML_ELEMENT_NODE) {
+    my @children = $node->childNodes();
+    return unless @children;
+    my $more_indent = $indent + 2;
+    # The ordering of operations here is incidental. The outcomes we
+    # want are 1. an extra newline after the opening tag and
+    # reindenting the closing tag to match, and 2. indentation for the
+    # children.
+    $node->insertBefore(new XML::LibXML::Text("\n"), $children[0]);
+    for my $child (@children) {
+      $node->insertBefore(new XML::LibXML::Text(' ' x $more_indent), $child);
+      indent($more_indent, $child);
+      $node->insertAfter(new XML::LibXML::Text("\n"), $child);
+    }
+    $node->appendText(' ' x $indent);
+  } else {
+    for my $child ($node->childNodes()) {
+      indent($indent, $child);
+    }
+  }
+}
+
+# Parse arguments.
+my $input_opt;
+my $output_opt;
+my $all_opt;
+GetOptions('i|input=s' => \$input_opt,
+           'o|output=s' => \$output_opt,
+           'a|all' => sub {
+             1
+           },
+  ) and !@ARGV or die("usage: $0",
+                      map { (' ', $_) } (
+                        '[-i|--input file]',
+                        '[-o|--output file]',
+                        '[-a|--all]',
+                      ), "\n");
+
+exit 0 unless defined $input_opt;
+
+# Load the XML.
+my $input = $input_opt eq '-' ? \*STDIN : new IO::File $input_opt, '<';
+my $dom = XML::LibXML->load_xml(IO => $input);
+close $input;
+
+# This simplifies DOM analysis and manipulation.
+strip_text($dom);
+
+exit 0 unless defined $output_opt;
+
+# Reformat for human consumption.
+indent(0, $dom);
+
+# Emit the XML, removing the XML declaration.
+my $output = $output_opt eq '-' ? \*STDOUT : new IO::File $output_opt, '>';
+my $out = $dom->toString();
+$out =~ s;^<\?xml .*\?>\n;;m;
+print $output $out;
+close $output;
+
+exit 0;
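The sketch below shows why stripping text nodes simplifies later DOM manipulation. It is a minimal, hypothetical Perl script and is not part of the patch. It assumes XML::LibXML is installed, and it uses a made-up ABI XML fragment. With the parser's default settings, the whitespace between tags is kept as text nodes. Until those nodes are removed, an element's first child is its indentation rather than a child element.

```perl
#!/usr/bin/perl
# Hypothetical demo, not part of abitidy.pl: the XML fragment and the
# symbol name 'foo' are made up for illustration.
use v5.32.0;
use strict;
use warnings;

use XML::LibXML;

my $xml = <<'EOF';
<abi-corpus>
  <elf-function-symbols>
    <elf-symbol name='foo'/>
  </elf-function-symbols>
</abi-corpus>
EOF

# By default XML::LibXML keeps whitespace-only text nodes.
my $dom = XML::LibXML->load_xml(string => $xml);
my $root = $dom->documentElement();

# Before stripping: the root's first child is the indentation text.
my $before = $root->firstChild()->nodeName();
my @kids_before = $root->childNodes();

# The same operation as strip_text() in the patch.
$_->unbindNode() for $dom->findnodes('//text()');

# After stripping: the only child of the root is an element.
my $after = $root->firstChild()->nodeName();
my @kids_after = $root->childNodes();

say "before: first child $before, child count ", scalar @kids_before;
say "after: first child $after, child count ", scalar @kids_after;
```

Before stripping, the root has three children: a `#text` node, the `elf-function-symbols` element, and another `#text` node. Afterwards, that element is the only child. Code that walks the tree can therefore treat every child of an element as an element.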