ctf-reader: Clear data that is relevant for current corpus processing.

Message ID 20240619153134.98859-1-claudiu.zissulescu-ianculescu@oracle.com
State New
Headers
Series ctf-reader: Clear data that is relevant for current corpus processing. |

Commit Message

Claudiu Zissulescu-Ianculescu June 19, 2024, 3:31 p.m. UTC
  From: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>

types_map is used as an intermediary storage of the types and
functions infos read using CTF reader and abigal's internal
representation.  This data is transfer to a vector which is latter on
reorder to achieve a canonical representation.  However, the types_map
is not clean each time a new archive is processed, leading to
increasing processing time when we need to process multiple CTF infos
like in the case of analyzing a linux built tree.
This fix dramatically reduce the time required to analyze a linux
built tree to few tens of seconds.

	* abg-ctf-reader.cc (clear_per_corpus_data()): New function.
	(read_corpus()): Clear relevant data calling the above function.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
---
 src/abg-ctf-reader.cc | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Dodji Seketeli June 21, 2024, 11:58 a.m. UTC | #1
Hello Claudiu,

Thanks for the patch!  It's really appreciated!

Please see my comments below.

claudiu.zissulescu-ianculescu@oracle.com a écrit:

> From: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
>
> types_map is used as an intermediary storage of the types and
> functions infos read using CTF reader and abigal's internal
> representation.  This data is transfer to a vector which is latter on
> reorder to achieve a canonical representation.  However, the types_map
> is not clean each time a new archive is processed, leading to
> increasing processing time when we need to process multiple CTF infos
> like in the case of analyzing a linux built tree.

You nailed the issue on the head!  Very well done!

I however have some more context to provide about how I think we could
handle the fix.  Please keep reading below.

> This fix dramatically reduce the time required to analyze a linux
> built tree to few tens of seconds.

Indeed.

[...]

> --- a/src/abg-ctf-reader.cc
> +++ b/src/abg-ctf-reader.cc
> @@ -644,6 +644,14 @@ public:
>      return result;
>    }
>  
> +  /// Clear the data that is relevant for the current corpus being
> +  /// read.
> +  void
> +  clear_per_corpus_data()
> +  {
> +    types_map.clear();
> +  }
> +
>    /// Read the CTF information in the binary and construct an ABI
>    /// corpus from it.
>    ///
> @@ -655,6 +663,8 @@ public:
>    corpus_sptr
>    read_corpus(fe_iface::status &status)
>    {
> +    clear_per_corpus_data ();
> +

So, if you read load_vmlinux_corpus in abg-tools-utils.cc, you can see
that all the binary type info readers (DWARF, BTF and CTF) are invoked
by first calling reader::initialize *before* actually reading the type
information.

So, if there is data to clear/reset, that should be done in
reader::initialize.  You can also see that by reading the API doc
comments of the reader::initialize functions.

So, what you are doing here is more or less what I have done in the
patch that I have just finished to "CI-test" at
https://sourceware.org/git/?p=libabigail.git;a=commit;h=79c045923852f1f44baddd35aa312edfa6e8fdc1.

That patch follows the same spirit as your patch here, but it does a
little bit more, as it also fixes some memory leak and fixes the
initialization sequence.

I thus propose to apply that patch instead, if you agree.

Many thanks for your dedication.

[...]

Cheers,
  

Patch

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 86c7752e..623bfa2b 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -644,6 +644,14 @@  public:
     return result;
   }
 
+  /// Clear the data that is relevant for the current corpus being
+  /// read.
+  void
+  clear_per_corpus_data()
+  {
+    types_map.clear();
+  }
+
   /// Read the CTF information in the binary and construct an ABI
   /// corpus from it.
   ///
@@ -655,6 +663,8 @@  public:
   corpus_sptr
   read_corpus(fe_iface::status &status)
   {
+    clear_per_corpus_data ();
+
     corpus_sptr corp = corpus();
     status = fe_iface::STATUS_UNKNOWN;