[1/2] ctf-reader: Fix re-initialization of the CTF reader

Message ID 87r0cqxzyr.fsf@redhat.com
State New
Headers
Series [1/2] ctf-reader: Fix re-initialization of the CTF reader |

Commit Message

Dodji Seketeli June 21, 2024, 12:06 p.m. UTC
  Hello,

When analyzing a Linux Kernel tree made of vmlinux and loadable module
binaries, the same (CTF) binary reader is re-used to load every single
binaries in a loop.  That can be seen, for instance, by reading the
code of load_vmlinux_corpus in abg-tools-utils.cc.

As part of that process, abigail::ctf::reader::initialize is invoked
prior to using the reader to read type information from each binary.

But then, looking at things a bit closer, I realized that
ctf::reader::initialize is failing to reset the reader::ctfa data
member.  That leads to a memory leak that was making things grow
out of proportion.

Also, the resetting code fails to actually clear out the map of types
that are to be sorted and canonicalized. That leads to unnecessarily
sorting huge amount of types.

The patch address the two points above.

Apart from the obvious gain in code integrity, this patch
significantly reduces the time taken to analyze a Linux Kernel tree.

The patch has been successfully tested in the CI on sourceware.

OK to apply to the mainline?

	* src/abg-ctf-reader.cc (reader::reader): Initialize ctfa,
	ctf_sect, symtab_sect and strtab_sect data members.
	(reader::initialize): In the overload taking no argument, make
	sure to free the ctfa data member before setting it to nullptr.
	In the overload that takes argument, make sure to invoke
	reader::initialize() to really free the data used.  Also, improve
	the comments.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-ctf-reader.cc | 48 ++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 21 deletions(-)
  

Comments

Claudiu Zissulescu-Ianculescu June 21, 2024, 3:40 p.m. UTC | #1
Hello,

IMHO looks alright,
Claudiu

On 6/21/24 3:06 PM, Dodji Seketeli wrote:
> Hello,
> 
> When analyzing a Linux Kernel tree made of vmlinux and loadable module
> binaries, the same (CTF) binary reader is re-used to load every single
> binaries in a loop.  That can be seen, for instance, by reading the
> code of load_vmlinux_corpus in abg-tools-utils.cc.
> 
> As part of that process, abigail::ctf::reader::initialize is invoked
> prior to using the reader to read type information from each binary.
> 
> But then, looking at things a bit closer, I realized that
> ctf::reader::initialize is failing to reset the reader::ctfa data
> member.  That leads to a memory leak that was making things grow
> out of proportion.
> 
> Also, the resetting code fails to actually clear out the map of types
> that are to be sorted and canonicalized. That leads to unnecessarily
> sorting huge amount of types.
> 
> The patch address the two points above.
> 
> Apart from the obvious gain in code integrity, this patch
> significantly reduces the time taken to analyze a Linux Kernel tree.
> 
> The patch has been successfully tested in the CI on sourceware.
> 
> OK to apply to the mainline?
> 
> 	* src/abg-ctf-reader.cc (reader::reader): Initialize ctfa,
> 	ctf_sect, symtab_sect and strtab_sect data members.
> 	(reader::initialize): In the overload taking no argument, make
> 	sure to free the ctfa data member before setting it to nullptr.
> 	In the overload that takes argument, make sure to invoke
> 	reader::initialize() to really free the data used.  Also, improve
> 	the comments.
> 
> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
> ---
>  src/abg-ctf-reader.cc | 48 ++++++++++++++++++++++++-------------------
>  1 file changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
> index f4fb91ed..8810e138 100644
> --- a/src/abg-ctf-reader.cc
> +++ b/src/abg-ctf-reader.cc
> @@ -238,7 +238,8 @@ public:
>    reader(const string&		elf_path,
>  	 const vector<char**>&	debug_info_root_paths,
>  	 environment&		env)
> -    : elf_based_reader(elf_path, debug_info_root_paths, env)
> +    : elf_based_reader(elf_path, debug_info_root_paths, env),
> +      ctfa(), ctf_sect(), symtab_sect(), strtab_sect()
>    {
>      initialize();
>    }
> @@ -248,17 +249,21 @@ public:
>    /// This is useful to clear out the data used by the reader and get
>    /// it ready to be used again.
>    ///
> -  /// Note that the reader eeps the same environment it has been
> -  /// originally created with.
> +  /// Note that the reader keeps (doesn't clear) the same environment
> +  /// it has been originally created with.
>    ///
>    /// Please also note that the life time of this environment object
> -  /// must be greater than the life time of the resulting @ref
> -  /// reader the context uses resources that are allocated in
> -  /// the environment.
> +  /// must be greater than the life time of the resulting @ref reader
> +  /// the context uses resources that are allocated in the
> +  /// environment.
>    void
>    initialize()
>    {
> -    ctfa = nullptr;
> +    if (ctfa)
> +      {
> +	ctf_close(ctfa);
> +	ctfa = nullptr;
> +      }
>      types_map.clear();
>      cur_tu_.reset();
>      corpus_group().reset();
> @@ -266,6 +271,16 @@ public:
>  
>    /// Initializer of the reader.
>    ///
> +  /// This first makes sure the data used by the reader is cleared.
> +  /// And then it initlizes it with the information passed in
> +  /// argument.
> +  ///
> +  /// This is useful to clear out the data used by the reader and get
> +  /// it ready to be used again.
> +  ///
> +  /// Note that the reader keeps the same environment it has been
> +  /// originally created with.
> +  ///
>    /// @param elf_path the new path to the new ELF file to use.
>    ///
>    /// @param debug_info_root_paths a vector of paths to use to look
> @@ -275,22 +290,13 @@ public:
>    ///
>    /// @param linux_kernel_mode currently not used.
>    ///
> -  /// This is useful to clear out the data used by the reader and get
> -  /// it ready to be used again.
> -  ///
> -  /// Note that the reader eeps the same environment it has been
> -  /// originally created with.
> -  ///
> -  /// Please also note that the life time of this environment object
> -  /// must be greater than the life time of the resulting @ref
> -  /// reader the context uses resources that are allocated in
> -  /// the environment.
>    void
> -  initialize(const string& elf_path,
> -             const vector<char**>& debug_info_root_paths,
> -             bool load_all_types = false,
> -             bool linux_kernel_mode = false)
> +  initialize(const string&		elf_path,
> +             const vector<char**>&	debug_info_root_paths,
> +             bool			load_all_types = false,
> +             bool			linux_kernel_mode = false)
>    {
> +    initialize();
>      load_all_types = load_all_types;
>      linux_kernel_mode = linux_kernel_mode;
>      elf_based_reader::initialize(elf_path, debug_info_root_paths);
  
Dodji Seketeli June 21, 2024, 4:50 p.m. UTC | #2
Claudiu Zissulescu-Ianculescu <claudiu.zissulescu-ianculescu@oracle.com> a écrit:

[...]

> IMHO looks alright,

Thanks!

[...]

> On 6/21/24 3:06 PM, Dodji Seketeli wrote:

[...]

>> Hello,
>> 
>> When analyzing a Linux Kernel tree made of vmlinux and loadable module
>> binaries, the same (CTF) binary reader is re-used to load every single
>> binaries in a loop.  That can be seen, for instance, by reading the
>> code of load_vmlinux_corpus in abg-tools-utils.cc.
>> 
>> As part of that process, abigail::ctf::reader::initialize is invoked
>> prior to using the reader to read type information from each binary.
>> 
>> But then, looking at things a bit closer, I realized that
>> ctf::reader::initialize is failing to reset the reader::ctfa data
>> member.  That leads to a memory leak that was making things grow
>> out of proportion.
>> 
>> Also, the resetting code fails to actually clear out the map of types
>> that are to be sorted and canonicalized. That leads to unnecessarily
>> sorting huge amount of types.
>> 
>> The patch address the two points above.
>> 
>> Apart from the obvious gain in code integrity, this patch
>> significantly reduces the time taken to analyze a Linux Kernel tree.
>> 
>> The patch has been successfully tested in the CI on sourceware.
>> 
>> OK to apply to the mainline?

So, I have thus applied this patch to the mainline.

Many thanks!

[...]

Cheers,
  

Patch

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index f4fb91ed..8810e138 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -238,7 +238,8 @@  public:
   reader(const string&		elf_path,
 	 const vector<char**>&	debug_info_root_paths,
 	 environment&		env)
-    : elf_based_reader(elf_path, debug_info_root_paths, env)
+    : elf_based_reader(elf_path, debug_info_root_paths, env),
+      ctfa(), ctf_sect(), symtab_sect(), strtab_sect()
   {
     initialize();
   }
@@ -248,17 +249,21 @@  public:
   /// This is useful to clear out the data used by the reader and get
   /// it ready to be used again.
   ///
-  /// Note that the reader eeps the same environment it has been
-  /// originally created with.
+  /// Note that the reader keeps (doesn't clear) the same environment
+  /// it has been originally created with.
   ///
   /// Please also note that the life time of this environment object
-  /// must be greater than the life time of the resulting @ref
-  /// reader the context uses resources that are allocated in
-  /// the environment.
+  /// must be greater than the life time of the resulting @ref reader
+  /// the context uses resources that are allocated in the
+  /// environment.
   void
   initialize()
   {
-    ctfa = nullptr;
+    if (ctfa)
+      {
+	ctf_close(ctfa);
+	ctfa = nullptr;
+      }
     types_map.clear();
     cur_tu_.reset();
     corpus_group().reset();
@@ -266,6 +271,16 @@  public:
 
   /// Initializer of the reader.
   ///
+  /// This first makes sure the data used by the reader is cleared.
+  /// And then it initlizes it with the information passed in
+  /// argument.
+  ///
+  /// This is useful to clear out the data used by the reader and get
+  /// it ready to be used again.
+  ///
+  /// Note that the reader keeps the same environment it has been
+  /// originally created with.
+  ///
   /// @param elf_path the new path to the new ELF file to use.
   ///
   /// @param debug_info_root_paths a vector of paths to use to look
@@ -275,22 +290,13 @@  public:
   ///
   /// @param linux_kernel_mode currently not used.
   ///
-  /// This is useful to clear out the data used by the reader and get
-  /// it ready to be used again.
-  ///
-  /// Note that the reader eeps the same environment it has been
-  /// originally created with.
-  ///
-  /// Please also note that the life time of this environment object
-  /// must be greater than the life time of the resulting @ref
-  /// reader the context uses resources that are allocated in
-  /// the environment.
   void
-  initialize(const string& elf_path,
-             const vector<char**>& debug_info_root_paths,
-             bool load_all_types = false,
-             bool linux_kernel_mode = false)
+  initialize(const string&		elf_path,
+             const vector<char**>&	debug_info_root_paths,
+             bool			load_all_types = false,
+             bool			linux_kernel_mode = false)
   {
+    initialize();
     load_all_types = load_all_types;
     linux_kernel_mode = linux_kernel_mode;
     elf_based_reader::initialize(elf_path, debug_info_root_paths);