[PATCHv,v2] ctf-reader: Lookup debug info for symbols in a non default archive member

Message ID 20220907234042.1610173-1-guillermo.e.martinez@oracle.com
State New

Commit Message

Guillermo E. Martinez Sept. 7, 2022, 11:40 p.m. UTC
  Hello,

This is v2 of the patch that improves the ABI XML file generated by the
ctf reader; some Linux symbols (EXPORT_SYMBOL*) were missing from it.

Changes from v1:

    - Change the order of the `ctf_lookup_*' calls: first look up the
    symbol's function type in the `CTF Function section', and if that
    does not succeed, try the `CTF Variable section'.
    - Add comments describing the use of `ctf_lookup_variable'.

Comments are welcome and appreciated!

Thanks in advance,
guillermo
--

The mechanism currently used by the ctf reader to look for debug
information for a given Linux symbol is to open the (default)
dictionary whose name matches the name of the binary being processed
in the current corpus, e.g. `vmlinux' or `module-name.ko'. However,
information for some symbols is not located in the default dictionary.
This is evident when comparing the symbols in the `Module.symvers' file
with the ABI XML file. For example, the ctf reader expects to find the
information for the `LZ4_decompress_fast' symbol in the CTF `vmlinux'
archive member, because this symbol is defined in the `vmlinux' binary:

   0x4c416eb9	LZ4_decompress_fast	vmlinux	EXPORT_SYMBOL

But it turns out to be missing there. The correct location is the
`vmlinux#0' dictionary:

  CTF archive member: vmlinux:
    ...
    Function objects:
    ...

  CTF archive member: vmlinux#0:
    Function objects:
    ...
    LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
    ...

Therefore, the ctf reader now looks for debug information in the whole
archive. Fortunately, `libctf' provides a fast lookup mechanism using a
cache, dictionary references, etc., so the performance penalty is ~10%.
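
The comparison between `Module.symvers' and the ABI XML file mentioned
above can be approximated with a small helper. This is only a minimal
sketch: it assumes the tab-separated `CRC, symbol, module, export type'
layout of the line quoted earlier, and the names `symvers_entry',
`parse_symvers_line' and `missing_symbols', as well as the in-memory set
standing in for the symbols found in the ABI XML file, are hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One record of a Module.symvers file (hypothetical helper type).
struct symvers_entry
{
  std::string crc;
  std::string symbol;
  std::string module;
  std::string export_type;
};

// Split one tab-separated line into its four fields.  Returns false
// if the line has fewer than four fields.
static bool
parse_symvers_line(const std::string& line, symvers_entry& out)
{
  std::istringstream in(line);
  return std::getline(in, out.crc, '\t')
         && std::getline(in, out.symbol, '\t')
         && std::getline(in, out.module, '\t')
         && std::getline(in, out.export_type, '\t');
}

// Return the EXPORT_SYMBOL* symbols listed in SYMVERS_LINES that are
// absent from ABI_SYMBOLS (assumed to hold the ABI XML symbol names).
static std::vector<std::string>
missing_symbols(const std::vector<std::string>& symvers_lines,
                const std::set<std::string>& abi_symbols)
{
  std::vector<std::string> missing;
  for (const std::string& line : symvers_lines)
    {
      symvers_entry e;
      if (parse_symvers_line(line, e)
          && e.export_type.rfind("EXPORT_SYMBOL", 0) == 0
          && abi_symbols.count(e.symbol) == 0)
        missing.push_back(e.symbol);
    }
  return missing;
}
```

Feeding it the `LZ4_decompress_fast' line and an ABI symbol set that
lacks that name would report the symbol as missing.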

It now first uses `ctf_lookup_by_symbol_name', which locates symbol
information given a symbol name in either the CTF Function or Variable
sections. If the symbol is not found, it falls back to
`ctf_lookup_variable' to look in the CTF Variable section. This can
happen when `ld' was run with the `--ctf-variables' option, in which
case function type information resides in the CTF Variable section.
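
The lookup order described above can be illustrated without libctf. The
following is a rough sketch, not the patch itself: `mock_dict',
`lookup_in_dict', `lookup_in_archive' and `NOT_FOUND' are hypothetical
stand-ins (for an archive member, the per-dictionary lookups, the
archive walk and CTF_ERR respectively), and plain maps replace the real
CTF sections.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one CTF archive member (dictionary).
struct mock_dict
{
  std::string name;
  std::map<std::string, long> function_objects; // models the Function section
  std::map<std::string, long> variables;        // models the Variable section
};

// Plays the role of CTF_ERR in this sketch.
static const long NOT_FOUND = -1;

struct lookup_result
{
  long type_id;
  const mock_dict* dict; // member the symbol was found in, or nullptr
};

// Try the function objects first, then the variables.
static long
lookup_in_dict(const mock_dict& d, const std::string& sym)
{
  std::map<std::string, long>::const_iterator it
    = d.function_objects.find(sym);
  if (it != d.function_objects.end())
    return it->second;
  it = d.variables.find(sym);
  return it != d.variables.end() ? it->second : NOT_FOUND;
}

// Look in the default dictionary; on failure walk every member.
static lookup_result
lookup_in_archive(const std::vector<mock_dict>& archive,
                  const mock_dict& default_dict, const std::string& sym)
{
  long id = lookup_in_dict(default_dict, sym);
  if (id != NOT_FOUND)
    return lookup_result{id, &default_dict};
  for (const mock_dict& member : archive)
    {
      id = lookup_in_dict(member, sym);
      if (id != NOT_FOUND)
        return lookup_result{id, &member};
    }
  return lookup_result{NOT_FOUND, nullptr};
}
```

Returning the member alongside the type id mirrors the fact that a type
id is only meaningful relative to the dictionary it was found in.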

	* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function.
	(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.

Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
---
 src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 10 deletions(-)
  

Comments

Dodji Seketeli Sept. 13, 2022, 9:26 a.m. UTC | #1
Hello Guillermo,

Thank you for the explanations and the updated patch.  Everything is
clear to me now!  Thanks again.

I have applied the patch to master, but just with some slight obvious
changes that I am discussing below.


"Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org>
writes:

[...]


> --- a/src/abg-ctf-reader.cc

[...]

> @@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
>    return result;
>  }
>  
> +/// Given a symbol name, look up the corresponding CTF information in
> +/// the default dictionary (the CTF archive member provided by the
> +/// caller).  If the search does not succeed, look for the symbol
> +/// name in _all_ archive members.
> +///
> +/// @param ctfa the CTF archive.
> +/// @param ctf_dict the default dictionary to look in.
> +/// @param sym_name the symbol name.
> +/// @param corp the IR corpus.
> +///
> +/// Note that if @ref sym_name is found in a dictionary other than the
> +/// default one, @ref ctf_dict is updated and the caller must
> +/// explicitly close it.
> +///
> +/// @return a valid CTF type id if @ref sym_name was found, CTF_ERR otherwise.
> +
> +static ctf_id_t
> +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
> +                             const char *sym_name, corpus_sptr corp)

It seems to me that the "corp" parameter is not used in the function, so
I removed it.  I have adjusted the doxygen comment to remove it as well.


> +{
> +  int ctf_err;
> +  ctf_dict_t *dict = *ctf_dict;
> +  ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
> +
> +  if (ctf_type != CTF_ERR)
> +    return ctf_type;
> +
> +  /* Probably --ctf-variables option was used by ld, so symbol type
> +     definition must be found in the CTF Variable section. */
> +  ctf_type = ctf_lookup_variable(dict, sym_name);
> +
> +  /* Not lucky, then, search in whole archive */
> +  if (ctf_type == CTF_ERR)
> +    {
> +      ctf_dict_t *fp;
> +      ctf_next_t *i = NULL;
> +      const char *arcname;
> +
> +      while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
> +        {
> +          if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
> +            ctf_type = ctf_lookup_variable(fp, sym_name);
> +
> +          if (ctf_type != CTF_ERR)
> +            {
> +              *ctf_dict = fp;
> +              break;
> +            }
> +          ctf_dict_close(fp);
> +        }
> +    }
> +
> +  return ctf_type;
> +}
> +

[...]

>    for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
>      {
>        std::string sym_name = symbol->get_name();
>        ctf_id_t ctf_sym_type;
>  
> -      ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
> -      if (ctf_sym_type == (ctf_id_t) -1
> -          && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
> -        // lookup in function objects
> -        ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
> -
> -      if (ctf_sym_type == (ctf_id_t) -1)
> -        continue;
> +      ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
> +                                                  sym_name.c_str(), corp);
I have adjusted that call to remove the "corp" argument as it's no
longer needed.

Oh, thanks for adjusting this code.  Using lookup_symbol_in_ctf_archive
here makes things a lot clearer to me at least!
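
The doxygen note above says that a dictionary handed back through the
`ctf_dict' parameter must be closed by the caller. One way a caller
could honour that while still restoring the default dictionary for the
next symbol is a small scope guard. This is only a sketch under stated
assumptions: `mock_dict', `mock_dict_close' and `dict_restorer' are
hypothetical names, and the reference counter merely models what closing
a real dictionary would release.

```cpp
// Hypothetical stand-in for a dictionary with a visible reference count.
struct mock_dict
{
  int refs;
};

// Models closing a dictionary: drops one reference.
static void
mock_dict_close(mock_dict* d)
{
  --d->refs;
}

// Remembers the default dictionary on construction.  On destruction,
// closes the current dictionary if a lookup swapped in a different
// one, then points the caller's variable back at the default.
class dict_restorer
{
  mock_dict*& current_;
  mock_dict* const default_;

public:
  explicit dict_restorer(mock_dict*& current)
    : current_(current), default_(current)
  {}

  ~dict_restorer()
  {
    if (current_ != default_)
      mock_dict_close(current_);
    current_ = default_;
  }

  dict_restorer(const dict_restorer&) = delete;
  dict_restorer& operator=(const dict_restorer&) = delete;
};
```

Declaring such a guard at the top of each loop iteration would cover
every exit path of the loop body, including any early `continue'.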

Below is the patch that I have applied.  I have slightly amended the
introductory text to correct some slight typos.

From ad47854627f76c7959ae1a7ae59c9fcda38091c5 Mon Sep 17 00:00:00 2001
From: "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org>
Date: Wed, 7 Sep 2022 18:40:42 -0500
Subject: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member

The mechanism currently used by the ctf reader to look for debug
information for a given Linux symbol is the following: it opens the
(default) dictionary whose name matches the name of the binary being
processed in the current corpus, e.g. `vmlinux' or
`module-name.ko'. However, there are symbols whose information is not
located in the default dictionary; this is evident when comparing the
symbols in the `Module.symvers' file with the ABI XML file. For
example, the ctf reader expects to find the information for the
`LZ4_decompress_fast' symbol in the CTF `vmlinux' archive member,
because this symbol is defined in the `vmlinux' binary:

   0x4c416eb9	LZ4_decompress_fast	vmlinux	EXPORT_SYMBOL

But it turns out to be missing there. The correct location is the
`vmlinux#0' dictionary:

  CTF archive member: vmlinux:
    ...
    Function objects:
    ...

  CTF archive member: vmlinux#0:
    Function objects:
    ...
    LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
    ...

Therefore, the ctf reader must look for debug information in the
whole archive; fortunately, `libctf' provides a fast lookup mechanism
using a cache, dictionary references, etc., so the performance penalty
is ~10%.

It now first uses `ctf_lookup_by_symbol_name', which locates symbol
information given a symbol name in either the CTF Function or
Variable sections; if the symbol isn't found, it falls back to
`ctf_lookup_variable' to look in the CTF Variable section. This can
happen when `ld' was run with the `--ctf-variables' option, which
makes function type information reside in the CTF Variable
section.

	* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function.
	(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.

Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..e307fcd7 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
   return result;
 }
 
+/// Given a symbol name, look up the corresponding CTF information in
+/// the default dictionary (the CTF archive member provided by the
+/// caller).  If the search does not succeed, look for the symbol
+/// name in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param ctf_dict the default dictionary to look in.
+/// @param sym_name the symbol name.
+///
+/// Note that if @ref sym_name is found in a dictionary other than the
+/// default one, @ref ctf_dict is updated to point to it, and the
+/// caller must explicitly close it.
+///
+/// @return a valid CTF type id if @ref sym_name was found, CTF_ERR
+/// otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+                             const char *sym_name)
+{
+  int ctf_err;
+  ctf_dict_t *dict = *ctf_dict;
+  ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+  if (ctf_type != CTF_ERR)
+    return ctf_type;
+
+  /* Probably --ctf-variables option was used by ld, so symbol type
+     definition must be found in the CTF Variable section. */
+  ctf_type = ctf_lookup_variable(dict, sym_name);
+
+  /* Not lucky, then, search in whole archive */
+  if (ctf_type == CTF_ERR)
+    {
+      ctf_dict_t *fp;
+      ctf_next_t *i = NULL;
+      const char *arcname;
+
+      while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+        {
+          if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
+            ctf_type = ctf_lookup_variable(fp, sym_name);
+
+          if (ctf_type != CTF_ERR)
+            {
+              *ctf_dict = fp;
+              break;
+            }
+          ctf_dict_close(fp);
+        }
+    }
+
+  return ctf_type;
+}
+
 /// Process a CTF archive and create libabigail IR for the types,
 /// variables and function declarations found in the archive, iterating
 /// over public symbols.  The IR is added to the given corpus.
@@ -1222,7 +1277,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
   corp->add(ir_translation_unit);
 
   int ctf_err;
-  ctf_dict_t *ctf_dict;
+  ctf_dict_t *ctf_dict, *dict_tmp;
   const auto symtab = ctxt->symtab;
   symtab_reader::symtab_filter filter = symtab->make_filter();
   filter.set_public_symbols();
@@ -1248,19 +1303,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
       abort();
     }
 
+  dict_tmp = ctf_dict;
+
   for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
     {
       std::string sym_name = symbol->get_name();
       ctf_id_t ctf_sym_type;
 
-      ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
-      if (ctf_sym_type == (ctf_id_t) -1
-          && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
-        // lookup in function objects
-        ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
-      if (ctf_sym_type == (ctf_id_t) -1)
-        continue;
+      ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+                                                  sym_name.c_str());
+      if (ctf_sym_type == CTF_ERR)
+          continue;
 
       if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
         {
@@ -1298,13 +1351,14 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
                                                    func_type,
                                                    0 /* is_inline */,
                                                    location()));
-
           func_declaration->set_symbol(symbol);
           add_decl_to_scope(func_declaration,
                             ir_translation_unit->get_global_scope());
           func_declaration->set_is_in_public_symbol_table(true);
           ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
         }
+
+      ctf_dict = dict_tmp;
     }
 
   ctf_dict_close(ctf_dict);
  

Patch

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..f5f58c7a 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,61 @@  lookup_type(read_context *ctxt, corpus_sptr corp,
   return result;
 }
 
+/// Given a symbol name, look up the corresponding CTF information in
+/// the default dictionary (the CTF archive member provided by the
+/// caller).  If the search does not succeed, look for the symbol
+/// name in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param ctf_dict the default dictionary to look in.
+/// @param sym_name the symbol name.
+/// @param corp the IR corpus.
+///
+/// Note that if @ref sym_name is found in a dictionary other than the
+/// default one, @ref ctf_dict is updated and the caller must
+/// explicitly close it.
+///
+/// @return a valid CTF type id if @ref sym_name was found, CTF_ERR otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+                             const char *sym_name, corpus_sptr corp)
+{
+  int ctf_err;
+  ctf_dict_t *dict = *ctf_dict;
+  ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+  if (ctf_type != CTF_ERR)
+    return ctf_type;
+
+  /* Probably --ctf-variables option was used by ld, so symbol type
+     definition must be found in the CTF Variable section. */
+  ctf_type = ctf_lookup_variable(dict, sym_name);
+
+  /* Not lucky, then, search in whole archive */
+  if (ctf_type == CTF_ERR)
+    {
+      ctf_dict_t *fp;
+      ctf_next_t *i = NULL;
+      const char *arcname;
+
+      while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+        {
+          if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
+            ctf_type = ctf_lookup_variable(fp, sym_name);
+
+          if (ctf_type != CTF_ERR)
+            {
+              *ctf_dict = fp;
+              break;
+            }
+          ctf_dict_close(fp);
+        }
+    }
+
+  return ctf_type;
+}
+
 /// Process a CTF archive and create libabigail IR for the types,
 /// variables and function declarations found in the archive, iterating
 /// over public symbols.  The IR is added to the given corpus.
@@ -1222,7 +1277,7 @@  process_ctf_archive(read_context *ctxt, corpus_sptr corp)
   corp->add(ir_translation_unit);
 
   int ctf_err;
-  ctf_dict_t *ctf_dict;
+  ctf_dict_t *ctf_dict, *dict_tmp;
   const auto symtab = ctxt->symtab;
   symtab_reader::symtab_filter filter = symtab->make_filter();
   filter.set_public_symbols();
@@ -1248,19 +1303,17 @@  process_ctf_archive(read_context *ctxt, corpus_sptr corp)
       abort();
     }
 
+  dict_tmp = ctf_dict;
+
   for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
     {
       std::string sym_name = symbol->get_name();
       ctf_id_t ctf_sym_type;
 
-      ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
-      if (ctf_sym_type == (ctf_id_t) -1
-          && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
-        // lookup in function objects
-        ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
-      if (ctf_sym_type == (ctf_id_t) -1)
-        continue;
+      ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+                                                  sym_name.c_str(), corp);
+      if (ctf_sym_type == CTF_ERR)
+          continue;
 
       if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
         {
@@ -1298,13 +1351,14 @@  process_ctf_archive(read_context *ctxt, corpus_sptr corp)
                                                    func_type,
                                                    0 /* is_inline */,
                                                    location()));
-
           func_declaration->set_symbol(symbol);
           add_decl_to_scope(func_declaration,
                             ir_translation_unit->get_global_scope());
           func_declaration->set_is_in_public_symbol_table(true);
           ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
         }
+
+      ctf_dict = dict_tmp;
     }
 
   ctf_dict_close(ctf_dict);