[1/1] Add support for symbol addition to the Python API

Message ID 5c5901d9216f$700a9b10$501fd130$@gmail.com
State New

Commit Message

Terekhov, Mikhail via Gdb-patches Jan. 6, 2023, 1:37 a.m. UTC
This patch adds support for symbol creation and registration. It currently
supports adding type symbols (VAR_DOMAIN/LOC_TYPEDEF), static symbols
(VAR_DOMAIN/LOC_STATIC) and goto target labels (LABEL_DOMAIN/LOC_LABEL). It
adds the `add_type_symbol`, `add_static_symbol` and `add_label_symbol`
functions to the `gdb.Objfile` type, allowing for the addition of the
aforementioned types of symbols.
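
As a rough sketch of how the three new methods might be called from a script,
assuming a GDB built with this patch: the argument order and the optional
`language` keyword follow the keyword lists in the patch, while the helper
name, symbol names and addresses are made up for illustration.

```python
def register_example_symbols(objfile, int_type):
    """Register one symbol of each kind supported by the patch.

    `objfile` is expected to be a gdb.Objfile and `int_type` a gdb.Type.
    The symbol names and addresses below are hypothetical.
    """
    # Type symbol (VAR_DOMAIN/LOC_TYPEDEF): name, type, optional language.
    typedef_sym = objfile.add_type_symbol("my_int_t", int_type, language="c")
    # Static symbol (VAR_DOMAIN/LOC_STATIC): name, address, optional language.
    static_sym = objfile.add_static_symbol("my_counter", 0x601040)
    # Label symbol (LABEL_DOMAIN/LOC_LABEL): name, address, optional language.
    label_sym = objfile.add_label_symbol("retry_loop", 0x401136)
    return typedef_sym, static_sym, label_sym
```

Inside GDB this could be invoked as
`register_example_symbols(gdb.objfiles()[0], gdb.lookup_type("int"))`.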

This is done by building a new `compunit_symtab` for each symbol that is to
be added, owned by a given objfile and with its lifetime bound to that
objfile. I might be missing something here, but there doesn't seem to be an
intended way to add new symbols to a `compunit_symtab` after it's been
finished. If there is, the efficiency of this method could be improved
considerably. It could also be made more efficient by having a way to add
whole batches of symbols at once, which would then all get added to the same
`compunit_symtab`.

For now, though, this implementation lets us add symbols that can be used to,
for instance, query registered types through `gdb.lookup_type`, and allows
reverse engineering GDB plugins (such as Pwndbg [0] or decomp2dbg [1]) to add
symbols directly through the Python API instead of having to compile an
object file for the target architecture that they later load through the
add-symbol-file command. [2]

[0] https://github.com/pwndbg/pwndbg/
[1] https://github.com/mahaloz/decomp2dbg
[2] https://github.com/mahaloz/decomp2dbg/blob/055be6b2001954d00db2d683f20e9b714af75880/decomp2dbg/clients/gdb/symbol_mapper.py#L235-L243
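
To illustrate the plugin use case, here is a minimal sketch of how a plugin
might push a list of recovered symbols through the new methods instead of
generating an object file. The entry format, the `import_symbols` helper and
the `resolve_type` callback are assumptions made for this example; they are
not part of the patch or of any existing plugin.

```python
def import_symbols(objfile, entries, resolve_type):
    """Add each entry to `objfile` and return the created symbols.

    `entries` is an iterable of dicts in a made-up format:
      {"kind": "type",   "name": ..., "type": <type name>}
      {"kind": "static", "name": ..., "address": <int>}
      {"kind": "label",  "name": ..., "address": <int>}
    `resolve_type` maps a type name to a gdb.Type (e.g. gdb.lookup_type).
    """
    created = []
    for entry in entries:
        kind = entry["kind"]
        if kind == "type":
            sym = objfile.add_type_symbol(entry["name"],
                                          resolve_type(entry["type"]))
        elif kind == "static":
            sym = objfile.add_static_symbol(entry["name"], entry["address"])
        elif kind == "label":
            sym = objfile.add_label_symbol(entry["name"], entry["address"])
        else:
            raise ValueError("unsupported symbol kind: %r" % (kind,))
        created.append(sym)
    return created
```

With the patch as posted, each call in the loop builds its own
`compunit_symtab`, which is the per-symbol cost described above.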


---
  

Comments

Simon Marchi Jan. 6, 2023, 8:21 p.m. UTC | #1
Same as with the other patches, I can't apply that patch, it seems
misformatted.

On 1/5/23 20:37, dark.ryu.550--- via Gdb-patches wrote:
> This patch adds support for symbol creation and registration. It currently
> supports adding type symbols (VAR_DOMAIN/LOC_TYPEDEF), static symbols
> (VAR_DOMAIN/LOC_STATIC) and goto target labels (LABEL_DOMAIN/LOC_LABEL). It
> adds the `add_type_symbol`, `add_static_symbol` and `add_label_symbol`
> functions to the `gdb.Objfile` type, allowing for the addition of the
> aforementioned types of symbols.
>
> This is done by building a new `compunit_symtab` for each symbol that is to
> be added, owned by a given objfile and with its lifetime bound to that
> objfile. I might be missing something here, but there doesn't seem to be an
> intended way to add new symbols to a `compunit_symtab` after it's been
> finished. If there is, the efficiency of this method could be improved
> considerably. It could also be made more efficient by having a way to add
> whole batches of symbols at once, which would then all get added to the same
> `compunit_symtab`.

Indeed, I don't think there's a way today to add symbols to a finished
compunit_symtab.  Maybe it would be worth exploring that.  First, to
avoid creating one compunit_symtab per created user symbol.  But also
because I wonder how user-created symbols interact with existing
symbols.  Let's say I have a function symbol that comes from DWARF in an
existing compunit_symtab, and I create a user symbol for that function's
address.
The new symbol is in a new compunit_symtab.  This means there is some
overlap in the addresses of two compunit_symtabs.  What would functions
like find_compunit_symtab_by_address return?  Should the new symbol be
added to an existing compunit_symtab, if the address falls into an
existing compunit_symtab's address range?
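
The overlap can be pictured with a toy model. This is only a sketch of the
ambiguity, not how GDB stores or searches compunit_symtabs: the `Range` type,
the owner labels and the addresses are all made up, and a real
compunit_symtab covers a set of blocks rather than a single interval.

```python
from typing import List, NamedTuple


class Range(NamedTuple):
    owner: str  # label for a hypothetical compunit_symtab
    start: int  # first address covered, inclusive
    end: int    # one past the last address covered


def owners_containing(ranges: List[Range], addr: int) -> List[str]:
    """Return every owner whose [start, end) range contains addr."""
    return [r.owner for r in ranges if r.start <= addr < r.end]


ranges = [
    Range("dwarf_cu", 0x401000, 0x402000),        # existing CU from DWARF
    Range("user_symbol_cu", 0x401136, 0x401137),  # new CU for a user symbol
]

matches = owners_containing(ranges, 0x401136)
print(matches)  # ['dwarf_cu', 'user_symbol_cu']
```

A lookup by address that is expected to return a single result has two
candidates here, so it would need some rule to pick between them.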

I think I'll have more questions / worries, but I'll wait until I can
actually apply the patch and read it (I can't read diffs, sorry).

Simon
  

Patch

diff --git a/gdb/python/py-objfile.c b/gdb/python/py-objfile.c
index c278925531b..9b884d4c414 100644
--- a/gdb/python/py-objfile.c
+++ b/gdb/python/py-objfile.c
@@ -25,6 +25,7 @@ 
 #include "build-id.h"
 #include "symtab.h"
 #include "python.h"
+#include "buildsym.h"
 
 struct objfile_object
 {
@@ -527,6 +528,229 @@  objfpy_lookup_static_symbol (PyObject *self, PyObject *args, PyObject *kw)
   Py_RETURN_NONE;
 }
 
+static struct symbol *
+add_new_symbol(
+  struct objfile *objfile,
+  const char *name,
+  enum language language,
+  enum domain_enum domain,
+  enum address_class aclass,
+  short section_index,
+  CORE_ADDR last_addr,
+  CORE_ADDR end_addr,
+  bool global,
+  std::function<void(struct symbol*)> params)
+{
+  struct symbol *symbol = new (&objfile->objfile_obstack) struct symbol();
+  OBJSTAT (objfile, n_syms++);
+
+  symbol->set_language(language, &objfile->objfile_obstack);
+  symbol->compute_and_set_names(gdb::string_view (name), true,
+                                objfile->per_bfd);
+
+  symbol->set_is_objfile_owned (true);
+  symbol->set_section_index (section_index);
+  symbol->set_domain (domain);
+  symbol->set_aclass_index (aclass);
+
+  params(symbol);
+
+  buildsym_compunit builder (objfile, "", "", language, last_addr);
+  add_symbol_to_list (symbol, global ? builder.get_global_symbols()
+                                     : builder.get_file_symbols());
+  builder.end_compunit_symtab(end_addr, section_index);
+
+  return symbol;
+}
+
+static enum language
+parse_language(const char *language)
+{
+  if (strcmp (language, "c") == 0)
+    return language_c;
+  else if (strcmp (language, "objc") == 0)
+    return language_objc;
+  else if (strcmp (language, "cplus") == 0)
+    return language_cplus;
+  else if (strcmp (language, "d") == 0)
+    return language_d;
+  else if (strcmp (language, "go") == 0)
+    return language_go;
+  else if (strcmp (language, "fortran") == 0)
+    return language_fortran;
+  else if (strcmp (language, "m2") == 0)
+    return language_m2;
+  else if (strcmp (language, "asm") == 0)
+    return language_asm;
+  else if (strcmp (language, "pascal") == 0)
+    return language_pascal;
+  else if (strcmp (language, "opencl") == 0)
+    return language_opencl;
+  else if (strcmp (language, "rust") == 0)
+    return language_rust;
+  else if (strcmp (language, "ada") == 0)
+    return language_ada;
+  else if (strcmp (language, "auto") == 0)
+    return language_auto;
+  else
+    return language_unknown;
+}
+
+/* Adds a type (LOC_TYPEDEF) symbol to a given objfile. */
+
+static PyObject *
+objfpy_add_type_symbol (PyObject *self, PyObject *args, PyObject *kw)
+{
+  static const char *format = "sO|s";
+  static const char *keywords[] =
+    {
+      "name", "type", "language",NULL
+    };
+
+  PyObject *type_object;
+  const char *name;
+  const char *language_name = nullptr;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords(args, kw, format, keywords, &name,
+                                       &type_object, &language_name))
+    return nullptr;
+
+  struct objfile *objfile = objfile_object_to_objfile(self);
+  if (objfile == nullptr)
+    return nullptr;
+
+  struct type *type = type_object_to_type(type_object);
+  if (type == nullptr)
+    return nullptr;
+
+  if (language_name == nullptr)
+    language_name = "auto";
+  enum language language = parse_language(language_name);
+  if (language == language_unknown)
+  {
+    PyErr_SetString(PyExc_ValueError, "invalid language name");
+    return nullptr;
+  }
+
+  struct symbol* symbol = add_new_symbol(
+    objfile,
+    name,
+    language,
+    VAR_DOMAIN,
+    LOC_TYPEDEF,
+    0,
+    0,
+    0,
+    false,
+    [&](struct symbol* temp_symbol)
+    {
+      temp_symbol->set_type(type);
+    });
+
+
+  return symbol_to_symbol_object(symbol);
+}
+
+/* Adds a label (LOC_LABEL) symbol to a given objfile. */
+
+static PyObject *
+objfpy_add_label_symbol (PyObject *self, PyObject *args, PyObject *kw)
+{
+  static const char *format = "sk|s";
+  static const char *keywords[] =
+    {
+      "name", "address", "language",NULL
+    };
+
+  const char *name;
+  CORE_ADDR address;
+  const char *language_name = nullptr;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords(args, kw, format, keywords, &name,
+                                       &address, &language_name))
+    return nullptr;
+
+  struct objfile *objfile = objfile_object_to_objfile(self);
+  if (objfile == nullptr)
+    return nullptr;
+
+  if (language_name == nullptr)
+    language_name = "auto";
+  enum language language = parse_language(language_name);
+  if (language == language_unknown)
+  {
+    PyErr_SetString(PyExc_ValueError, "invalid language name");
+    return nullptr;
+  }
+
+  struct symbol* symbol = add_new_symbol(
+    objfile,
+    name,
+    language,
+    LABEL_DOMAIN,
+    LOC_LABEL,
+    0,
+    0,
+    0,
+    false,
+    [&](struct symbol* temp_symbol)
+    {
+      temp_symbol->set_value_address(address);
+    });
+
+
+  return symbol_to_symbol_object(symbol);
+}
+
+/* Adds a static (LOC_STATIC) symbol to a given objfile. */
+
+static PyObject *
+objfpy_add_static_symbol (PyObject *self, PyObject *args, PyObject *kw)
+{
+  static const char *format = "sk|s";
+  static const char *keywords[] =
+    {
+      "name", "address", "language", NULL
+    };
+
+  const char *name;
+  CORE_ADDR address;
+  const char *language_name = nullptr;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords(args, kw, format, keywords, &name,
+                                       &address, &language_name))
+    return nullptr;
+
+  struct objfile *objfile = objfile_object_to_objfile(self);
+  if (objfile == nullptr)
+    return nullptr;
+
+  if (language_name == nullptr)
+    language_name = "auto";
+  enum language language = parse_language(language_name);
+  if (language == language_unknown)
+  {
+    PyErr_SetString(PyExc_ValueError, "invalid language name");
+    return nullptr;
+  }
+
+  struct symbol* symbol = add_new_symbol(
+    objfile,
+    name,
+    language,
+    VAR_DOMAIN,
+    LOC_STATIC,
+    0,
+    0,
+    0,
+    false,
+    [&](struct symbol* temp_symbol)
+    {
+      temp_symbol->set_value_address(address);
+    });
+
+
+  return symbol_to_symbol_object(symbol);
+}
+
 /* Implement repr() for gdb.Objfile.  */
 
 static PyObject *
@@ -704,6 +928,18 @@  objfile_to_objfile_object (struct objfile *objfile)
   return gdbpy_ref<>::new_reference (result);
 }
 
+struct objfile *
+objfile_object_to_objfile (PyObject *self)
+{
+  if (!PyObject_TypeCheck (self, &objfile_object_type))
+    return nullptr;
+
+  auto objfile_object = (struct objfile_object*) self;
+  OBJFPY_REQUIRE_VALID (objfile_object);
+
+  return objfile_object->objfile;
+}
+
 int
 gdbpy_initialize_objfile (void)
 {
@@ -737,6 +973,18 @@  Look up a global symbol in this objfile and return it." },
     "lookup_static_symbol (name [, domain]).\n\
 Look up a static-linkage global symbol in this objfile and return it." },
 
+  { "add_type_symbol", (PyCFunction) objfpy_add_type_symbol,
+    METH_VARARGS | METH_KEYWORDS,
+    "add_type_symbol(name: string, type: gdb.Type, [language: string])\n\
+    Registers a new symbol inside VAR_DOMAIN/LOC_TYPEDEF, with the given name\
+    referring to the given type." },
+
+  { "add_label_symbol", (PyCFunction) objfpy_add_label_symbol,
+    METH_VARARGS | METH_KEYWORDS,
+    "add_label_symbol(name: string, address: int, [language: string])\n\
+    Registers a new symbol inside LABEL_DOMAIN/LOC_LABEL, with the given name\
+    pointing to the given address." },
+
+  { "add_static_symbol", (PyCFunction) objfpy_add_static_symbol,
+    METH_VARARGS | METH_KEYWORDS,
+    "add_static_symbol(name: string, address: int, [language: string])\n\
+    Registers a new symbol inside VAR_DOMAIN/LOC_STATIC, with the given name\
+    pointing to the given address." },
+
   { NULL }
 };
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 06357cc8c0b..3877f8a7ca9 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -481,6 +494,8 @@  struct symtab *symtab_object_to_symtab (PyObject *obj);
 struct symtab_and_line *sal_object_to_symtab_and_line (PyObject *obj);
 frame_info_ptr frame_object_to_frame_info (PyObject *frame_obj);
 struct gdbarch *arch_object_to_gdbarch (PyObject *obj);
+struct objfile *objfile_object_to_objfile (PyObject *self);
 
 /* Convert Python object OBJ to a program_space pointer.  OBJ must be a
    gdb.Progspace reference.  Return nullptr if the gdb.Progspace is not