gdb/debuginfod: cleanup debuginfod earlier

Message ID 46c030dcd5fc7a1a2ef4e078a92798f18c947ace.1684840908.git.aburgess@redhat.com
State New
Headers
Series gdb/debuginfod: cleanup debuginfod earlier |

Commit Message

Andrew Burgess May 23, 2023, 11:23 a.m. UTC
  A GDB crash was discovered on Fedora GDB that was tracked back to an
issue with the way that debuginfod is cleaned up.

The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
reproduce:

1. The file /etc/ssl/openssl.cnf contains the following lines:

   [provider_sect]
   default = default_sect
   ##legacy = legacy_sect
   ##
   [default_sect]
   activate = 1

   ##[legacy_sect]
   ##activate = 1

   The bug will occur when the '##' characters are removed so that the
   lines in question look like this:

   [provider_sect]
   default = default_sect
   legacy = legacy_sect

   [default_sect]
   activate = 1

   [legacy_sect]
   activate = 1

2. Clean up any existing debuginfod cache data:

   > rm -rf $HOME/.cache/debuginfod_client

3. Run GDB:

   > gdb -nx -q -iex 'set trace-commands on' \
                -iex 'set debuginfod enabled on' \
                -iex 'set confirm off' \
		-ex 'start' -ex 'quit' /bin/ls
   +set debuginfod enabled on
   +set confirm off
   Reading symbols from /bin/ls...
   Downloading separate debug info for /usr/bin/ls
   ... snip ...
   Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
   1646    {
   +quit

   Fatal signal: Segmentation fault
   ----- Backtrace -----
   ... snip ...

So GDB ends up crashing during exit.

What's happening is that when debuginfod is initialised
debuginfod_begin is called (this is in the debuginfod library), this
in turn sets up libcurl, which makes use of openssl.  Somewhere during
this setup process an at_exit function is registered to cleanup some
state.

Back in GDB the debuginfod_client object is managed using this code:

  /* Deleter for a debuginfod_client.  */

  struct debuginfod_client_deleter
  {
    void operator() (debuginfod_client *c)
    {
      debuginfod_end (c);
    }
  };

  using debuginfod_client_up
    = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;

And then a global debuginfod_client_up is created to hold a pointer to
the debuginfod_client object.  As a global this will be cleaned up
using the standard C++ global object destructor mechanism, which is
run after the at_exit handlers.

However, it is expected that when debuginfod_end is called the
debuginfod_client object will still be in a usable state, that is, we
don't expect the at_exit handlers to have run and started cleaning up
the library state.

To fix this issue we need to ensure that debuginfod_end is called
before the at_exit handlers have a chance to run.

I propose that we should make use of GDB's make_final_cleanup
mechanism to call debuginfod_end.  Or rather, I intend to continue
using the debuginfod_client_up class, but ensure that the global
instance of this class is set back to nullptr before GDB calls exit.
Setting debuginfod_client_up to nullptr will invoke the deleter, which
will call debuginfod_end.

For the implementation I've created a new class which holds the
debuginfod_client_up we can then pass a pointer to this class to
make_final_cleanup.  The alternative, passing a pointer to a
unique_ptr made me feel really uncomfortable -- we'd then have
multiple references to the unique_ptr, which seems like a bad idea.

There's no test associated with this patch.  I have no idea how I
might trigger this bug from within the testsuite.  If anyone has any
ideas then I'm happy to have a go at writing something.

None of the existing debuginfod tests fail after this change.

Co-Authored-By: Mark Wielaard <mark@klomp.org>
---
 gdb/debuginfod-support.c | 104 +++++++++++++++++++++++++++++++++++----
 1 file changed, 94 insertions(+), 10 deletions(-)


base-commit: e84060b489746d031ed1ec9e7b6b39fdf4b6cfe3
  

Comments

Simon Marchi May 23, 2023, 12:31 p.m. UTC | #1
On 5/23/23 07:23, Andrew Burgess via Gdb-patches wrote:
> A GDB crash was discovered on Fedora GDB that was tracked back to an
> issue with the way that debuginfod is cleaned up.
> 
> The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
> reproduce:
> 
> 1. The file /etc/ssl/openssl.cnf contains the following lines:
> 
>    [provider_sect]
>    default = default_sect
>    ##legacy = legacy_sect
>    ##
>    [default_sect]
>    activate = 1
> 
>    ##[legacy_sect]
>    ##activate = 1
> 
>    The bug will occur when the '##' characters are removed so that the
>    lines in question look like this:
> 
>    [provider_sect]
>    default = default_sect
>    legacy = legacy_sect
> 
>    [default_sect]
>    activate = 1
> 
>    [legacy_sect]
>    activate = 1
> 
> 2. Clean up any existing debuginfod cache data:
> 
>    > rm -rf $HOME/.cache/debuginfod_client
> 
> 3. Run GDB:
> 
>    > gdb -nx -q -iex 'set trace-commands on' \
>                 -iex 'set debuginfod enabled on' \
>                 -iex 'set confirm off' \
> 		-ex 'start' -ex 'quit' /bin/ls
>    +set debuginfod enabled on
>    +set confirm off
>    Reading symbols from /bin/ls...
>    Downloading separate debug info for /usr/bin/ls
>    ... snip ...
>    Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
>    1646    {
>    +quit
> 
>    Fatal signal: Segmentation fault
>    ----- Backtrace -----
>    ... snip ...
> 
> So GDB ends up crashing during exit.
> 
> What's happening is that when debuginfod is initialised
> debuginfod_begin is called (this is in the debuginfod library), this
> in turn sets up libcurl, which makes use of openssl.  Somewhere during
> this setup process an at_exit function is registered to cleanup some
> state.
> 
> Back in GDB the debuginfod_client object is managed using this code:
> 
>   /* Deleter for a debuginfod_client.  */
> 
>   struct debuginfod_client_deleter
>   {
>     void operator() (debuginfod_client *c)
>     {
>       debuginfod_end (c);
>     }
>   };
> 
>   using debuginfod_client_up
>     = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
> 
> And then a global debuginfod_client_up is created to hold a pointer to
> the debuginfod_client object.  As a global this will be cleaned up
> using the standard C++ global object destructor mechanism, which is
> run after the at_exit handlers.
> 
> However, it is expected that when debuginfod_end is called the
> debuginfod_client object will still be in a usable state, that is, we
> don't expect the at_exit handlers to have run and started cleaning up
> the library state.
> 
> To fix this issue we need to ensure that debuginfod_end is called
> before the at_exit handlers have a chance to run.
> 
> I propose that we should make use of GDB's make_final_cleanup
> mechanism to call debuginfod_end.  Or rather, I intend to continue
> using the debuginfod_client_up class, but ensure that the global
> instance of this class is set back to nullptr before GDB calls exit.
> Setting debuginfod_client_up to nullptr will invoke the deleter, which
> will call debuginfod_end.
> 
> For the implementation I've created a new class which holds the
> debuginfod_client_up we can then pass a pointer to this class to
> make_final_cleanup.  The alternative, passing a pointer to a
> unique_ptr made me feel really uncomfortable -- we'd then have
> multiple references to the unique_ptr, which seems like a bad idea.
> 
> There's no test associated with this patch.  I have no idea how I
> might trigger this bug from within the testsuite.  If anyone has any
> ideas then I'm happy to have a go at writing something.
> 
> None of the existing debuginfod tests fail after this change.
> 
> Co-Authored-By: Mark Wielaard <mark@klomp.org>

I was able to reproduce on Arch Linux using the same openssl
configuration.

Using the final cleanup mechanism sounds fine to me.  But if we're not
going to manage the lifetime of the object with the destructor, I think
that there is no point in having a debuginfod_client_up, it just adds an
additional and unnecessary layer to run the same code (we can just call
debuginfod_end where you would reset the unique pointer).

Also, I think the debuginfod_client_manager class is unnecessary and
just adds some more unnecessary layers, making the code hard to follow
for no benefit.  We can just make the existing get_debuginfod_client
function set up the final cleanup handler, as follow:

diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
index 5853f420a18d..4ab650ef2494 100644
--- a/gdb/debuginfod-support.c
+++ b/gdb/debuginfod-support.c
@@ -96,20 +96,6 @@ struct user_data
   ui_out::progress_update progress;
 };

-/* Deleter for a debuginfod_client.  */
-
-struct debuginfod_client_deleter
-{
-  void operator() (debuginfod_client *c)
-  {
-    debuginfod_end (c);
-  }
-};
-
-using debuginfod_client_up
-  = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
-
-
 /* Convert SIZE into a unit suitable for use with progress updates.
    SIZE should in given in bytes and will be converted into KB, MB, GB
    or remain unchanged. UNIT will be set to "B", "KB", "MB" or "GB"
@@ -180,20 +166,32 @@ progressfn (debuginfod_client *c, long cur, long total)
   return 0;
 }

+static void cleanup_debuginfod_client (void *arg)
+{
+  debuginfod_client *client = static_cast<debuginfod_client *> (arg);
+  debuginfod_end (client);
+}
+
+/* Return a pointer to the single global debuginfod_client, initialising it
+   first if needed.  */
+
 static debuginfod_client *
 get_debuginfod_client ()
 {
-  static debuginfod_client_up global_client;
+  static debuginfod_client *global_client;

   if (global_client == nullptr)
     {
-      global_client.reset (debuginfod_begin ());
+      global_client = debuginfod_begin ();

       if (global_client != nullptr)
-	debuginfod_set_progressfn (global_client.get (), progressfn);
+	{
+	  make_final_cleanup (cleanup_debuginfod_client, global_client);
+	  debuginfod_set_progressfn (global_client, progressfn);
+	}
     }

-  return global_client.get ();
+  return global_client;
 }

 /* Check if debuginfod is enabled.  If configured to do so, ask the user

---

As for the test I don't see any other way of truly testing this than
running the regular tests on the affected systems.

Simon
  
Aaron Merey May 23, 2023, 2:23 p.m. UTC | #2
On Tue, May 23, 2023 at 7:23 AM Andrew Burgess <aburgess@redhat.com> wrote:
>
> And then a global debuginfod_client_up is created to hold a pointer to
> the debuginfod_client object.  As a global this will be cleaned up
> using the standard C++ global object destructor mechanism, which is
> run after the at_exit handlers.
>
> However, it is expected that when debuginfod_end is called the
> debuginfod_client object will still be in a usable state, that is, we
> don't expect the at_exit handlers to have run and started cleaning up
> the library state.

The crash comes down to curl_multi_cleanup triggering a double free
when it's called during process exit. Ideally this should be fixed in
libcurl or at least the libcurl docs should mention that curl_multi_cleanup
shouldn't be called at exit.

But it's still a good idea to add this workaround to gdb. Thanks for looking
into this. I tested Simon's patch since it's a bit simpler and it fixes the
crash for me on F37.

> There's no test associated with this patch.  I have no idea how I
> might trigger this bug from within the testsuite.  If anyone has any
> ideas then I'm happy to have a go at writing something.

gdb's debuginfod tests only pull files from local servers.  This crash
does not reproduce when using a localhost URL.  To test for this
gdb would have to download from a remote server and maybe
use the OPENSSL_CONF environment variable to set a custom
config file path.  However this might cause more problems than it
solves.

Aaron
  
Andrew Burgess May 24, 2023, 2:33 p.m. UTC | #3
Aaron Merey <amerey@redhat.com> writes:

> On Tue, May 23, 2023 at 7:23 AM Andrew Burgess <aburgess@redhat.com> wrote:
>>
>> And then a global debuginfod_client_up is created to hold a pointer to
>> the debuginfod_client object.  As a global this will be cleaned up
>> using the standard C++ global object destructor mechanism, which is
>> run after the at_exit handlers.
>>
>> However, it is expected that when debuginfod_end is called the
>> debuginfod_client object will still be in a usable state, that is, we
>> don't expect the at_exit handlers to have run and started cleaning up
>> the library state.
>
> The crash comes down to curl_multi_cleanup triggering a double free
> when it's called during process exit. Ideally this should be fixed in
> libcurl or at least the libcurl docs should mention that curl_multi_cleanup
> shouldn't be called at exit.
>
> But it's still a good idea to add this workaround to gdb.

It's only a workaround if libcurl doesn't just update their docs to say
don't call curl_multi_cleanup at exit!  If they do that then debuginfod
needs to update its docs to say don't call debuginfod_end at exit
... and then we're back here, right?

>                                                            Thanks for looking
> into this. I tested Simon's patch since it's a bit simpler and it fixes the
> crash for me on F37.

Excellent, I'll add a commit message to Simon's version and repost it.

Thanks,
Andrew

>
>> There's no test associated with this patch.  I have no idea how I
>> might trigger this bug from within the testsuite.  If anyone has any
>> ideas then I'm happy to have a go at writing something.
>
> gdb's debuginfod tests only pull files from local servers.  This crash
> does not reproduce when using a localhost URL.  To test for this
> gdb would have to download from a remote server and maybe
> use the OPENSSL_CONF environment variable to set a custom
> config file path.  However this might cause more problems than it
> solves.
>
> Aaron
  

Patch

diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
index 5853f420a18..5dfb313f0af 100644
--- a/gdb/debuginfod-support.c
+++ b/gdb/debuginfod-support.c
@@ -180,20 +180,104 @@  progressfn (debuginfod_client *c, long cur, long total)
   return 0;
 }
 
-static debuginfod_client *
-get_debuginfod_client ()
+/* A class that manages a debuginfod_client object.  There should only be a
+   single debuginfod_client created within GDB, so we should only ever have
+   a single instance of this class.
+
+   The critical part of this class is that we register a final cleanup with
+   GDB that is responsible for ensuring the debuginfod_client object is
+   correctly cleaned up before GDB calls exit.  If we rely on the C++
+   global object destructors to clean up the debuginfod_client object then
+   we run into problems with the order in which object destructors are
+   called relative to at_exit callbacks.  See the init method below for
+   more details.  */
+
+struct debuginfod_client_manager
 {
-  static debuginfod_client_up global_client;
+  /* Return a pointer to the managed debuginfod_client object, initialising
+     it first if needed.  */
 
-  if (global_client == nullptr)
-    {
-      global_client.reset (debuginfod_begin ());
+  debuginfod_client *get ()
+  {
+    if (m_global_client == nullptr)
+      this->init ();
+    return m_global_client.get ();
+  }
 
-      if (global_client != nullptr)
-	debuginfod_set_progressfn (global_client.get (), progressfn);
-    }
+  /* There should only be a single, global, instance of this class.  The
+     managed debuginfod_client will be reset back to nullptr by GDB's
+     make_final_cleanup mechanism before the global object is destroyed,
+     which is why the assertion here makes sense.  */
+
+  ~debuginfod_client_manager ()
+  {
+    gdb_assert (m_global_client == nullptr);
+  }
 
-  return global_client.get ();
+private:
+
+  /* Callback used by GDB's final cleanup mechanism.  Cleans up the
+     debuginfod_client object being managed by ARG, which should be a
+     debuginfod_client_manager pointer.  */
+
+  static void cleanup_debuginfod_client (void *arg)
+  {
+    debuginfod_client_manager *m = (debuginfod_client_manager *) arg;
+    m->cleanup ();
+  }
+
+  /* Set the managed debuginfod_client back to nullptr.  This will ensure
+     that the debuginfod cleanup function is called.  */
+
+  void cleanup ()
+  {
+    m_global_client = nullptr;
+  }
+
+  /* Initialise the managed debuginfod_client object and register with
+     GDB's global cleanup mechanism to ensure that the object is cleaned up
+     correctly before GDB calls exit.  */
+
+  void init ()
+  {
+    gdb_assert (m_global_client == nullptr);
+
+    debuginfod_client *client = debuginfod_begin ();
+
+    if (client != nullptr)
+      {
+	/* Debuginfod (or the libraries it uses) use at_exit to cleanup
+	   some of their state.  However, it is requires that the
+	   debuginfod_client object be cleaned up before the libraries that
+	   debuginfod uses are cleaned up.
+
+	   If we rely on the C++ global object destructors to cleanup the
+	   debuginfod object then these are run after the at_exit
+	   callbacks, which is too late.
+
+	   So instead, we register with GDB's final cleanup mechanism, this
+	   is run before GDB calls exit, which is before the at_exit
+	   handlers are run, this ensures things are cleaned up in the
+	   correct order.  */
+	make_final_cleanup (cleanup_debuginfod_client, this);
+	debuginfod_set_progressfn (client, progressfn);
+	m_global_client.reset (client);
+      }
+  }
+
+  /* The managed debuginfod_client object.  */
+
+  debuginfod_client_up m_global_client;
+};
+
+/* Return a pointer to the single global debuginfod_client, initialising it
+   first if needed.  */
+
+static debuginfod_client *
+get_debuginfod_client ()
+{
+  static debuginfod_client_manager global_client_manager;
+  return global_client_manager.get ();
 }
 
 /* Check if debuginfod is enabled.  If configured to do so, ask the user