[0/2] Demangler crash handler

Message ID 20140509153305.GA13345@blade.nx
State Superseded
Headers

Commit Message

Gary Benson May 9, 2014, 3:33 p.m. UTC
  Mark Kettenis wrote:
> > A number of bugs have been filed recently because of segmentation
> > faults in the demangler.  While such crashes are a problem for all
> > demangler consumers, they are particularly nasty for GDB because
> > they prevent the user from debugging their program at all.
> > 
> > This patch series arranges for GDB to catch segmentation faults
> > in the demangler and recover from them gracefully.  A warning is
> > printed the first time a fault occurs.  Example sessions with and
> > without these patches are included below.
> > 
> > None of the wrapped code uses cleanups, so each caught failure
> > will leak a small amount of memory.  This is undesirable but I
> > think the benefits here outweigh this drawback.
> > 
> > Ok to commit?
> 
> No.  It's this skind of duct-tape that will make sure that bugs in
> the demangler won't get fixed.  Apart from removing the incentive to
> fix the bugs, these SIGSEGV signal handlers make actually fixing the
> bugs harder as you won't have core dumps.

I would normally agree with you 100% on this issue Mark, but in this
case I think a handler is justified.  If the demangler crashes because
of a symbol in the users program then the user cannot debug their
program at all.  If the demangler were simple and well understood then
that would be fine but it's not: the demangler is complex, the
specification it's following is complex, and everything's complicated
further because you can't allocate heap and you have to roll your own
data structures.  The reality is that the libiberty demangler is a
breeding ground for segfaults, and GDB needs to be able to deal with
this.

It's true that you don't get core dumps with this patch, but what you
do get in return is a printed warning that includes the symbol that
caused the crash.  That's all you need in most cases.  The five recent
demangler crashes (14963, 16593, 16752, 16817 and 16845) all required
digging by either the reporter or a GDB developer to uncover the
failing symbol.  Printing the offending symbol means this work is
already done.

If the lack of core dumps is a showstopper for you then I can
update the patch to allow disabling the handler with
"maint set handle-demangler-crashes 0" or some similar thing.

> Besides, any signal handler that does more than just setting a flag
> is probably broken.  Did you verify that you only call async-signal-
> safe functions in the signal handler code path?

I didn't think this was necessary as to my knowledge SIGSEGV is only
ever emitted synchronously.  If it is an issue then the patch could
be reworked to use (sig)longjmp as included below.

Thanks,
Gary
  

Comments

Doug Evans May 11, 2014, 5:17 a.m. UTC | #1
On Fri, May 9, 2014 at 8:33 AM, Gary Benson <gbenson@redhat.com> wrote:
> [...]
> +  if (crash_signal != 0)
> +    {
> +      static int warning_printed = 0;
> +
> +      if (!warning_printed)
> +       {
> +         warning ("internal error: demangler failed with signal %d\n"
> +                  "Unable to demangle '%s'\n"
> +                  "This is a bug, "
> +                  "please report it to the GDB maintainers.",
> +                  crash_signal, name);
> +
> +         warning_printed = 1;
> +       }
> +
> +      result = NULL;
> +    }
> +
> +  return result;

Hi.

Applying "Consistency Is Good" to this patch,
I wonder if we should do something similar to what we do for internal errors.
I'm not sure I would use the same flag (grep for
internal_problem_modes and friends in utils.c), but
OTOH I wouldn't want a proliferation of options for controlling each
particular kind of "crash".

What do you think?
  
Mark Kettenis May 11, 2014, 8:23 p.m. UTC | #2
> Date: Fri, 9 May 2014 16:33:06 +0100
> From: Gary Benson <gbenson@redhat.com>
> 
> Mark Kettenis wrote:
> > > A number of bugs have been filed recently because of segmentation
> > > faults in the demangler.  While such crashes are a problem for all
> > > demangler consumers, they are particularly nasty for GDB because
> > > they prevent the user from debugging their program at all.
> > > 
> > > This patch series arranges for GDB to catch segmentation faults
> > > in the demangler and recover from them gracefully.  A warning is
> > > printed the first time a fault occurs.  Example sessions with and
> > > without these patches are included below.
> > > 
> > > None of the wrapped code uses cleanups, so each caught failure
> > > will leak a small amount of memory.  This is undesirable but I
> > > think the benefits here outweigh this drawback.
> > > 
> > > Ok to commit?
> > 
> > No.  It's this skind of duct-tape that will make sure that bugs in
> > the demangler won't get fixed.  Apart from removing the incentive to
> > fix the bugs, these SIGSEGV signal handlers make actually fixing the
> > bugs harder as you won't have core dumps.
> 
> I would normally agree with you 100% on this issue Mark, but in this
> case I think a handler is justified.  If the demangler crashes because
> of a symbol in the users program then the user cannot debug their
> program at all.  If the demangler were simple and well understood then
> that would be fine but it's not: the demangler is complex, the
> specification it's following is complex, and everything's complicated
> further because you can't allocate heap and you have to roll your own
> data structures.  The reality is that the libiberty demangler is a
> breeding ground for segfaults, and GDB needs to be able to deal with
> this.

There are entire subsystems in GDB that are a breeding ground for
segfaults.  Should we therefore wrap evrything?

It is obvious that the demangler is a breeding ground for segmentation
faults.  It uses strcpy, strcat and sprintf.  So it's probably full of
buffer overflows.  I bet that if those are fixed, the SIGSEGVs are
gone.

Note that only some of those buffer overflows will generate a SIGSEGV.
Others will corrupt random memory.  And you can't patch those up with
a signal handler.

> It's true that you don't get core dumps with this patch, but what you
> do get in return is a printed warning that includes the symbol that
> caused the crash.  That's all you need in most cases.  The five recent
> demangler crashes (14963, 16593, 16752, 16817 and 16845) all required
> digging by either the reporter or a GDB developer to uncover the
> failing symbol.  Printing the offending symbol means this work is
> already done.
> 
> If the lack of core dumps is a showstopper for you then I can
> update the patch to allow disabling the handler with
> "maint set handle-demangler-crashes 0" or some similar thing.

Not acceptable.  Unless you make it the default...
  
Gary Benson May 13, 2014, 10:21 a.m. UTC | #3
Mark Kettenis wrote:
> There are entire subsystems in GDB that are a breeding ground for
> segfaults.  Should we therefore wrap evrything?

This is a straw man.  I'm not proposing we wrap all subsystems.
I'm proposing we wrap one specific subsystem that has caused
problems in the past and is likely to continue to do so into
the forseeable future.

I'd argue that the demangler warrants special consideration as
in one sense it's not a subsystem of GDB but rather a (somewhat
unloved?) side-project of GCC that we borrow.  The situation
we find ourselves in is this:

 * GDB is more likely to see demangler crashes than libstdc++

   GDB demangles all symbols in every file it loads, always.  In
   libstdc++ the demangler is only called in error situations,
   and then only to demangle the few symbols that appear in the
   backtrace.

   So: we get the bug reports, and have to triage them, even
   though the code is not "ours".   

 * Bugs have a more serious affect on GDB than on libstdc++

   Currently a demangler crash will cause GDB to segfault, usually at
   startup.  A demangler crash in libstdc++ is not such a big deal as
   the application was likely crashing anyway.

   So: bugs that are high-priority for us are low-priority for the
   "main" demangler authors, and we end up having to fix them.

 * Demangler patches often get waved through with minimal scrutiny

   The few people who really know the demangler are busy with other
   things, and the above two points mean demangler issues are low-
   priority for them.

   So: bugs are going to keep on coming our way.
   
This situation is why I feel the demangler merits special handling.

> It is obvious that the demangler is a breeding ground for
> segmentation faults.  It uses strcpy, strcat and sprintf.  So it's
> probably full of buffer overflows.  I bet that if those are fixed,
> the SIGSEGVs are gone.

That's not been the case for the crashes I recently fixed.  The
demangler parses the mangled symbol into a tree that it then walks to
generate the result.  All pretty standard so far, but in certain cases
sections of the tree can be (re)entered from other, non-local parts of
the tree.  Both crashes I fixed involved a stack overflow caused by a
loop in the tree caused by a reentry.

> Note that only some of those buffer overflows will generate a
> SIGSEGV.  Others will corrupt random memory.  And you can't patch
> those up with a signal handler.

Agreed.  I'm not pretending this will solve the underlying issues,
and I'm not making an excuse to not fix demangler crashes.  I was
merely trying to make a bad situation less bad by a) allowing the
possibility that the user could continue to use GDB to debug their
program, and b) helping the user make a more meaningful bug report.

I agree that there might be cases where this patch could turn a
demangler failure into some other difficult-to-debug crash that would
waste developer time.  But developer time is being wasted now: every
demangler filed requires the reporter or (more likely) one of us to
trail through the core file to identify that the issue is a demangler
bug and then pull out the symbol that caused the crash.  This patch
does that work.

In my response to Doug's mail [1] I updated the patch to use
internal_warning.  The user is interrupted (so they cannot miss the
warning) and the questions they are asked make it plain that by
continuing over the error they are now living on the edge.  Does
that minimise the possibility of us wasting time chasing memory
corruption errors enough for you?

> > It's true that you don't get core dumps with this patch, but what
> > you do get in return is a printed warning that includes the symbol
> > that caused the crash.  That's all you need in most cases.  The
> > five recent demangler crashes (14963, 16593, 16752, 16817 and
> > 16845) all required digging by either the reporter or a GDB
> > developer to uncover the failing symbol.  Printing the offending
> > symbol means this work is already done.
> > 
> > If the lack of core dumps is a showstopper for you then I can
> > update the patch to allow disabling the handler with "maint set
> > handle-demangler-crashes 0" or some similar thing.
> 
> Not acceptable.  Unless you make it the default...

Disabled by default wouldn't stop us having to triage the bugs that
are filed.

Would an enabled-by-default crash catcher using internal_warning as
above work for you?

Thanks,
Gary
  
Pedro Alves May 13, 2014, 4:05 p.m. UTC | #4
On 05/13/14 11:21, Gary Benson wrote:
> I'd argue that the demangler warrants special consideration as
> in one sense it's not a subsystem of GDB but rather a (somewhat
> unloved?) side-project of GCC that we borrow.  The situation
> we find ourselves in is this:
> 
>  * GDB is more likely to see demangler crashes than libstdc++

True.

> 
>    GDB demangles all symbols in every file it loads, always.  In
>    libstdc++ the demangler is only called in error situations,
>    and then only to demangle the few symbols that appear in the
>    backtrace.
> 
>    So: we get the bug reports, and have to triage them, even
>    though the code is not "ours".   

I really can't agree with the "not ours" sentiment.  Not if we think
in terms of the GNU toolchain.  IMO, we should foremost think of
ourselves as development toolchain toolsmiths over gdb hackers.

Thought experiment #1: I hereby declare that the demangler maintainers
are GDB hackers.  In order to get one's code in the demangler,
one has to get it past those specific (and very busy) GDB hackers.

Thought experiment #2: I'm going to import and fork the demangler
into gdb/ directly, and declare that from here on, we get to fix it
ourselves.

Either case doesn't seem to make a difference to me.  Except that
with #2, we'd end with an incompetent demangler maintainer. :-)

>  * Bugs have a more serious affect on GDB than on libstdc++

True.

> 
>    Currently a demangler crash will cause GDB to segfault, usually at
>    startup.  A demangler crash in libstdc++ is not such a big deal as
>    the application was likely crashing anyway.
> 
>    So: bugs that are high-priority for us are low-priority for the
>    "main" demangler authors, and we end up having to fix them.

True.

So...  Since this subsystem is so important, should we write our
own demangler from scratch then?  Would that result in a better
outcome?  Or, can we perhaps extend the demangler a little
to make it more robust itself?

Is there something that could be done to demangler's testsuite to
make it more effective in catching bugs that GDB would catch?
(I mean other than throwing more symbols at it.
Though a fuzzy tester that throws pseudo-random symbols at it
would be a nice project on its own.)

> 
>  * Demangler patches often get waved through with minimal scrutiny

That does sound like a problem.  Can we work with the gcc folks to
somehow prevent this from happening?  E.g., perhaps we could ask them
to CC all demangler patches to the gdb-patches@ list as well, like
supposedly done for some other shared files.

> 
>    The few people who really know the demangler are busy with other
>    things, and the above two points mean demangler issues are low-
>    priority for them.

It's not clear to me whether the issues are with demangling itself
being complex, or with the current implementation.  In any case,
this doesn't sound like a problem on their end, but on ours.
But if the demangler was "ours", who would be doing the
fixing anyway?
  
Gary Benson May 15, 2014, 1:24 p.m. UTC | #5
Pedro Alves wrote:
> On 05/13/14 11:21, Gary Benson wrote:
> >    GDB demangles all symbols in every file it loads, always.  In
> >    libstdc++ the demangler is only called in error situations,
> >    and then only to demangle the few symbols that appear in the
> >    backtrace.
> > 
> >    So: we get the bug reports, and have to triage them, even
> >    though the code is not "ours".   
> 
> I really can't agree with the "not ours" sentiment.  Not if we think
> in terms of the GNU toolchain.  IMO, we should foremost think of
> ourselves as development toolchain toolsmiths over gdb hackers.
> 
> Thought experiment #1: I hereby declare that the demangler
> maintainers are GDB hackers.  In order to get one's code in the
> demangler, one has to get it past those specific (and very busy)
> GDB hackers.
> 
> Thought experiment #2: I'm going to import and fork the demangler
> into gdb/ directly, and declare that from here on, we get to fix it
> ourselves.
> 
> Either case doesn't seem to make a difference to me.  Except that
> with #2, we'd end with an incompetent demangler maintainer. :-)

I don't think anybody here really wants to be the demangler
maintainer.  Trouble is, I'm not sure anybody in GCC land
wants to be it either!

I wasn't really trying to disown myself from libiberty ownership.
I guess what I was trying to say is the bulk of libiberty
contributions come from GCC, but that GCC is not a heavy consumer
of libiberty.  We're eating their dogfood :)

> >  * Bugs have a more serious affect on GDB than on libstdc++
> 
> True.
> 
> >    Currently a demangler crash will cause GDB to segfault, usually
> >    at startup.  A demangler crash in libstdc++ is not such a big
> >    deal as the application was likely crashing anyway.
> > 
> >    So: bugs that are high-priority for us are low-priority for the
> >    "main" demangler authors, and we end up having to fix them.
> 
> True.
> 
> So...  Since this subsystem is so important, should we write our own
> demangler from scratch then?  Would that result in a better outcome?

If there was somebody here with a deep knowledge of C++ and a deep
knowledge of all the different mangling schemes we need to support
then maybe.  I'm not that person though.  And it's not a small
project:

  $ wc -l libiberty/*dem*
    6386 libiberty/cp-demangle.c
     174 libiberty/cp-demangle.h
     232 libiberty/cp-demint.c
    4869 libiberty/cplus-dem.c
   11661 total

If we're going to spend a lot of time on something we should spend it
improving the existing demangler rather than trying to roll our own.
The existing demangler is *good*, it just maybe hasn't had quite the
beating that it gets from a GDB run on a big app using all kinds of
the latest and greatest C++ features.

> Or, can we perhaps extend the demangler a little to make it more
> robust itself?

I'm not sure.  There's nothing obvious I can think of.  I see you
wrote another email with some ideas, I'll reply to that separately.

> Is there something that could be done to demangler's testsuite to
> make it more effective in catching bugs that GDB would catch?
> (I mean other than throwing more symbols at it.

Again, I'm not sure, unless you were to break it open and unit test
the various components.  It'd need careful refactoring to allow this
without breaking the API.

> Though a fuzzy tester that throws pseudo-random symbols at it would
> be a nice project on its own.)

I have a fuzzer for it.  <http://gbenson.net/?p=422>.  Depressingly
it gets a segfault in seconds every time.  There seem to be at least
three different issues.

> > * Demangler patches often get waved through with minimal scrutiny
> 
> That does sound like a problem.  Can we work with the gcc folks to
> somehow prevent this from happening?  E.g., perhaps we could ask
> them to CC all demangler patches to the gdb-patches@ list as well,
> like supposedly done for some other shared files.

Maybe, I'm not sure who you'd ask though.  All mail to gcc-patches
with "mangl" in the subject ends up in my inbox, so the stuff is at
least getting extra scrutiny from me :)  Unless of course the subject
is something useless like "PR 12345" (a pet hate of mine!)

> It's not clear to me whether the issues are with demangling itself
> being complex, or with the current implementation.

A bit of both I think.

Cheers,
Gary
  
Pedro Alves May 15, 2014, 2:07 p.m. UTC | #6
On 05/15/2014 02:24 PM, Gary Benson wrote:

>> Though a fuzzy tester that throws pseudo-random symbols at it would
>> be a nice project on its own.)
> 
> I have a fuzzer for it.  <http://gbenson.net/?p=422>.

Nice!

I took a peek.  I suggest wrapping it in a SIGSEGV handler so
the it prints the symbol that crashes.  :-)  That way one can
leave it running for a few hours, without needing to have core
dumps enabled.

> Depressingly it gets a segfault in seconds every time.  There seem to
> be at least three different issues.
  
Gary Benson May 15, 2014, 2:28 p.m. UTC | #7
Pedro Alves wrote:
> On 05/15/2014 02:24 PM, Gary Benson wrote:
> > > Though a fuzzy tester that throws pseudo-random symbols at it
> > > would be a nice project on its own.)
> > 
> > I have a fuzzer for it.  <http://gbenson.net/?p=422>.
> 
> Nice!
> 
> I took a peek.  I suggest wrapping it in a SIGSEGV handler so
> the it prints the symbol that crashes.  :-)  That way one can
> leave it running for a few hours, without needing to have core
> dumps enabled.

I could do that :)

The way I imagined using it right now would be to run to a fault,
then fix that fault.  The rate it finds crashes you'd get 1000 an
hour easily, but they might all be the same three bugs.

I'm working on some other stuff now, but I might make it my task
to fix demangler bugs here and there between projects.  I pretty
much have a handle on it now I think.

Cheers,
Gary
  
Pedro Alves May 15, 2014, 3:25 p.m. UTC | #8
On 05/15/2014 03:28 PM, Gary Benson wrote:

> I'm working on some other stuff now, but I might make it my task
> to fix demangler bugs here and there between projects.  I pretty
> much have a handle on it now I think.

Now that's the spirit!

Before you'll know it you'll be our goto demangler guy.

Oh wait...  You already are.  :-)

Thanks,
  
Pedro Alves May 16, 2014, 11:06 a.m. UTC | #9
On 05/15/2014 02:24 PM, Gary Benson wrote:
>>> > > * Demangler patches often get waved through with minimal scrutiny
>> > 
>> > That does sound like a problem.  Can we work with the gcc folks to
>> > somehow prevent this from happening?  E.g., perhaps we could ask
>> > them to CC all demangler patches to the gdb-patches@ list as well,
>> > like supposedly done for some other shared files.
> Maybe, I'm not sure who you'd ask though.  All mail to gcc-patches
> with "mangl" in the subject ends up in my inbox, so the stuff is at
> least getting extra scrutiny from me :)  Unless of course the subject
> is something useless like "PR 12345" (a pet hate of mine!)

We could point the current libiberty/demanger maintainers at this
discussion, and see what they think of that.  Or gcc-patches@.  Or both.
If they agree, we could document it in src/MAINTAINERS,  like e.g.,
it's mentioned for top level files:

Makefile.*; configure; configure.ac; src-release
        Any global maintainer can approve changes to these
        files, but they should be aware that they need to
        be kept in sync with their counterparts in the GCC
        repository.  Also please notify the following of
        any committed patches:
                binutils@sourceware.org
                gdb-patches@sourceware.org

It might be cleaner if the demangler was split into its own
directory, IMO.  Say libdemangler.  I don't really know whether
it depends on much in libiberty -- it's just a text transform.
But that's probably not going to happen -- some measurable effort
there for not that much gain.  :-)
  

Patch

diff --git a/gdb/cp-support.c b/gdb/cp-support.c
index 91533e8..5e79fb4 100644
--- a/gdb/cp-support.c
+++ b/gdb/cp-support.c
@@ -36,6 +36,7 @@ 
 #include "value.h"
 #include "cp-abi.h"
 #include "language.h"
+#include <signal.h>
 
 #include "safe-ctype.h"
 
@@ -1505,12 +1506,89 @@  cp_lookup_rtti_type (const char *name, struct block *block)
   return rtti_type;
 }
 
+#ifdef SIGSEGV
+
+/* PortabiWrap set/long jmp so that it's more portable.  */
+
+#if defined(HAVE_SIGSETJMP)
+#define SIGJMP_BUF		sigjmp_buf
+#define SIGSETJMP(buf)		sigsetjmp((buf), 1)
+#define SIGLONGJMP(buf,val)	siglongjmp((buf), (val))
+#else
+#define SIGJMP_BUF		jmp_buf
+#define SIGSETJMP(buf)		setjmp(buf)
+#define SIGLONGJMP(buf,val)	longjmp((buf), (val))
+#endif
+
+/* Stack context and environment for demangler crash recovery.  */
+
+static SIGJMP_BUF gdb_demangle_jmp_buf;
+
+/* Signal handler for gdb_demangle.  */
+
+static void
+gdb_demangle_signal_handler (int signo)
+{
+  SIGLONGJMP (gdb_demangle_jmp_buf, signo);
+}
+
+#endif
+
 /* A wrapper for bfd_demangle.  */
 
 char *
 gdb_demangle (const char *name, int options)
 {
-  return bfd_demangle (NULL, name, options);
+  char *result = NULL;
+  int crash_signal = 0;
+
+#ifdef SIGSEGV
+#if defined (HAVE_SIGACTION) && defined (SA_RESTART)
+  struct sigaction sa, old_sa;
+
+  sa.sa_handler = gdb_demangle_signal_handler;
+  sigemptyset (&sa.sa_mask);
+  sa.sa_flags = 0;
+  sigaction (SIGSEGV, &sa, &old_sa);
+#else
+  void (*ofunc) ();
+
+  ofunc = (void (*)()) signal (SIGSEGV, gdb_demangle_signal_handler);
+#endif
+
+  crash_signal = SIGSETJMP (gdb_demangle_jmp_buf);
+#endif
+
+  if (crash_signal == 0)
+    result = bfd_demangle (NULL, name, options);
+
+#ifdef SIGSEGV
+#if defined (HAVE_SIGACTION) && defined (SA_RESTART)
+  sigaction (SIGSEGV, &old_sa, NULL);
+#else
+  signal (SIGSEGV, ofunc);
+#endif
+#endif
+
+  if (crash_signal != 0)
+    {
+      static int warning_printed = 0;
+
+      if (!warning_printed)
+	{
+	  warning ("internal error: demangler failed with signal %d\n"
+		   "Unable to demangle '%s'\n"
+		   "This is a bug, "
+		   "please report it to the GDB maintainers.",
+		   crash_signal, name);
+
+	  warning_printed = 1;
+	}
+
+      result = NULL;
+    }
+
+  return result;
 }
 
 /* Don't allow just "maintenance cplus".  */