[pushed] Use setjmp/longjmp for TRY/CATCH instead of sigsetjmp/siglongjmp

Message ID 1460479589-21126-1-git-send-email-palves@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves April 12, 2016, 4:46 p.m. UTC
  Now that we don't ever throw GDB exceptions from signal handlers [1],
we can switch to have TRY/CATCH implemented in terms of plain
setjmp/longjmp instead of sigsetjmp/siglongjmp.

In https://sourceware.org/ml/gdb-patches/2015-02/msg00114.html, Yichun
Zhang mentions a 11%/14%+ speedup in his GDB python scripts with a
patch that did something similar to only a specific set of TRY/CATCH
calls.

[1] - https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html

Tested on x86_64 Fedora 23, native and gdbserver.

gdb/ChangeLog:
2016-04-12  Pedro Alves  <palves@redhat.com>

	* common/common-exceptions.c (struct catcher) <buf>: Now a
	'jmp_buf' instead of SIGJMP_BUF.
	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
	(throw_exception): Use longjmp instead of SIGLONGJMP.
	* common/common-exceptions.h: Include <setjmp.h> instead of
	"gdb_setjmp.h".
	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
	[GDB_XCPT == GDB_XCPT_SJMP] (TRY): Use setjmp instead of
	SIGSETJMP.
	* cp-support.c: Include "gdb_setjmp.h".
---
 gdb/ChangeLog                  | 13 +++++++++++++
 gdb/common/common-exceptions.c |  6 +++---
 gdb/common/common-exceptions.h |  8 ++++----
 gdb/cp-support.c               |  2 +-
 4 files changed, 21 insertions(+), 8 deletions(-)
  

Comments

Sergio Durigan Junior May 5, 2016, 1:11 a.m. UTC | #1
On Tuesday, April 12 2016, Pedro Alves wrote:

> Now that we don't ever throw GDB exceptions from signal handlers [1],
> we can switch to have TRY/CATCH implemented in terms of plain
> setjmp/longjmp instead of sigsetjmp/siglongjmp.
>
> In https://sourceware.org/ml/gdb-patches/2015-02/msg00114.html, Yichun
> Zhang mentions a 11%/14%+ speedup in his GDB python scripts with a
> patch that did something similar to only a specific set of TRY/CATCH
> calls.
>
> [1] - https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html
>
> Tested on x86_64 Fedora 23, native and gdbserver.

Hi Pedro,

Almost a month ago, a few PPC64{BE,LE} buildslaves started failing to
upload the gdb.log/gdb.sum files to the buildmaster when running
testcases related to native-{extended-}gdbserver.  This was causing the
buildmaster to be unable to analyze regressions.  I was concerned about
this, but somehow I forgot to investigate the problem for a while and
only got back to it a couple of days ago.

Edjunior (Cc'ed) and I looked at the logs and found that the actual
problem was that the testsuite step was being terminated because it was
timing out.  Edjunior kindly looked at the machine and found several
zombie processes hanging (apparently all of them belonging to
gdb.threads/process-dies-while-handling-bp.exp, but Edjunior can correct
me if I'm wrong).  Since these machines are VMs, we thought a simple
reboot would probably fix the issue.  Unfortunately it didn't.

Anyway, after some more "investigation", I found that this strange
behaviour started happening after you pushed this specific patch.  You
can see, for example, the last 155 builds on one of the builders
affected, and you'll see that the 'Failed' message only happens after
this commit:

  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64?numbuilds=155>

You can also look at another builder, and see that the behaviour is the
same:

  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-gdbserver-m64?numbuilds=200>

Or:

  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64?numbuilds=200>

(This last one being a PPC64LE).

As I said, this problem doesn't happen when we're not testing gdbserver
configurations.

I haven't investigated the problem further, and it may very well be
something unrelated to this patch (notice that, although the failure
happens several times, it's not deterministic), but I decided it was
a good thing to raise awareness.

I'll try to invest more time to figure out what's happening; if I find
anything else, I'll reply to this e-mail.  If you need access to one of
the buildslaves, please let me know and I'll see if I can arrange that
for you.

Thanks,

> gdb/ChangeLog:
> 2016-04-12  Pedro Alves  <palves@redhat.com>
>
> 	* common/common-exceptions.c (struct catcher) <buf>: Now a
> 	'jmp_buf' instead of SIGJMP_BUF.
> 	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
> 	(throw_exception): Use longjmp instead of SIGLONGJMP.
> 	* common/common-exceptions.h: Include <setjmp.h> instead of
> 	"gdb_setjmp.h".
> 	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
> 	[GDB_XCPT == GDB_XCPT_SJMP] (TRY): Use setjmp instead of
> 	SIGSETJMP.
> 	* cp-support.c: Include "gdb_setjmp.h".
> ---
>  gdb/ChangeLog                  | 13 +++++++++++++
>  gdb/common/common-exceptions.c |  6 +++---
>  gdb/common/common-exceptions.h |  8 ++++----
>  gdb/cp-support.c               |  2 +-
>  4 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/ChangeLog b/gdb/ChangeLog
> index 5dfd4b0..b750266 100644
> --- a/gdb/ChangeLog
> +++ b/gdb/ChangeLog
> @@ -1,5 +1,18 @@
>  2016-04-12  Pedro Alves  <palves@redhat.com>
>  
> +	* common/common-exceptions.c (struct catcher) <buf>: Now a
> +	'jmp_buf' instead of SIGJMP_BUF.
> +	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
> +	(throw_exception): Use longjmp instead of SIGLONGJMP.
> +	* common/common-exceptions.h: Include <setjmp.h> instead of
> +	"gdb_setjmp.h".
> +	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
> +	[GDB_XCPT == GDB_XCPT_SJMP] (TRY): Use setjmp instead of
> +	SIGSETJMP.
> +	* cp-support.c: Include "gdb_setjmp.h".
> +
> +2016-04-12  Pedro Alves  <palves@redhat.com>
> +
>  	* common/common-exceptions.c (exception_rethrow): Remove
>  	prepare_to_throw_exception call.
>  	* common/common-exceptions.h (prepare_to_throw_exception): Delete
> diff --git a/gdb/common/common-exceptions.c b/gdb/common/common-exceptions.c
> index 5ea8188..2e63862 100644
> --- a/gdb/common/common-exceptions.c
> +++ b/gdb/common/common-exceptions.c
> @@ -46,7 +46,7 @@ struct catcher
>  {
>    enum catcher_state state;
>    /* Jump buffer pointing back at the exception handler.  */
> -  SIGJMP_BUF buf;
> +  jmp_buf buf;
>    /* Status buffer belonging to the exception handler.  */
>    struct gdb_exception exception;
>    struct cleanup *saved_cleanup_chain;
> @@ -73,7 +73,7 @@ catcher_list_size (void)
>    return size;
>  }
>  
> -SIGJMP_BUF *
> +jmp_buf *
>  exceptions_state_mc_init (void)
>  {
>    struct catcher *new_catcher = XCNEW (struct catcher);
> @@ -275,7 +275,7 @@ throw_exception (struct gdb_exception exception)
>       be zero, by definition in defs.h.  */
>    exceptions_state_mc (CATCH_THROWING);
>    current_catcher->exception = exception;
> -  SIGLONGJMP (current_catcher->buf, exception.reason);
> +  longjmp (current_catcher->buf, exception.reason);
>  #else
>    if (exception.reason == RETURN_QUIT)
>      {
> diff --git a/gdb/common/common-exceptions.h b/gdb/common/common-exceptions.h
> index 54c6249..e21713c 100644
> --- a/gdb/common/common-exceptions.h
> +++ b/gdb/common/common-exceptions.h
> @@ -20,7 +20,7 @@
>  #ifndef COMMON_EXCEPTIONS_H
>  #define COMMON_EXCEPTIONS_H
>  
> -#include "gdb_setjmp.h"
> +#include <setjmp.h>
>  
>  /* Reasons for calling throw_exceptions().  NOTE: all reason values
>     must be less than zero.  enum value 0 is reserved for internal use
> @@ -142,7 +142,7 @@ struct gdb_exception
>     macros defined below.  */
>  
>  #if GDB_XCPT == GDB_XCPT_SJMP
> -extern SIGJMP_BUF *exceptions_state_mc_init (void);
> +extern jmp_buf *exceptions_state_mc_init (void);
>  extern int exceptions_state_mc_action_iter (void);
>  extern int exceptions_state_mc_action_iter_1 (void);
>  extern int exceptions_state_mc_catch (struct gdb_exception *, int);
> @@ -181,9 +181,9 @@ extern void exception_rethrow (void);
>  
>  #define TRY \
>       { \
> -       SIGJMP_BUF *buf = \
> +       jmp_buf *buf = \
>  	 exceptions_state_mc_init (); \
> -       SIGSETJMP (*buf); \
> +       setjmp (*buf); \
>       } \
>       while (exceptions_state_mc_action_iter ()) \
>         while (exceptions_state_mc_action_iter_1 ())
> diff --git a/gdb/cp-support.c b/gdb/cp-support.c
> index c7f5074..5662f86 100644
> --- a/gdb/cp-support.c
> +++ b/gdb/cp-support.c
> @@ -34,7 +34,7 @@
>  #include "cp-abi.h"
>  #include "namespace.h"
>  #include <signal.h>
> -
> +#include "gdb_setjmp.h"
>  #include "safe-ctype.h"
>  
>  #define d_left(dc) (dc)->u.s_binary.left
> -- 
> 2.5.5
  
Pedro Alves May 5, 2016, 11:22 a.m. UTC | #2
On 05/05/2016 02:11 AM, Sergio Durigan Junior wrote:

> As I said, this problem doesn't happen when we're not testing gdbserver
> configurations.
> 
> I haven't investigated the problem further, and it may very well be
> something unrelated to this patch (notice that, although the failure
> happens several times, it's not deterministic), but I decided it was
> a good thing to raise awareness.

So far, this looks unrelated to the patch in question to me.

Testing on gcc112 on the compile farm, (POWER8/PPC64, Fedora 21), I
do see the testsuite hanging too.  The whole testsuite runs, but then
a couple tests hang forever, it seems.  ps shows:

palves    72739  72012  0 10:31 pts/49   00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/process-dies-while-handling-bp gdb.threads/process-dies-while-handling-bp.exp --target_board=native-gdbserver
palves   157333 156965  0 10:30 pts/49   00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.base/cond-expr gdb.base/cond-expr.exp --target_board=native-gdbserver

So indeed one of them is  gdb.threads/process-dies-while-handling-bp.exp.

Attaching to the gdbserver process, we see it stuck here:

(gdb) bt
#0  0x00003fff96b5ddf4 in sigsuspend () from /lib64/libc.so.6
#1  0x0000000010049fa0 in linux_wait_for_event_filtered (wait_ptid=..., filter_ptid=..., wstatp=0x3fffc5d76608, options=1073741824)
    at ../../../src/gdb/gdbserver/linux-low.c:2709
#2  0x000000001004d208 in wait_for_sigstop () at ../../../src/gdb/gdbserver/linux-low.c:3904
#3  0x000000001004d97c in stop_all_lwps (suspend=0, except=0x0) at ../../../src/gdb/gdbserver/linux-low.c:4041
#4  0x00000000100466d8 in linux_kill (pid=82121) at ../../../src/gdb/gdbserver/linux-low.c:1345
#5  0x0000000010021024 in kill_inferior (pid=82121) at ../../../src/gdb/gdbserver/target.c:326
#6  0x000000001001baf4 in detach_or_kill_inferior_callback (entry=0x100335f45c0) at ../../../src/gdb/gdbserver/server.c:3388
#7  0x0000000010008f24 in for_each_inferior (list=0x100b79e8 <all_processes>, action=0x1001ba54 <detach_or_kill_inferior_callback(inferior_list_entry*)>)
    at ../../../src/gdb/gdbserver/inferiors.c:55
#8  0x000000001001bdcc in detach_or_kill_for_exit () at ../../../src/gdb/gdbserver/server.c:3449
#9  0x000000001001be2c in detach_or_kill_for_exit_cleanup (ignore=0x0) at ../../../src/gdb/gdbserver/server.c:3463
#10 0x0000000010040814 in do_my_cleanups (pmy_chain=0x100b0490 <cleanup_chain>, old_chain=0x10075310 <sentinel_cleanup>)
    at ../../../src/gdb/gdbserver/../common/cleanups.c:154
#11 0x0000000010040918 in do_cleanups (old_chain=0x10075310 <sentinel_cleanup>) at ../../../src/gdb/gdbserver/../common/cleanups.c:176
#12 0x000000001004144c in throw_exception_cxx (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:289
#13 0x0000000010041584 in throw_exception (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:317
#14 0x0000000010041778 in throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?")
    at ../../../src/gdb/gdbserver/../common/common-exceptions.c:373
#15 0x0000000010041810 in throw_vquit (fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:385
#16 0x00000000100418dc in throw_quit (fmt=0x1006d820 "Quit") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:404
#17 0x000000001001cf04 in captured_main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3790
#18 0x000000001001cf94 in main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3804
(gdb) 

So gdbserver was quitting, and was trying to kill all child
processes along with it, and then it hangs.  Process 82121, the
one gdb server was trying to kill is actually dead already:

 [palves@gcc2-power8 src]$ cat /proc/82121/status  | grep State
 State:  Z (zombie)

That we mishandle the case of the process dying unexpectedly is
already known and it manifests in several different ways, so
that's not much surprising.  That's exactly the point of that
test in the first place.  

What _is_ surprising is that the testsuite framework doesn't
time out eventually...

I attached to the corresponding gdb process, and I see
that we're stuck in a loop sending "monitor exit" to gdbserver,
in rcmd:

10292         if (getpkt_sane (&rs->buf, &rs->buf_size, 0) == -1)
10293           { 
10294             /* Timeout.  Continue to (try to) read responses.
10295                This is better than stopping with an error, assuming the stub
10296                is still executing the (long) monitor command.
10297                If needed, the user can interrupt gdb using C-c, obtaining
10298                an effect similar to stop on timeout.  */
10299             continue;
10300           }

I mean, getpkt_sane is constantly timing out.  We can step through the code
and see that:

(gdb) finish
Run till exit from #0  do_ser_base_readchar (scb=0x1002eb57cc0, timeout=1) at ../../src/gdb/ser-base.c:345
0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>) at ../../src/gdb/ser-base.c:424
424           ch = do_readchar (scb, timeout);
Value returned is $1 = -2
(gdb) finish
Run till exit from #0  0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>)
    at ../../src/gdb/ser-base.c:424
0x0000000010068a2c in ser_base_readchar (scb=0x1002eb57cc0, timeout=2) at ../../src/gdb/ser-base.c:451
451       return generic_readchar (scb, timeout, do_ser_base_readchar);
Value returned is $2 = -2
(gdb) finish
...


While this code is debatable too, I think that expect/runtest/dejagnu
itself should be timing out, and then force-killing gdb anyway.

The still-running log shows:

$ tail testsuite/outputs/gdb.threads/process-dies-while-handling-bp/gdb.log
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: set breakpoint that evals false
continue &
Continuing.
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: continue &
KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
Remote debugging from host 127.0.0.1
gdbserver: reading register 0: No such process
Killing process(es): 82121
monitor exit
Ignoring packet error, continuing...
$ 

So all is self consistent.  

I just don't understand why doesn't dejagnu timeout.

I think the "monitor exit" is the one from
gdb/testsuite/lib/gdbserver-support.exp:gdb_exit.



The other hung test is "gdb.base/cond-expr.exp".  This one's more
mysterious.  Attaching to the gdb in question, I really see nothing.  gdb
is not debugging anything, and is just waiting for input.

However, from:

$ tail testsuite/outputs/gdb.base/cond-expr/gdb.log 
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
/home/palves/gdb/build/gdb/testsuite/../../gdb/gdb version  7.11.50.20160505-git -nw -nx -data-directory /home/palves/gdb/build/gdb/testsuite/../data-directory  -ex "set auto-connect-native-target off"

runtest completed at Thu May  5 10:31:20 2016
$ 

we see that that test did complete.  So I can't really explain that...

So in sum, we may have gdb or gdbserver bugs, but the framework
should be timing out and coping anyway.  Why isn't it?  I have no
clue atm.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 5dfd4b0..b750266 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,5 +1,18 @@ 
 2016-04-12  Pedro Alves  <palves@redhat.com>
 
+	* common/common-exceptions.c (struct catcher) <buf>: Now a
+	'jmp_buf' instead of SIGJMP_BUF.
+	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
+	(throw_exception): Use longjmp instead of SIGLONGJMP.
+	* common/common-exceptions.h: Include <setjmp.h> instead of
+	"gdb_setjmp.h".
+	(exceptions_state_mc_init): Change return type to 'jmp_buf'.
+	[GDB_XCPT == GDB_XCPT_SJMP] (TRY): Use setjmp instead of
+	SIGSETJMP.
+	* cp-support.c: Include "gdb_setjmp.h".
+
+2016-04-12  Pedro Alves  <palves@redhat.com>
+
 	* common/common-exceptions.c (exception_rethrow): Remove
 	prepare_to_throw_exception call.
 	* common/common-exceptions.h (prepare_to_throw_exception): Delete
diff --git a/gdb/common/common-exceptions.c b/gdb/common/common-exceptions.c
index 5ea8188..2e63862 100644
--- a/gdb/common/common-exceptions.c
+++ b/gdb/common/common-exceptions.c
@@ -46,7 +46,7 @@  struct catcher
 {
   enum catcher_state state;
   /* Jump buffer pointing back at the exception handler.  */
-  SIGJMP_BUF buf;
+  jmp_buf buf;
   /* Status buffer belonging to the exception handler.  */
   struct gdb_exception exception;
   struct cleanup *saved_cleanup_chain;
@@ -73,7 +73,7 @@  catcher_list_size (void)
   return size;
 }
 
-SIGJMP_BUF *
+jmp_buf *
 exceptions_state_mc_init (void)
 {
   struct catcher *new_catcher = XCNEW (struct catcher);
@@ -275,7 +275,7 @@  throw_exception (struct gdb_exception exception)
      be zero, by definition in defs.h.  */
   exceptions_state_mc (CATCH_THROWING);
   current_catcher->exception = exception;
-  SIGLONGJMP (current_catcher->buf, exception.reason);
+  longjmp (current_catcher->buf, exception.reason);
 #else
   if (exception.reason == RETURN_QUIT)
     {
diff --git a/gdb/common/common-exceptions.h b/gdb/common/common-exceptions.h
index 54c6249..e21713c 100644
--- a/gdb/common/common-exceptions.h
+++ b/gdb/common/common-exceptions.h
@@ -20,7 +20,7 @@ 
 #ifndef COMMON_EXCEPTIONS_H
 #define COMMON_EXCEPTIONS_H
 
-#include "gdb_setjmp.h"
+#include <setjmp.h>
 
 /* Reasons for calling throw_exceptions().  NOTE: all reason values
    must be less than zero.  enum value 0 is reserved for internal use
@@ -142,7 +142,7 @@  struct gdb_exception
    macros defined below.  */
 
 #if GDB_XCPT == GDB_XCPT_SJMP
-extern SIGJMP_BUF *exceptions_state_mc_init (void);
+extern jmp_buf *exceptions_state_mc_init (void);
 extern int exceptions_state_mc_action_iter (void);
 extern int exceptions_state_mc_action_iter_1 (void);
 extern int exceptions_state_mc_catch (struct gdb_exception *, int);
@@ -181,9 +181,9 @@  extern void exception_rethrow (void);
 
 #define TRY \
      { \
-       SIGJMP_BUF *buf = \
+       jmp_buf *buf = \
 	 exceptions_state_mc_init (); \
-       SIGSETJMP (*buf); \
+       setjmp (*buf); \
      } \
      while (exceptions_state_mc_action_iter ()) \
        while (exceptions_state_mc_action_iter_1 ())
diff --git a/gdb/cp-support.c b/gdb/cp-support.c
index c7f5074..5662f86 100644
--- a/gdb/cp-support.c
+++ b/gdb/cp-support.c
@@ -34,7 +34,7 @@ 
 #include "cp-abi.h"
 #include "namespace.h"
 #include <signal.h>
-
+#include "gdb_setjmp.h"
 #include "safe-ctype.h"
 
 #define d_left(dc) (dc)->u.s_binary.left