Sort threads for thread apply all (bt)

Message ID 20150115183316.GA16405@host2.jankratochvil.net
State New, archived
Headers

Commit Message

Jan Kratochvil Jan. 15, 2015, 6:33 p.m. UTC
  Hi,

downstream Fedora request:
	Please make it easier to find the backtrace of the crashing thread
	https://bugzilla.redhat.com/show_bug.cgi?id=1024504

Currently after loading a core file GDB prints:

Core was generated by `./threadcrash1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040062f in start (arg=0x0) at threadcrash1.c:8
8	*(volatile int *)0=0;
(gdb) _

there is nowhere seen which of the threads had crashed.  In reality GDB always
numbers that thread as #1 and it is the current thread that time.  But after
dumping all the info into a file for later analysis it is no longer obvious.
'thread apply all bt' even puts the thread #1 to the _end_ of the output!!!

Should GDB always print after loading a core file what "thread" command would
print?
[Current thread is 1 (Thread 0x7fcbe28fe700 (LWP 15453))]

Or should the "thread" output be printed only in non-interactive mode?

Or something different?

Or is the current behavior OK as is and the tools calling GDB in batch mode
should indicate the thread #1 on their own?

------------------------------------------------------------------------------

I find maybe as good enough and with no risk of UI change flamewar to just
sort the threads by their number.  Currently they are printed as they happen
in the internal GDB list which has no advantage.  Printing thread #1 as the
first one with assumed 'thread apply all bt' (after the core file is loaded)
should make the complaint resolved I guess.

No regressions on {x86_64,x86_64-m32,i686}-fedora22pre-linux-gnu.


Jan


diff of crashing different thread in 2-threaded program:
------------------------------------------------------------------------------
-Reading symbols from /home/jkratoch/t/threadcrash1...done.
-[New LWP 15453]
-[New LWP 15446]
+Reading symbols from /home/jkratoch/t/threadcrash2...done.
+[New LWP 29285]
+[New LWP 29288]
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib64/libthread_db.so.1".
-Core was generated by `./threadcrash1'.
+Core was generated by `./threadcrash2'.
 Program terminated with signal SIGSEGV, Segmentation fault.
-#0  0x000000000040062f in start (arg=0x0) at threadcrash1.c:8
-8      *(volatile int *)0=0;
+#0  0x0000000000400684 in main () at threadcrash2.c:19
+19     *(volatile int *)0=0;
 (gdb) thread apply all bt
 
-Thread 2 (Thread 0x7fcbe2900700 (LWP 15446)):
+Thread 2 (Thread 0x7feb9c14c700 (LWP 29288)):
 Python Exception <type 'exceptions.ImportError'> No module named gdb.frames:
 #0  0x0000003424ec491d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 #1  0x0000003424ec47b4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
-#2  0x000000000040068a in main () at threadcrash1.c:19
+#2  0x000000000040062a in start (arg=0x0) at threadcrash2.c:7
+#3  0x000000342560752a in start_thread (arg=0x7feb9c14c700) at pthread_create.c:310
+#4  0x0000003424f0079d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
 
-Thread 1 (Thread 0x7fcbe28fe700 (LWP 15453)):
+Thread 1 (Thread 0x7feb9c14e700 (LWP 29285)):
 Python Exception <type 'exceptions.ImportError'> No module named gdb.frames:
-#0  0x000000000040062f in start (arg=0x0) at threadcrash1.c:8
-#1  0x000000342560752a in start_thread (arg=0x7fcbe28fe700) at pthread_create.c:310
-#2  0x0000003424f0079d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
+#0  0x0000000000400684 in main () at threadcrash2.c:19
 (gdb) thread
-[Current thread is 1 (Thread 0x7fcbe28fe700 (LWP 15453))]
+[Current thread is 1 (Thread 0x7feb9c14e700 (LWP 29285))]
 (gdb) q
gdb/ChangeLog
2015-01-15  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* thread.c (tp_array_compar): New function.
	(thread_apply_all_command): Sort tp_array using it.
#include <pthread.h>
#include <assert.h>
#include <unistd.h>

static void *start (void *arg)
{
  sleep (1);
*(volatile int *)0=0;
  return arg;
}

int main (void)
{
  pthread_t thread1;
  int i;

  i = pthread_create (&thread1, NULL, start, NULL);
  assert (i == 0);
  sleep (100);
  i = pthread_join (thread1, NULL);
  assert (i == 0);

  return 0;
}
#include <pthread.h>
#include <assert.h>
#include <unistd.h>

static void *start (void *arg)
{
  sleep (100);
  return arg;
}

int main (void)
{
  pthread_t thread1;
  int i;

  i = pthread_create (&thread1, NULL, start, NULL);
  assert (i == 0);
  sleep (1);
*(volatile int *)0=0;
  i = pthread_join (thread1, NULL);
  assert (i == 0);

  return 0;
}
  

Comments

Doug Evans Jan. 15, 2015, 7:29 p.m. UTC | #1
On Thu, Jan 15, 2015 at 10:33 AM, Jan Kratochvil
<jan.kratochvil@redhat.com> wrote:
> Hi,
>
> downstream Fedora request:
>         Please make it easier to find the backtrace of the crashing thread
>         https://bugzilla.redhat.com/show_bug.cgi?id=1024504
>
> Currently after loading a core file GDB prints:
>
> Core was generated by `./threadcrash1'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000000000040062f in start (arg=0x0) at threadcrash1.c:8
> 8       *(volatile int *)0=0;
> (gdb) _
>
> there is nowhere seen which of the threads had crashed.  In reality GDB always
> numbers that thread as #1 and it is the current thread that time.  But after
> dumping all the info into a file for later analysis it is no longer obvious.
> 'thread apply all bt' even puts the thread #1 to the _end_ of the output!!!
>
> Should GDB always print after loading a core file what "thread" command would
> print?
> [Current thread is 1 (Thread 0x7fcbe28fe700 (LWP 15453))]

Sounds reasonable to me.
Though there is the concern to not even talk about threads if there are "none".
So maybe only print that if there is more than one thread?

> Or should the "thread" output be printed only in non-interactive mode?
>
> Or something different?
>
> Or is the current behavior OK as is and the tools calling GDB in batch mode
> should indicate the thread #1 on their own?
>
> ------------------------------------------------------------------------------
>
> I find maybe as good enough and with no risk of UI change flamewar to just
> sort the threads by their number.  Currently they are printed as they happen
> in the internal GDB list which has no advantage.  Printing thread #1 as the
> first one with assumed 'thread apply all bt' (after the core file is loaded)
> should make the complaint resolved I guess.
>
> No regressions on {x86_64,x86_64-m32,i686}-fedora22pre-linux-gnu.

No objection to sorting the list, but if thread #1 is the important one,
then a concern could be it'll have scrolled off the screen (such a
concern has been voiced in another thread in another context),
and if not lost (say it's in an emacs buffer) one would still have
to scroll back to see it.
So one *could* still want #1 to be last.
Do we want an option to choose the sort direction?
[I wouldn't make it a global parameter, just an option to
thread apply.]
  

Patch

diff --git a/gdb/thread.c b/gdb/thread.c
index ed20fbe..cdab5d8 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1382,6 +1382,17 @@  make_cleanup_restore_current_thread (void)
 			    restore_current_thread_cleanup_dtor);
 }
 
+/* Sort an array for struct thread_info pointers by their ascending NUM.  */
+
+static int
+tp_array_compar (const void *ap_voidp, const void *bp_voidp)
+{
+  const struct thread_info *const *ap = ap_voidp;
+  const struct thread_info *const *bp = bp_voidp;
+
+  return ((*ap)->num > (*bp)->num) - ((*ap)->num < (*bp)->num);
+}
+
 /* Apply a GDB command to a list of threads.  List syntax is a whitespace
    seperated list of numbers, or ranges, or the keyword `all'.  Ranges consist
    of two numbers seperated by a hyphen.  Examples:
@@ -1431,6 +1442,8 @@  thread_apply_all_command (char *cmd, int from_tty)
           i++;
         }
 
+      qsort (tp_array, i, sizeof (*tp_array), tp_array_compar);
+
       make_cleanup (set_thread_refcount, &ta_cleanup);
 
       for (k = 0; k != i; k++)