[1/2] Fix assertion failure while analysing core files in AIX with terminated threads.

Message ID 20260324095213.46219-2-akamath996@gmail.com
State New
Series [1/2] Fix assertion failure while analysing core files in AIX with terminated threads.

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Aditya Vidyadhar Kamath March 24, 2026, 9:52 a.m. UTC
  From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>

When analysing some core files on AIX today, GDB fails with:
  regcache.c:432: internal-error: get_thread_regcache:
  Assertion `thread->state != THREAD_EXITED' failed.

The root cause is the sync_threadlists() function in aix-thread.c.
When reading an AIX core file, the libpthread library reports threads
as being in the PST_TERM (terminated) state, which is correct since
the process crashed.  However, sync_threadlists() was calling
delete_thread() for these terminated threads, marking them as
THREAD_EXITED in GDB's internal state.

Later, when GDB tried to fetch registers or access frame information
for these threads during core file analysis, it would hit the
assertion in get_thread_regcache() that prevents accessing exited
threads.

The fix is to delete terminated threads only when the target has
execution, i.e. when debugging a running binary.  For a core file, we
need to keep all threads, even terminated ones, so they can be analysed.

On AIX 7.3 we see this with any python3.12 core file dump.

This patch fixes that.
---
 gdb/aix-thread.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
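
For illustration only, the failure and the guard described in the commit
message can be modelled outside GDB.  This is a minimal standalone C++
sketch under stated assumptions, not GDB code: toy_thread, toy_state,
sync_toy_threadlist, can_fetch_registers, toy_fetch_register, and the
has_execution flag are hypothetical stand-ins for thread_info, the thread
state, sync_threadlists (), the check in get_thread_regcache (), and
target_has_execution ().

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GDB's thread state; not the real definition.
enum class toy_state { live, exited };

// Hypothetical stand-in for thread_info.
struct toy_thread
{
  int id;
  bool terminated;   // Models libpthread reporting PST_TERM.
  toy_state state;
};

// Models the guarded loop: a terminated thread is marked exited only
// when the target has execution, never when reading a core file.
inline void
sync_toy_threadlist (std::vector<toy_thread> &threads, bool has_execution)
{
  for (toy_thread &t : threads)
    if (has_execution && t.terminated)
      t.state = toy_state::exited;
}

// Models the condition checked in get_thread_regcache (): the registers
// of an exited thread must not be accessed.
inline bool
can_fetch_registers (const toy_thread &t)
{
  return t.state != toy_state::exited;
}

// Models the register fetch itself, asserting the condition above.
inline int
toy_fetch_register (const toy_thread &t)
{
  assert (can_fetch_registers (t));
  return t.id;   // Placeholder value standing in for register contents.
}

// Returns true if every thread is still accessible after a sync.
// Takes the list by value so the caller's copy is left untouched.
inline bool
all_accessible_after_sync (std::vector<toy_thread> threads,
                           bool has_execution)
{
  sync_toy_threadlist (threads, has_execution);
  for (const toy_thread &t : threads)
    if (!can_fetch_registers (t))
      return false;
  return true;
}
```

With a list of terminated threads, all_accessible_after_sync (threads, false)
models the core file case and returns true, while passing true models a
live process, where the terminated threads are marked exited and a later
toy_fetch_register () call would trip the assertion.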
  

Comments

Ulrich Weigand March 24, 2026, noon UTC | #1
Aditya Vidyadhar Kamath <akamath996@gmail.com> wrote:

>If we analyse core files today in AIX ( few of them ) we get,
>  regcache.c:432: internal-error: get_thread_regcache:
>  Assertion `thread->state != THREAD_EXITED' failed.
>
>The reason being the aix-thread.c file where root cause is the sync_threadlists()
>function. When reading an AIX core file, threads are reported by
>libpthread library as being in PST_TERM (terminated) state, which
>is correct since process crashed. However, sync_threadlists() was
>calling delete_thread() for these terminated threads, marking them
>as THREAD_EXITED in GDBs internal state.

I guess the more fundamental question would be, do we even need
sync_threadlists() at all when debugging a core file?  Here,
the list of threads can never change, so what is the point of
attempting to sync it?  Does it work if you just never call
sync_threadlists() for core files (or have it return early)?

Bye,
Ulrich
  
Aditya Kamath March 24, 2026, 12:43 p.m. UTC | #2
Hi Ulrich and community members,

>I guess the more fundamental question would be, do we even need
>sync_threadlists() at all when debugging a core file?  Here,
>the list of threads can never change, so what is the point of
>attempting to sync it?  Does it work if you just never call
>sync_threadlists() for core files (or have it return early)?

Yes Ulrich, you are right.  We need not call sync_threadlists ().

# git diff
diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index c2e6b6d2bd6..50065c0d8a7 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -884,7 +884,11 @@ pd_update (pid_t pid)
   if (status != PTHDB_SUCCESS)
     return ptid_t (pid);

-  sync_threadlists (pid);
+  /* Attempt to sync_threadlists () only when debugging process files,
+     not core files since list of threads never change for core files.
+  */
+  if (target_has_execution ())
+    sync_threadlists (pid);

   /* Define "current thread" as one that just received a trap signal.  */

With this patch, I am able to see my core contents as shown below.  The call to sync_threadlists () was the root cause of this issue.  Let me know if you are okay with this and I will send a v2 of this patch.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0900000000e2b25c in .PyThreadState_DeleteCurrent ()
   from /opt/freeware/lib/libpython3.9.a(libpython3.9.so)
(gdb) bt
#0  0x0900000000e2b25c in .PyThreadState_DeleteCurrent ()
   from /opt/freeware/lib/libpython3.9.a(libpython3.9.so)
#1  0x090000000de52dd0 in _Py_M__importlib_bootstrap ()
   from /opt/freeware/lib64/libpython3.9.a(libpython3.9.so)
#2  0x0900000000e2cb84 in ._PyObject_CheckCrossInterpreterData ()
   from /opt/freeware/lib/libpython3.9.a(libpython3.9.so)
#3  0x09001000a0b6d968 in ?? () from /opt/freeware/lib/libpython3.9.a(libpython3.9.so)
#4  0x09000000031dcc10 in ?? ()
#5  0x090000000dda8b50 in ._PyImport_LoadDynamicModuleWithSpec ()
   from /opt/freeware/lib64/libpython3.9.a(libpython3.9.so)
#6  0x090000000dbebf84 in ._PyImport_FindExtensionObject ()
   from /opt/freeware/lib64/libpython3.9.a(libpython3.9.so)

Thanks and regards,
Aditya.
  
Ulrich Weigand March 24, 2026, 1:42 p.m. UTC | #3
Aditya Kamath <Aditya.Kamath1@ibm.com> wrote:

>-  sync_threadlists (pid);
>+  /* Attempt to sync_threadlists () only when debugging process files,
>+     not core files since list of threads never change for core files.
>+  */
>+  if (target_has_execution ())
>+    sync_threadlists (pid);

This makes sense to me, yes.

Bye,
Ulrich
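
The approach agreed on above, skipping the sync entirely for core files,
can be sketched in isolation as follows.  This is a hypothetical
standalone model, not GDB code: toy_target, toy_sync, and toy_pd_update
are made-up names standing in for the target, sync_threadlists (), and
pd_update (), and the model only covers thread removal, not the addition
of new threads.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a debug target; not a GDB type.
struct toy_target
{
  bool has_execution;          // False for a core file.
  std::vector<int> known;      // Thread ids the debugger currently tracks.
  std::vector<int> reported;   // Thread ids the thread library reports as live.
};

// Models the sync step: drop tracked threads that are no longer reported.
inline void
toy_sync (toy_target &t)
{
  auto gone = [&t] (int id)
    {
      return std::find (t.reported.begin (), t.reported.end (), id)
             == t.reported.end ();
    };
  t.known.erase (std::remove_if (t.known.begin (), t.known.end (), gone),
                 t.known.end ());
}

// Models the guarded call site: sync only when the target can run.
// For a core file the thread list cannot change, so it is left as is.
inline void
toy_pd_update (toy_target &t)
{
  if (t.has_execution)
    toy_sync (t);
}
```

In this model, a target with has_execution set to false keeps every
tracked thread after toy_pd_update (), whereas a live target loses the
threads that are no longer reported.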
  

Patch

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index c2e6b6d2bd6..e891f510e08 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -830,8 +830,10 @@  sync_threadlists (pid_t pid)
 		private_thread_info_up (priv));
 	}
 
-      /* The thread is terminated. Remove it.  */
-      if (state == PST_TERM)
+      /* If the thread is terminated remove it, but only if it is a binary
+	 and has execution.  If it is a core file, keep terminated threads
+	 so we can analyse them.  */
+      if (target_has_execution () && state == PST_TERM)
 	{
 	  thread_info *thr = proc_target->find_thread (ptid);
 	  gdb_assert (thr != nullptr);
@@ -847,16 +849,20 @@  sync_threadlists (pid_t pid)
        where in the end after the threadfunc breakpoint is hit, the
        thread exits and gets into a PST_UNKNOWN state.  So this thread
        will not run in the above for loop.  Therefore the below for loop
-       is to manually delete such threads.  */
-    for (thread_info &it : all_threads_safe ())
+       will manually delete such threads.  We only do this for binaries in
+       execution.  For core files keep all threads for analysis.  */
+    if (target_has_execution ())
       {
-	aix_thread_info *priv = get_aix_thread_info (&it);
-	if (in_queue_threads.count (priv->pdtid) == 0
-		&& in_thread_list (proc_target, it.ptid)
-		&& pid == it.ptid.pid ())
+	for (thread_info &it : all_threads_safe ())
 	  {
-	    delete_thread (&it);
-	    data->exited_threads.insert (priv->pdtid);
+	    aix_thread_info *priv = get_aix_thread_info (&it);
+	    if (in_queue_threads.count (priv->pdtid) == 0
+		    && in_thread_list (proc_target, it.ptid)
+		    && pid == it.ptid.pid ())
+	      {
+		delete_thread (&it);
+		data->exited_threads.insert (priv->pdtid);
+	      }
 	  }
       }
 }