[v2] Fix assertion failure while analysing core files in AIX with terminated threads.

Message ID 20260324141545.32330-2-akamath996@gmail.com
State New
Series [v2] Fix assertion failure while analysing core files in AIX with terminated threads.

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Aditya Vidyadhar Kamath March 24, 2026, 2:15 p.m. UTC
  From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>

Analysing some core files on AIX currently fails with:
  regcache.c:432: internal-error: get_thread_regcache:
  Assertion `thread->state != THREAD_EXITED' failed.

The root cause is the sync_threadlists() function in aix-thread.c.
When reading an AIX core file, the libpthread library reports threads
as being in PST_TERM (terminated) state, which is correct since the
process crashed. However, sync_threadlists() was calling
delete_thread() for these terminated threads, marking them as
THREAD_EXITED in GDB's internal state.

Later, when GDB tried to fetch registers or access frame information
for these threads during core file analysis, it would hit the
assertion in get_thread_regcache() that prevents accessing exited threads.

On AIX 7.3 we see this with any python3 core dump.

The fix is to call sync_threadlists () only for a program in execution and
not for core files.
---
 gdb/aix-thread.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
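
To illustrate the failure mode described in the commit message, here is a
minimal, self-contained C++ sketch. It is not GDB code: every toy_* name,
the two-value pthdb_state enum and the "guarded" flag are hypothetical
stand-ins chosen for this example, and the thrown exception only stands in
for the internal-error assertion in get_thread_regcache(). It assumes, as
the commit message states, that the thread library reports terminated
threads for a crashed process and that syncing marks those threads as
exited.

```cpp
#include <stdexcept>
#include <vector>

// Toy stand-ins only; these types are assumptions, not GDB's definitions.
enum class thread_state { stopped, running, exited };
enum class pthdb_state { run, term };   // simplified thread-library states

struct toy_thread
{
  pthdb_state reported;   // state reported by the thread library
  thread_state state;     // state tracked by the debugger
};

struct toy_target
{
  bool has_execution;               // true: live process, false: core file
  std::vector<toy_thread> threads;
};

// Models the effect described in the commit message: threads reported
// as terminated end up marked as exited.
void toy_sync_threadlists (toy_target &t)
{
  for (toy_thread &th : t.threads)
    if (th.reported == pthdb_state::term)
      th.state = thread_state::exited;
}

// With guarded == true this mirrors the patched flow: sync only when
// the target has execution.  With guarded == false it always syncs.
void toy_pd_update (toy_target &t, bool guarded)
{
  if (!guarded || t.has_execution)
    toy_sync_threadlists (t);
}

// Stand-in for the assertion: exited threads must not be accessed.
void toy_get_thread_regcache (const toy_thread &th)
{
  if (th.state == thread_state::exited)
    throw std::logic_error ("thread->state != THREAD_EXITED");
}

// Returns true if every thread's registers can be fetched after an
// update.  The target is taken by value so the caller's copy is untouched.
bool toy_can_read_all (toy_target t, bool guarded)
{
  toy_pd_update (t, guarded);
  try
    {
      for (const toy_thread &th : t.threads)
        toy_get_thread_regcache (th);
    }
  catch (const std::logic_error &)
    {
      return false;
    }
  return true;
}
```

For a toy core file (has_execution == false) whose threads are all reported
as terminated, toy_can_read_all returns false without the guard and true
with it, because the threads keep the stopped state they started with.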
  

Comments

Simon Marchi March 24, 2026, 4:12 p.m. UTC | #1
On 3/24/26 10:15 AM, Aditya Vidyadhar Kamath wrote:
> From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>
> 
> Analysing some core files on AIX currently fails with:
>   regcache.c:432: internal-error: get_thread_regcache:
>   Assertion `thread->state != THREAD_EXITED' failed.
> 
> The root cause is the sync_threadlists() function in aix-thread.c.
> When reading an AIX core file, the libpthread library reports threads
> as being in PST_TERM (terminated) state, which is correct since the
> process crashed. However, sync_threadlists() was calling
> delete_thread() for these terminated threads, marking them as
> THREAD_EXITED in GDB's internal state.

Looking at how it works on Linux, the threads are in state
THREAD_STOPPED when looking at a core dump.  I think that THREAD_STOPPED
makes sense, because when looking at a core, you are looking at a
snapshot of the process state when it was still alive, with all threads
stopped.

Could the AIX target do the same?  I think it would be nice for
consistency across targets.

Simon
  
Ulrich Weigand March 25, 2026, 11:56 a.m. UTC | #2
Simon Marchi <simon.marchi@polymtl.ca> wrote:
>On 3/24/26 10:15 AM, Aditya Vidyadhar Kamath wrote:
> >From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>
> >
> >Analysing some core files on AIX currently fails with:
> >  regcache.c:432: internal-error: get_thread_regcache:
> >  Assertion `thread->state != THREAD_EXITED' failed.
> 
> >The root cause is the sync_threadlists() function in aix-thread.c.
> >When reading an AIX core file, the libpthread library reports threads
> >as being in PST_TERM (terminated) state, which is correct since the
> >process crashed. However, sync_threadlists() was calling
> >delete_thread() for these terminated threads, marking them as
> >THREAD_EXITED in GDB's internal state.
>
>Looking at how it works on Linux, the threads are in state
>THREAD_STOPPED when looking at a core dump.  I think that
>THREAD_STOPPED
>makes sense, because when looking at a core, you are looking at a
>snapshot of the process state when it was still alive, with all
>threads stopped.

Isn't this actually what happens with Aditya's latest patch?
Given that sync_threadlists is no longer called, all threads
should remain in the state common code set up for them initially,
which I think would be THREAD_STOPPED?

Bye,
Ulrich
  
Simon Marchi March 25, 2026, 3:17 p.m. UTC | #3
On 2026-03-25 07:56, Ulrich Weigand wrote:
> Simon Marchi <simon.marchi@polymtl.ca> wrote:
>> On 3/24/26 10:15 AM, Aditya Vidyadhar Kamath wrote:
>>> From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>
>>>
>>> Analysing some core files on AIX currently fails with:
>>>   regcache.c:432: internal-error: get_thread_regcache:
>>>   Assertion `thread->state != THREAD_EXITED' failed.
>>
>>> The root cause is the sync_threadlists() function in aix-thread.c.
>>> When reading an AIX core file, the libpthread library reports threads
>>> as being in PST_TERM (terminated) state, which is correct since the
>>> process crashed. However, sync_threadlists() was calling
>>> delete_thread() for these terminated threads, marking them as
>>> THREAD_EXITED in GDB's internal state.
>>
>> Looking at how it works on Linux, the threads are in state
>> THREAD_STOPPED when looking at a core dump.  I think that
>> THREAD_STOPPED
>> makes sense, because when looking at a core, you are looking at a
>> snapshot of the process state when it was still alive, with all
>> threads stopped.
> 
> Isn't this actually what happens with Aditya's latest patch?
> Given that sync_threadlists is no longer called, all threads
> should remain in the state common code set up for them initially,
> which I think would be THREAD_STOPPED?
> 
> Bye,
> Ulrich

Ah ok, I didn't catch that, but now that I re-read the commit message it
indeed sounds like it, thanks.

Simon
  
Ulrich Weigand March 25, 2026, 4:06 p.m. UTC | #4
Aditya Vidyadhar Kamath <akamath996@gmail.com> wrote:

>-  sync_threadlists (pid);
>+  /* Attempt to sync_threadlists () only when debugging object files
>+     and not core files since list of threads never change for core
>+     files.  */
>+  if (target_has_execution ())
>+    sync_threadlists (pid);

This is OK.

Thanks,
Ulrich
  
Aditya Kamath March 26, 2026, 10:10 a.m. UTC | #5
Hi Ulrich, Simon and community members,

Thank you Ulrich and Simon for your feedback.

>>+  if (target_has_execution ())
>>+    sync_threadlists (pid);

>This is OK.

Thanks. I have pushed the same.

https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=edf0ae3b9f1ca23daaa55628683edc9a3360286f
  

Patch

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index c2e6b6d2bd6..5469ddca4f7 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -884,7 +884,11 @@  pd_update (pid_t pid)
   if (status != PTHDB_SUCCESS)
     return ptid_t (pid);
 
-  sync_threadlists (pid);
+  /* Attempt to sync_threadlists () only when debugging object files
+     and not core files since list of threads never change for core
+     files.  */
+  if (target_has_execution ())
+    sync_threadlists (pid);
 
   /* Define "current thread" as one that just received a trap signal.  */