[2/2] Fix AIX core file handling: prevent crashes during GDB quit
Checks
| Context | Check | Description |
| linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_gdb_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_gdb_check--master-arm | success | Test passed |
| linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 | success | Test passed |
Commit Message
From: Aditya Vidyadhar Kamath <aditya.kamath1@ibm.com>
This patch fixes two root causes of GDB crashing on AIX when
quitting the debugger after analysing an AIX core file.
They are:
1: gdb/aix-thread.c: (pd_disable function) When debugging core files,
all threads are terminated and the pthdb session data is invalid.
Calling pthdb_session_destroy() on this invalid session causes
undefined behaviour. So destroy the session only when debugging a
running process, and skip the call for core files.
2: bfd/coffgen.c: (_bfd_coff_free_cached_info): Exclude core files
AIX core files use the XCOFF format, which has a different tdata
structure (core_dumpxx) than COFF objects (coff_tdata). The
function was incorrectly treating core file tdata as COFF data
and attempting to free hash tables that don't exist in the core
dump structure, resulting in segmentation faults. So restrict the
cleanup to the bfd_object format only.
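The second root cause is a type-confusion hazard: one private-data pointer slot is read through the wrong structure type. The following is a minimal sketch of that hazard and of the format guard, not the BFD code itself. Every name in it (file_format, object_tdata, core_tdata, struct file, free_cached_info) is a hypothetical stand-in introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   real BFD types or functions.  */
enum file_format { FORMAT_OBJECT, FORMAT_CORE };

/* Models object-file private data that owns a heap-allocated table.  */
struct object_tdata { void *section_table; };

/* Models core-file private data: same pointer slot, unrelated layout.  */
struct core_tdata { char header[64]; };

struct file
{
  enum file_format format;
  union
  {
    struct object_tdata *object;
    struct core_tdata *core;
    void *any;
  } tdata;
};

/* Free cached object data.  Returns 1 if the caches were released and
   0 if the file was skipped.  The format check is what keeps a
   core_tdata from being read through the object_tdata view.  */
static int
free_cached_info (struct file *f)
{
  assert (f != NULL);
  if (f->format != FORMAT_OBJECT || f->tdata.any == NULL)
    return 0;

  free (f->tdata.object->section_table);
  f->tdata.object->section_table = NULL;
  return 1;
}
```

Without the format check, the first bytes of header would be interpreted as a pointer and passed to free(), which is the kind of crash the commit message describes.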
We can see this problem when quitting GDB after debugging a large
core file on AIX 7.3.
Ex:
Program terminated with signal SIGSEGV, Segmentation fault.
from /opt/freeware/lib/libpython3.9.a(libpython3.9.so)
(gdb)q
Fatal signal: Segmentation fault
----- Backtrace -----
0x1009fbffb ???
0x1009fc11f ???
0x1005c3587 ???
0x1005c3833 ???
0x4fdf ???
---
bfd/coffgen.c | 3 +--
gdb/aix-thread.c | 7 ++++++-
2 files changed, 7 insertions(+), 3 deletions(-)
Comments
Aditya Vidyadhar Kamath <akamath996@gmail.com> wrote:
>1: gdb/aix-thread.c: (pd_disable function) When debugging core files,
> all threads are terminated and the pthdb session data is invalid.
> Calling pthdb_session_destroy() on this invalid session causes
> undefined behaviour. So check if it is a running process and then
> only clear the session data. For core files do not do it.
I do not understand this. Why is the session invalid? We created
that session by calling pthdb_session_init in pd_activate, and that
call must have returned PTHDB_SUCCESS. Why would the resulting
session be invalid in a way that pthdb_session_destroy crashes?
Not calling this is simply a memory leak here ...
>2: bfd/coffgen.c: (_bfd_coff_free_cached_info): Exclude core files
> AIX core files use the XCOFF format which have different tdata
> structure (core_dumpxx) than COFF objects (coff_tdata). The
> function was incorrectly treating core file tdata as COFF data
> and attempting to free hash tables that don't exist in the core
> dump structure, resulting in segment faults. So restrict to
> bfd_object format only.
Doesn't AIX use XCOFF for everything, objects and core files? Why
would this COFF-specific routine be invoked at all on AIX? Conversely,
for actual COFF platforms, don't we need to do this cleanup also for
core files?
In any case, this part of the patch touches binutils and needs to
be reviewed on the binutils mailing list.
Bye,
Ulrich
@@ -3310,8 +3310,7 @@ _bfd_coff_free_cached_info (bfd *abfd)
struct coff_tdata *tdata;
if (bfd_family_coff (abfd)
- && (bfd_get_format (abfd) == bfd_object
- || bfd_get_format (abfd) == bfd_core)
+ && bfd_get_format (abfd) == bfd_object
&& (tdata = coff_data (abfd)) != NULL)
{
if (tdata->section_by_index)
@@ -1003,7 +1003,12 @@ pd_disable (inferior *inf)
return;
if (!data->pd_active)
return;
- pthdb_session_destroy (data->pd_session);
+ /* For core files, all threads are terminated. Calling
+ pthdb_session_destroy (data->pd_session) on it is incorrect
+ and will cause undefined behaviour. So skip it since we
+ are cleaning. For binaries clean up. */
+ if (target_has_execution ())
+ pthdb_session_destroy (data->pd_session);
pid_to_prc (&inferior_ptid);
data->pd_active = 0;
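
The control flow of the pd_disable hunk can be modeled in isolation to show what the guard changes. This is a rough sketch with hypothetical names (struct session, struct thread_data, session_destroy, disable, sessions_destroyed); it does not use the real libpthdebug or GDB interfaces. It shows that in the core-file case the session is left untouched, which avoids the crash but also means the session is never released, as the review comment points out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names belong to the real
   libpthdebug or GDB APIs.  */
struct session { int live_handles; };

struct thread_data
{
  bool active;
  struct session *session;
};

/* Counts teardown calls so the effect of the guard is observable.  */
static int sessions_destroyed = 0;

static void
session_destroy (struct session *s)
{
  s->live_handles = 0;
  sessions_destroyed++;
}

/* Mirrors the patched flow: the session is torn down only when the
   target has execution (a live process).  For a core file the
   session is skipped and stays allocated.  */
static void
disable (struct thread_data *data, bool has_execution)
{
  if (data == NULL || !data->active)
    return;

  if (has_execution)
    {
      assert (data->session != NULL);
      session_destroy (data->session);
    }

  data->active = false;
}
```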