possible fix for PR symtab/23010

Message ID 2c888612-11e8-b376-5797-b40fada46867@redhat.com
State New, archived
Headers

Commit Message

Keith Seitz May 25, 2018, 10:43 p.m. UTC
  [WARNING: Very long explanation enclosed. Skip to end if interested in conclusions.]

On 05/24/2018 07:48 AM, Keith Seitz wrote:
> On 05/23/2018 09:08 PM, Tom Tromey wrote:
>>>>>>> "Sergio" == Sergio Durigan Junior <sergiodj@redhat.com> writes:
>>
>> Sergio> Keith has performed a few tests today, and it seems that the patch
>> Sergio> doesn't fix the real issues reported by Fedora GDB users after all.
>> Sergio> I'm still deciding if it makes sense to ship this on Fedora GDB or
>> Sergio> not...  Sorry for requesting the backport, I hope it's still useful for
>> Sergio> GDB 8.1.1.
>>
>> Maybe the backport is flawed some way.  Or did Keith try git master?
>> I'm a bit curious to know what fails.
> 
> I'm working on git master with the 23010 patch installed. The patch doesn't help with rhbz 1574015 (which I am investigating). Fortunately, this bug has a reproducer.

Okay, I have a patch, but I still don't have a test case for it (yet?). The problem is that most of the time, the bug is latent. It is non-trivial to tickle the bug. [But WebKit does it *very well*.]

As a reminder, the symptom manifested here is the assertion that DICT_LANGUAGE == SYMBOL_LANGUAGE. The webkit reproducer in 1574015 shows that we are attempting to add a language_minimal symbol to a language_cplus dictionary.

The root cause of this, of course, is that SYMBOL_LANGUAGE shouldn't be language_minimal (duh). That's just the default "pretend" language that is set for the containing CU. So the question, is *why* isn't the CU's language set?

I changed add_symbol_to_list to print out a message when the symbol's language was language_minimal. WebKit shows many hundreds (thousands?) of these messages, so it is a prevalent problem -- not all of which actually cause any problems (that we know of).

After a lot of investigation, here is why this is happening.

The backtrace command (thread apply all bt full) that we're running is looking for the compunit containing the PC of the thread. That calls get_prev_frame several times. This function calls (eventually) dwarf2_frame_prev_register. That eventually ends up calling find_pc_compunit_symtab.

In this function (find_pc_sect_compunit_symtab actually), we loop over all compunits, calling the "quick" function dw2_find_pc_sect_compunit_symtab. That function calls dw2_instantiate_symtab to read in all the CU's symbols. Now the fun begins.

dw2_do_instantiate_symtab queues the per_cu for reading, using a default "pretend" language of language_minimal with the expectation that this will be set later.

The DIEs of this (only queued) CU are then processed.

The first DIE is DW_TAG_compile_unit. That's handled by read_file_scope.

(Nearly) The first thing read_file_scope does is:

  get_scope_pc_bounds (die, &lowpc, &highpc, cu);

This function loops over the children of the current DIE (a compile_unit), looking for bounds. The first such child is a subprogram, and we attempt to get its bounds. We use dwarf2_attr to get at DW_AT_high_pc.

This subprogram has DW_AT_specification set, so dwarf_attr (via follow_die_ref/follow_die_offset) will follow that, but follow_die_offset *also* attempts to load the containing CU for the spec DIE. That spec DIE lives inside a CU that is a partial_unit and has no language attribute. So it simply inherits the language from the CU that elicited the read. [That all happens in follow_die_offset.]

The original CU's language is still language_minimal -- we haven't gotten to the line in read_file_scope that actually sets the language yet!

And that is the cause of these problems. The call to prepare_one_comp_unit needs to be the *first* thing that is done when reading a CU so that the CU's language can be recorded (and inherited by any referenced partial_units).

So, alas, this is the near trivial patch to fix this dictionary/symbol assertion in insert_symbol_hashed:


This fixes 1574015 and all the -readnow problems reported against WebKit (such as rhbz1560010/symtab/23010).
[I have also verified that all my language_minimal printfs in add_symbol_to_list are now gone.]

But I still haven't come up with a reduced reproducer, and I'm exhausted, and not just from writing this longer-than-needed message. :-P

Keith
  

Comments

Tom Tromey May 27, 2018, 3:57 a.m. UTC | #1
>>>>> "Keith" == Keith Seitz <keiths@redhat.com> writes:

Keith> And that is the cause of these problems. The call to
Keith> prepare_one_comp_unit needs to be the *first* thing that is done
Keith> when reading a CU so that the CU's language can be recorded (and
Keith> inherited by any referenced partial_units).

Nice work debugging this.

Tom
  
Joel Brobecker May 30, 2018, 8:55 p.m. UTC | #2
> [WARNING: Very long explanation enclosed. Skip to end if interested in
> conclusions.]

This is extremely useful in understanding the source of the problem,
and therefore the solution. A bit thank you for writing it up.

> And that is the cause of these problems. The call to
> prepare_one_comp_unit needs to be the *first* thing that is done when
> reading a CU so that the CU's language can be recorded (and inherited
> by any referenced partial_units).
> 
> So, alas, this is the near trivial patch to fix this dictionary/symbol
> assertion in insert_symbol_hashed:

I agree with Tom; very nice investigative work, and the near trivial
patch feels like gravy.

> 
> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
> index 49ce83ff20..0145c83b30 100644
> --- a/gdb/dwarf2read.c
> +++ b/gdb/dwarf2read.c
> @@ -11470,6 +11470,8 @@ read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
>    struct die_info *child_die;
>    CORE_ADDR baseaddr;
>  
> +  prepare_one_comp_unit (cu, die, cu->language);
> +
>    baseaddr = ANOFFSET (objfile->section_offsets, SECT_OFF_TEXT (objfile));
>  
>    get_scope_pc_bounds (die, &lowpc, &highpc, cu);
> @@ -11482,8 +11484,6 @@ read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
>  
>    file_and_directory fnd = find_file_and_directory (die, cu);
>  
> -  prepare_one_comp_unit (cu, die, cu->language);
> -
>    /* The XLCL doesn't generate DW_LANG_OpenCL because this attribute is not
>       standardised yet.  As a workaround for the language detection we fall
>       back to the DW_AT_producer string.  */
> 

I looked at the patch, and in particular, checked to see if there
is anything we were doing ahead of the call to prepare_one_comp_unit
that prepare_one_comp_unit would also need. But this function
does very little, basically reading the DW_AT_language and
the DW_AT_producer attributes. So I don't see how this could have
some negative side-effect, or any hints of why it might have been
placed slight later in read_file_scope's body.

A testcase in this particular case, where the order in which we do
things is important, would be very useful in avoiding a regression.
However, considering how specific the conditions need to be in order
to trigger the bug, and considering the current timing, I (personally)
think that we should not let best be the enemy of good, and allow
this patch in if Keith fails to create a testcase after a reasonable
amount of effort.
  

Patch

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 49ce83ff20..0145c83b30 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -11470,6 +11470,8 @@  read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
   struct die_info *child_die;
   CORE_ADDR baseaddr;
 
+  prepare_one_comp_unit (cu, die, cu->language);
+
   baseaddr = ANOFFSET (objfile->section_offsets, SECT_OFF_TEXT (objfile));
 
   get_scope_pc_bounds (die, &lowpc, &highpc, cu);
@@ -11482,8 +11484,6 @@  read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
 
   file_and_directory fnd = find_file_and_directory (die, cu);
 
-  prepare_one_comp_unit (cu, die, cu->language);
-
   /* The XLCL doesn't generate DW_LANG_OpenCL because this attribute is not
      standardised yet.  As a workaround for the language detection we fall
      back to the DW_AT_producer string.  */