[v2] Really fix gdb/23010: SYMBOL_LANGUAGE != DICT_LANGUAGE assertion

Message ID 20180627190341.24442-1-keiths@redhat.com
State New, archived
Headers

Commit Message

Keith Seitz June 27, 2018, 7:03 p.m. UTC
  This patch is another attempt at really fixing the multitude of assertions
being seen where symbols of one language are being added to symbol lists of
another language.

In short, this is happening because read_file_scope was reading DIEs without
having its language set. As a result, the default language,
language_minimal, was being passed to imported DIE trees and, in certain
specific situations, causing the SYMBOL_LANGUAGE != DICT_LANGUAGE assertion
being widely reported.

For a more thorough explanation, see the following mailing list
post:

  https://sourceware.org/ml/gdb-patches/2018-05/msg00703.html

Since a test reproducer for this has been so elusive, this patch also adds a
wrapper function around add_symbol_to_list which asserts when adding a
symbol of one language to a list containing symbols of a different language.

Differences from v1:
- Removed complaint variation
- Updated comments
- Fixed typo

gdb/ChangeLog:

	PR gdb/23010
	* dwarf2read.c (dw2_add_symbol_to_list): New function.
	(fixup_go_packaging, new_symbol): Use dw2_add_symbol_to_list
	instead of add_symbol_to_list.
	(read_file_scope): Call prepare_one_comp_unit before reading
	any other DIEs.
---
 gdb/dwarf2read.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)
  

Comments

Simon Marchi July 11, 2018, 1:17 a.m. UTC | #1
Hi Keith,

On 2018-06-27 03:03 PM, Keith Seitz wrote:
> This patch is another attempt at really fixing the multitude of assertions
> being seen where symbols of one language are being added to symbol lists of
> another language.
> 
> In short, this is happening because read_file_scope was reading DIEs without
> having its language set. As a result, the default language,
> language_minimal, was being passed to imported DIE trees and, in certain
> specific situations, causing the SYMBOL_LANGUAGE != DICT_LANGUAGE assertion
> being widely reported.
> 
> For a more thorough explanation, see the following mailing list
> post:
> 
>   https://sourceware.org/ml/gdb-patches/2018-05/msg00703.html

I wouldn't mind if you included the information here.

> Since a test reproducer for this has been so elusive, this patch also adds a
> wrapper function around add_symbol_to_list which asserts when adding a
> symbol of one language to a list containing symbols of a different language.

I have a question about the assertion vs complaint.  Is it logically
impossible (putting aside other GDB bugs) for this assertion to fail?
Or is it possible now to feed bad debug info to GDB that will trigger
this new assert?  If it's the former, then no problem, if it's the later
then an assertion is not appropriate.

> Differences from v1:
> - Removed complaint variation
> - Updated comments
> - Fixed typo
> 
> gdb/ChangeLog:
> 
> 	PR gdb/23010
> 	* dwarf2read.c (dw2_add_symbol_to_list): New function.
> 	(fixup_go_packaging, new_symbol): Use dw2_add_symbol_to_list
> 	instead of add_symbol_to_list.
> 	(read_file_scope): Call prepare_one_comp_unit before reading
> 	any other DIEs.
> ---
>  gdb/dwarf2read.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
> index 4ad0527406..ba70315957 100644
> --- a/gdb/dwarf2read.c
> +++ b/gdb/dwarf2read.c
> @@ -9701,6 +9701,24 @@ compute_delayed_physnames (struct dwarf2_cu *cu)
>    cu->method_list.clear ();
>  }
>  
> +/* A wrapper for add_symbol_to_list to ensure that SYMBOL's language is
> +   the same as all other symbols in LISTHEAD.  If a new symbol is added
> +   with a different language, this function asserts.  */
> +
> +static inline void
> +dw2_add_symbol_to_list (struct symbol *symbol, struct pending **listhead)
> +{
> +  /* Only assert if LISTHEAD already contains symbols of a different
> +     language (dict_create_hashed/insert_symbol_hashed requires that all
> +     symbols in this list are of the same language).  */
> +  gdb_assert ((*listhead) == NULL
> +	      || (*listhead)->nsyms == 0
> +	      || (SYMBOL_LANGUAGE ((*listhead)->symbol[0])
> +		  == SYMBOL_LANGUAGE (symbol)));

I wonder if it's actually possible to have (*listhead)->nsyms == 0, since as
soon as a "struct pending" is allocated, there is at least one sym put into
it (in add_symbol_to_list).

It's probably a bit hard to tell for older debug formats, but is it only for
DWARF that this condition must hold?  Did you consider putting this assertion
directly in add_symbol_to_list?  I'm not saying it's the right thing to do, I'm
just asking.

Thanks,

Simon
  
Keith Seitz July 12, 2018, 3:47 p.m. UTC | #2
On 07/10/2018 06:17 PM, Simon Marchi wrote:
> I wouldn't mind if you included the information here.

I will certainly include a redux of that analysis in the commit log.

> I have a question about the assertion vs complaint.  Is it logically
> impossible (putting aside other GDB bugs) for this assertion to fail?
> Or is it possible now to feed bad debug info to GDB that will trigger
> this new assert?  If it's the former, then no problem, if it's the later
> then an assertion is not appropriate.

IMO the former. The code is currently structured so that a dictionary can *only* have symbols of one language inside it. While it may be possible to trigger this with malicious DWARF hacking, I cannot really say for certain that it isn't possible in valid DWARF. [Sorry, that's more of a politician's answer to your question.]

>> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
>> index 4ad0527406..ba70315957 100644
>> --- a/gdb/dwarf2read.c
>> +++ b/gdb/dwarf2read.c
>> @@ -9701,6 +9701,24 @@ compute_delayed_physnames (struct dwarf2_cu *cu)
>>    cu->method_list.clear ();
>>  }
>>  
>> +/* A wrapper for add_symbol_to_list to ensure that SYMBOL's language is
>> +   the same as all other symbols in LISTHEAD.  If a new symbol is added
>> +   with a different language, this function asserts.  */
>> +
>> +static inline void
>> +dw2_add_symbol_to_list (struct symbol *symbol, struct pending **listhead)
>> +{
>> +  /* Only assert if LISTHEAD already contains symbols of a different
>> +     language (dict_create_hashed/insert_symbol_hashed requires that all
>> +     symbols in this list are of the same language).  */
>> +  gdb_assert ((*listhead) == NULL
>> +	      || (*listhead)->nsyms == 0
>> +	      || (SYMBOL_LANGUAGE ((*listhead)->symbol[0])
>> +		  == SYMBOL_LANGUAGE (symbol)));
> 
> I wonder if it's actually possible to have (*listhead)->nsyms == 0, since as
> soon as a "struct pending" is allocated, there is at least one sym put into
> it (in add_symbol_to_list).

Ha! Yes, that's true. I didn't even look. I just erred on the defensive side. I'll remove that.

> It's probably a bit hard to tell for older debug formats, but is it only for
> DWARF that this condition must hold?  Did you consider putting this assertion
> directly in add_symbol_to_list?  I'm not saying it's the right thing to do, I'm
> just asking.

The assertion holds for any debug reader that uses hashed dictionaries, and that list isn't a straightforward list to compose. That's probably why I originally isolated to dwarf2read.c. Or I just plain didn't think about (interfering with) other debuginfo readers.

Any symbol reader using add_symbol_to_list could create hashed dictionaries, but AFAICT there is no requirement that it does.

Out of curiosity, I moved (and adjusted) the assertion to add_symbol_to_list and ran some tests. As expected, DWARF tests showed no difference. STABS testing, though, triggers the assertion. add_symbol_to_list is often called before any dictionary is created.

Thank you for taking a look,
Keith
  
Tom Tromey July 12, 2018, 5:11 p.m. UTC | #3
>>>>> "Keith" == Keith Seitz <keiths@redhat.com> writes:

Keith> For a more thorough explanation, see the following mailing list
Keith> post:
Keith>   https://sourceware.org/ml/gdb-patches/2018-05/msg00703.html

I think it would be good for this analysis to end up in the commit
message.  I found that analysis compelling.

Otherwise, FWIW, this patch seems correct to me.

Tom
  
Simon Marchi July 12, 2018, 5:12 p.m. UTC | #4
On 2018-07-12 11:47, Keith Seitz wrote:
>> I have a question about the assertion vs complaint.  Is it logically
>> impossible (putting aside other GDB bugs) for this assertion to fail?
>> Or is it possible now to feed bad debug info to GDB that will trigger
>> this new assert?  If it's the former, then no problem, if it's the 
>> later
>> then an assertion is not appropriate.
> 
> IMO the former. The code is currently structured so that a dictionary
> can *only* have symbols of one language inside it. While it may be
> possible to trigger this with malicious DWARF hacking, I cannot really
> say for certain that it isn't possible in valid DWARF. [Sorry, that's
> more of a politician's answer to your question.]

Ok.  I'm asking because ideally, we should react to bad input (including 
DWARF) by providing a useful error message rather than asserting.  I'm 
fine with leaving it like this.  That's not in the scope of your patch, 
and your patch does not make the situation worse (it only points out the 
problem earlier).

>>> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
>>> index 4ad0527406..ba70315957 100644
>>> --- a/gdb/dwarf2read.c
>>> +++ b/gdb/dwarf2read.c
>>> @@ -9701,6 +9701,24 @@ compute_delayed_physnames (struct dwarf2_cu 
>>> *cu)
>>>    cu->method_list.clear ();
>>>  }
>>> 
>>> +/* A wrapper for add_symbol_to_list to ensure that SYMBOL's language 
>>> is
>>> +   the same as all other symbols in LISTHEAD.  If a new symbol is 
>>> added
>>> +   with a different language, this function asserts.  */
>>> +
>>> +static inline void
>>> +dw2_add_symbol_to_list (struct symbol *symbol, struct pending 
>>> **listhead)
>>> +{
>>> +  /* Only assert if LISTHEAD already contains symbols of a different
>>> +     language (dict_create_hashed/insert_symbol_hashed requires that 
>>> all
>>> +     symbols in this list are of the same language).  */
>>> +  gdb_assert ((*listhead) == NULL
>>> +	      || (*listhead)->nsyms == 0
>>> +	      || (SYMBOL_LANGUAGE ((*listhead)->symbol[0])
>>> +		  == SYMBOL_LANGUAGE (symbol)));
>> 
>> I wonder if it's actually possible to have (*listhead)->nsyms == 0, 
>> since as
>> soon as a "struct pending" is allocated, there is at least one sym put 
>> into
>> it (in add_symbol_to_list).
> 
> Ha! Yes, that's true. I didn't even look. I just erred on the
> defensive side. I'll remove that.
> 
>> It's probably a bit hard to tell for older debug formats, but is it 
>> only for
>> DWARF that this condition must hold?  Did you consider putting this 
>> assertion
>> directly in add_symbol_to_list?  I'm not saying it's the right thing 
>> to do, I'm
>> just asking.
> 
> The assertion holds for any debug reader that uses hashed
> dictionaries, and that list isn't a straightforward list to compose.
> That's probably why I originally isolated to dwarf2read.c. Or I just
> plain didn't think about (interfering with) other debuginfo readers.
> 
> Any symbol reader using add_symbol_to_list could create hashed
> dictionaries, but AFAICT there is no requirement that it does.
> 
> Out of curiosity, I moved (and adjusted) the assertion to
> add_symbol_to_list and ran some tests. As expected, DWARF tests showed
> no difference. STABS testing, though, triggers the assertion.
> add_symbol_to_list is often called before any dictionary is created.

Ok, thanks.  The patch looks good to me, but I'd like if somebody with 
more experience in this area gave a second opinion.

Simon
  
Simon Marchi July 24, 2018, 8:13 p.m. UTC | #5
On 2018-07-12 01:12 PM, Simon Marchi wrote:
> Ok, thanks.  The patch looks good to me, but I'd like if somebody with more experience in this area gave a second opinion.
> 
> Simon

Since we didn't hear from anybody else and don't want to keep you waiting
for ever, please go ahead and push the patch.

Thanks,

Simon
  
Keith Seitz July 24, 2018, 8:25 p.m. UTC | #6
On 07/24/2018 01:13 PM, Simon Marchi wrote:
> On 2018-07-12 01:12 PM, Simon Marchi wrote:
>> Ok, thanks.  The patch looks good to me, but I'd like if somebody with more experience in this area gave a second opinion.
>>
>> Simon
> 
> Since we didn't hear from anybody else and don't want to keep you waiting
> for ever, please go ahead and push the patch.

Done [for both master and gdb-8.2-branch].

Thank you!
Keith
  

Patch

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 4ad0527406..ba70315957 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -9701,6 +9701,24 @@  compute_delayed_physnames (struct dwarf2_cu *cu)
   cu->method_list.clear ();
 }
 
+/* A wrapper for add_symbol_to_list to ensure that SYMBOL's language is
+   the same as all other symbols in LISTHEAD.  If a new symbol is added
+   with a different language, this function asserts.  */
+
+static inline void
+dw2_add_symbol_to_list (struct symbol *symbol, struct pending **listhead)
+{
+  /* Only assert if LISTHEAD already contains symbols of a different
+     language (dict_create_hashed/insert_symbol_hashed requires that all
+     symbols in this list are of the same language).  */
+  gdb_assert ((*listhead) == NULL
+	      || (*listhead)->nsyms == 0
+	      || (SYMBOL_LANGUAGE ((*listhead)->symbol[0])
+		  == SYMBOL_LANGUAGE (symbol)));
+
+  add_symbol_to_list (symbol, listhead);
+}
+
 /* Go objects should be embedded in a DW_TAG_module DIE,
    and it's not clear if/how imported objects will appear.
    To keep Go support simple until that's worked out,
@@ -9772,7 +9790,7 @@  fixup_go_packaging (struct dwarf2_cu *cu)
       SYMBOL_ACLASS_INDEX (sym) = LOC_TYPEDEF;
       SYMBOL_TYPE (sym) = type;
 
-      add_symbol_to_list (sym, &global_symbols);
+      dw2_add_symbol_to_list (sym, &global_symbols);
 
       xfree (package_name);
     }
@@ -11432,6 +11450,7 @@  read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
   struct die_info *child_die;
   CORE_ADDR baseaddr;
 
+  prepare_one_comp_unit (cu, die, cu->language);
   baseaddr = ANOFFSET (objfile->section_offsets, SECT_OFF_TEXT (objfile));
 
   get_scope_pc_bounds (die, &lowpc, &highpc, cu);
@@ -11444,8 +11463,6 @@  read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
 
   file_and_directory fnd = find_file_and_directory (die, cu);
 
-  prepare_one_comp_unit (cu, die, cu->language);
-
   /* The XLCL doesn't generate DW_LANG_OpenCL because this attribute is not
      standardised yet.  As a workaround for the language detection we fall
      back to the DW_AT_producer string.  */
@@ -21202,7 +21219,7 @@  new_symbol (struct die_info *die, struct type *type, struct dwarf2_cu *cu,
 	  SYMBOL_TYPE (sym) = objfile_type (objfile)->builtin_core_addr;
 	  SYMBOL_DOMAIN (sym) = LABEL_DOMAIN;
 	  SYMBOL_ACLASS_INDEX (sym) = LOC_LABEL;
-	  add_symbol_to_list (sym, cu->list_in_scope);
+	  dw2_add_symbol_to_list (sym, cu->list_in_scope);
 	  break;
 	case DW_TAG_subprogram:
 	  /* SYMBOL_BLOCK_VALUE (sym) will be filled in later by
@@ -21460,7 +21477,7 @@  new_symbol (struct die_info *die, struct type *type, struct dwarf2_cu *cu,
 	case DW_TAG_common_block:
 	  SYMBOL_ACLASS_INDEX (sym) = LOC_COMMON_BLOCK;
 	  SYMBOL_DOMAIN (sym) = COMMON_BLOCK_DOMAIN;
-	  add_symbol_to_list (sym, cu->list_in_scope);
+	  dw2_add_symbol_to_list (sym, cu->list_in_scope);
 	  break;
 	default:
 	  /* Not a tag we recognize.  Hopefully we aren't processing
@@ -21480,7 +21497,7 @@  new_symbol (struct die_info *die, struct type *type, struct dwarf2_cu *cu,
 	}
 
       if (list_to_add != NULL)
-	add_symbol_to_list (sym, list_to_add);
+	dw2_add_symbol_to_list (sym, list_to_add);
 
       /* For the benefit of old versions of GCC, check for anonymous
 	 namespaces based on the demangled name.  */