Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.

  Hi Simon,
Thanks for the review. Please see my replies below.

> -  ULONGEST ranges_base = 0;
> +  gdb::optional<ULONGEST> ranges_base = 0;
***  I am probably missing something, but I don't really understand the logic
of making this field optional and initializing it to 0, or the need to make it
optional at all. If the CU does not have a DW_AT_rnglists_base attribute,
doesn't it implicitly mean that this CU's ranges offset is 0?  So it would seem
fine to just leave it at 0 and keep the arithmetic as before.

OK, I changed the initial value to 'no value'; thank you, that is better. It
seems a good idea to differentiate between a nonexistent value and an existing
value of 0 even if the math stays the same. For example, for str_offsets_base,
there are different code paths depending on whether there is a value (not
on whether the value is 0). It seems to me that addr_base, ranges_base and
str_offsets_base should all have similar semantics. But I agree that there is
currently no behavioral difference in the ranges_base case. Would you prefer
that it stay a ULONGEST?
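
To make the distinction concrete, here is a minimal standalone sketch. It is
not the patch itself: cu_bases and both helper functions are hypothetical
names, and std::optional stands in for gdb::optional. It only shows that the
arithmetic is identical for "absent" and "present with value 0", while a
caller can still tell the two apart.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the relevant fields of dwarf2_cu.
// An empty optional means "attribute absent"; 0 means "present, value 0".
struct cu_bases
{
  std::optional<std::uint64_t> ranges_base;
  std::optional<std::uint64_t> str_offsets_base;
};

// The arithmetic does not care whether the base is absent or zero.
std::uint64_t
effective_ranges_offset (const cu_bases &cu, std::uint64_t offset)
{
  return cu.ranges_base.value_or (0) + offset;
}

// A caller that needs a different code path can still ask whether the
// attribute was present at all.
bool
has_str_offsets_base (const cu_bases &cu)
{
  return cu.str_offsets_base.has_value ();
}
```

With this shape, a CU with no base and a CU with a base of 0 give the same
result from effective_ranges_offset, but only the second one reports that a
value is present.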

***  Two spaces after period.
Done.

> +#define DW_STRING_IS_STR_INDEX(attr) ((attr)->string_is_str_index)
***  I don't really see the advantage of those macros.

I agree, and I don't like these macros at all. But I wanted to be
consistent with the existing code. (Maybe I should remove all of them in a
separate refactoring patch sometime.)

> +static gdb::optional<ULONGEST>
> +lookup_addr_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die,
> +               bool will_follow)
***  Please provide a comment for this function.
Done.

***   The curly braces scopes above are missing an indentation level.
Done.

> +static gdb::optional<ULONGEST>
> +lookup_ranges_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die)
***  Please provide a comment for this function too.
Done.

> +  struct attribute *attr = dwarf2_attr (comp_unit_die,
> +                                     DW_AT_str_offsets_base, cu);
> +  if (attr)
*** Use: if (attr != nullptr)

Done, and I sent another patch to replace all "if (attr)" with
"if (attr != nullptr)" in this file (there were 48 occurrences).

> +/* Read CU/TU THIS_CU but do not follow DW_AT_GNU_dwo_name (DW_AT_GNU_dwo_name)
***  You listed DW_AT_GNU_dwo_name twice, I don't think that was the intent.
Done.

>  static void
>  init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu,
> +                                struct dwarf2_cu *parent_cu,
***  Please document this new parameter. Since "parent" is passed only to get
the base offsets, I think it would be clearer to pass the offsets directly. 
If I look at the signature of the function and see "dwarf2_cu *parent_cu", I
can't really tell what it's for (though a comment might help).  But if I see
"ULONGEST str_offsets_base" and "ULONGEST addr_base", it's more self-explanatory.

I wrote a comment; I hope it is clear. I would prefer not to change the
parameters to ULONGEST str_offsets_base and ULONGEST addr_base, because one
case confuses me a little too: when there is no parent_cu, I believe any
existing value should not be overridden, and that is difficult to do at the
call site.

*** Remove the curly braces.
Done.

> -           if (sibling_ptr < info_ptr)
> -             complaint (_("DW_AT_sibling points backwards"));
> -           else if (sibling_ptr > reader->buffer_end)
> +           if (sibling_ptr > reader->buffer_end)
*** Improvements like this one look good, but if they are not directly related
to the patch, please submit them in their own patch, it's easier to review and
justify that way.

OK. I need to dig up the exact case where the code failed and that led me to
make this change; until then I am reverting this part.

***  I'm a bit confused with those different cases, could you please add a bit
of comments to explain the logic for each case?  Why is it not possible to
obtain the string value at this point?  If cu->str_offsets_base has no value,
doesn't it mean that the offset is 0?

Maybe it would be more obvious if I saw the actual DWARF being parsed.  Could
you suggest some gcc or clang commands to obtain some DWARF where we have a
DW_AT_dwo_name that is an indirect string?

Reply:
Sure, here's an example.
$  cd /tmp/dw5 && echo "int calculate() { return 4; } int main(int argc, char** argv) { return calculate(); }" >>main.cc
$  clang -gdwarf-5 -gsplit-dwarf main.cc
$  ls
a.out main.cc main.dwo 

$  llvm-dwarfdump --all a.out
a.out:	file format ELF64-x86-64

.debug_abbrev contents:
Abbrev table for offset: 0x00000000
[1] DW_TAG_compile_unit	DW_CHILDREN_no
	DW_AT_stmt_list	DW_FORM_sec_offset
	DW_AT_str_offsets_base	DW_FORM_sec_offset
	DW_AT_comp_dir	DW_FORM_strx1
	DW_AT_GNU_pubnames	DW_FORM_flag_present
	DW_AT_GNU_dwo_name	DW_FORM_strx1
	DW_AT_low_pc	DW_FORM_addrx
	DW_AT_high_pc	DW_FORM_data4
	DW_AT_addr_base	DW_FORM_sec_offset

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000024 version = 0x0005 unit_type = DW_UT_skeleton abbr_offset = 0x0000 addr_size = 0x08 DWO_id = 0xee1d4b42a2f0ca0b (next unit at 0x00000028)

0x00000014: DW_TAG_compile_unit
              DW_AT_stmt_list	(0x00000000)
              DW_AT_str_offsets_base	(0x00000008)
              DW_AT_comp_dir	("/tmp/dw5")
              DW_AT_GNU_pubnames	(true)
              DW_AT_GNU_dwo_name	("main.dwo")
              DW_AT_low_pc	(0x0000000000401110)
              DW_AT_high_pc	(0x0000000000401141)
              DW_AT_addr_base	(0x00000008)
....
.debug_str contents:
0x00000000: "/tmp/dw5"
0x00000009: "main.dwo"
....
.debug_str_offsets contents:
0x00000000: Contribution size = 12, Format = DWARF32, Version = 5
0x00000008: 00000000 "/tmp/dw5"
0x0000000c: 00000009 "main.dwo"

Here is what happens. gdb starts to parse the DW_TAG_compile_unit DIE and
comes to DW_AT_GNU_dwo_name, which has form DW_FORM_strx1 and a value of 1.
The actual string is somewhere in the .debug_str section. To find it we need
to look up index 1 in .debug_str_offsets, and for that we also need to know
the value of DW_AT_str_offsets_base. However, we are in the middle of parsing
the DIE and there is no guarantee that we have already processed
DW_AT_str_offsets_base; it may be parsed later.

My solution is to temporarily store "1" as the value of the attribute
DW_AT_GNU_dwo_name, and mark it as 'needs reprocessing'. After all the
attributes of the DIE have been processed, DW_AT_str_offsets_base will hold the
correct value if it exists. Then, we revisit the marked attributes. During
reprocessing, we don't need to read the binary file again, because we already
saved the value we need (1) in the first pass. We calculate the correct
string address using that value, the contents of the .debug_str* sections and
the DW_AT_str_offsets_base value.

I am not claiming this is the only or the best solution, but I think it is
intuitive and changes relatively few lines of code.
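
To illustrate the two passes outside of gdb, here is a small standalone
sketch. Everything in it (raw_attr, attr, die, read_u32, read_die) is a
hypothetical, simplified model, not the code in the patch. It assumes DWARF32
little-endian offsets as in the dump above, and it simply throws when
DW_AT_str_offsets_base is missing, which is a simplification for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of one attribute as read from the DIE.
struct raw_attr
{
  std::string name;    // e.g. "DW_AT_GNU_dwo_name"
  bool is_strx;        // true for DW_FORM_strx* forms
  std::uint64_t raw;   // string index, or section offset for *_base
};

// Attribute after the first pass; VALUE is filled in by the second pass.
struct attr
{
  std::string name;
  std::uint64_t index;
  bool needs_reprocess;
  std::string value;
};

struct die
{
  std::optional<std::uint64_t> str_offsets_base;
  std::vector<attr> attrs;
};

// Read a little-endian 32-bit (DWARF32) offset from SECTION at POS.
static std::uint32_t
read_u32 (const std::vector<std::uint8_t> &section, std::size_t pos)
{
  if (pos + 4 > section.size ())
    throw std::out_of_range ("offset outside .debug_str_offsets");
  std::uint32_t v = 0;
  for (int i = 3; i >= 0; --i)
    v = (v << 8) | section[pos + i];
  return v;
}

static die
read_die (const std::vector<raw_attr> &input,
          const std::vector<std::uint8_t> &str_offsets,
          const std::string &debug_str)
{
  die d;

  // First pass: record the base, and only save the raw index of strx
  // attributes, because the base may not have been seen yet.
  for (const raw_attr &r : input)
    {
      if (r.name == "DW_AT_str_offsets_base")
        d.str_offsets_base = r.raw;
      else if (r.is_strx)
        d.attrs.push_back (attr {r.name, r.raw, true, ""});
    }

  // Second pass: resolve the marked attributes from the saved index.
  for (attr &a : d.attrs)
    {
      if (!a.needs_reprocess)
        continue;
      if (!d.str_offsets_base.has_value ())
        throw std::runtime_error ("no DW_AT_str_offsets_base");
      std::uint32_t off
        = read_u32 (str_offsets, *d.str_offsets_base + a.index * 4);
      if (off >= debug_str.size ())
        throw std::out_of_range ("offset outside .debug_str");
      a.value = debug_str.c_str () + off;  // copies up to the next NUL
      a.needs_reprocess = false;
    }
  return d;
}
```

With the section contents from the dump (base 8, index 1), the second pass
reads the 4-byte entry at offset 8 + 1 * 4 = 0xc of .debug_str_offsets, gets
0x9, and finds "main.dwo" at that offset in .debug_str. The order of the
attributes in the input does not matter.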

---

* Process debug_str_offsets section. Handle DW_AT_str_offsets_base attribute and
keep the value in dwarf2_cu.

* Make addr_base and ranges_base fields in dwarf2_cu optional to distinguish
an absent attribute from one that is present with a value of 0.

* During parsing, there is no guarantee that the DW_AT_str_offsets_base and
DW_AT_rnglists_base attributes will be processed before the attributes that
need those values for correct computation. So make two passes: on the first
one, mark the attributes that depend on *_base attributes and process only the
others. On the second pass, process only the attributes that were marked on
the first pass.

* For string attributes, differentiate between addresses that directly point to
a string and those that point to an offset in debug_str_offsets section.

* There are now two attributes, DW_AT_addr_base and DW_AT_GNU_addr_base to read
address offset base. Likewise, there are two attributes, DW_AT_rnglists_base
and DW_AT_GNU_ranges_base to read ranges base. Since there is no guarantee which
ones the compiler will generate, create helper functions to handle all cases.
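
As an illustration of the helper functions in the last item above, here is a
minimal standalone sketch. It is not the code in the patch: dw_at, die_attrs
and the sketch_* functions are hypothetical names, the DIE is modeled as a
plain map, and trying the standard attribute before the GNU one is an
assumption made for the example.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical attribute tags; only the four relevant ones are modeled.
enum class dw_at
{
  addr_base,
  GNU_addr_base,
  rnglists_base,
  GNU_ranges_base
};

// Hypothetical model of a DIE: attribute tag -> unsigned value.
using die_attrs = std::map<dw_at, std::uint64_t>;

static std::optional<std::uint64_t>
find_attr (const die_attrs &die, dw_at name)
{
  auto it = die.find (name);
  if (it == die.end ())
    return std::nullopt;
  return it->second;
}

// Return the standard attribute if present, else the GNU one, else
// nothing.  An empty result means neither attribute exists, which is
// different from a base of 0.
static std::optional<std::uint64_t>
lookup_base (const die_attrs &die, dw_at standard, dw_at gnu)
{
  if (std::optional<std::uint64_t> v = find_attr (die, standard))
    return v;
  return find_attr (die, gnu);
}

std::optional<std::uint64_t>
sketch_lookup_addr_base (const die_attrs &die)
{
  return lookup_base (die, dw_at::addr_base, dw_at::GNU_addr_base);
}

std::optional<std::uint64_t>
sketch_lookup_ranges_base (const die_attrs &die)
{
  return lookup_base (die, dw_at::rnglists_base, dw_at::GNU_ranges_base);
}
```

Callers then deal with a single optional value and do not need to know which
of the two attributes the compiler emitted.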

Tested with CC=/usr/bin/gcc (version 8.3.0) against master branch (also with
-gsplit-dwarf and -gdwarf-4 flags) and there was no increase in the set of
tests that fail. (gdb still cannot debug a 'hello world' program with DWARF 5,
so for the time being, this is all we care about.)

This is part of an effort to support DWARF-5 in gdb.

gdb/ChangeLog

	* dwarf2read.c (dwarf2_debug_sections): Add debug_str_offsets section.
	(dwarf2_cu): Add str_offsets_base field. Change the type of addr_base
	and ranges_base to gdb::optional. Update comments.
	(dwo_file): Update comments.
	(attribute): Add string_is_str_index field.
	(DW_STRING_IS_STR_INDEX): New macro (to comply with local standard).
	(read_attribute): Update API to take an additional out parameter,
	need_reprocess. This is used to mark attributes that need other
	attributes (e.g. str_offsets_base) for correct computation which may not
	have been read yet.
	(read_addr_index): New function.
	(read_dwo_str_index): Likewise.
	(read_stub_str_index): Likewise.
	(get_comp_dir_attr): Likewise.
	(get_stub_string_attr): Likewise.
	(dwarf2_per_objfile::locate_sections): Handle debug_str_offsets section.
	(lookup_addr_base): New function.
	(lookup_ranges_base): Likewise.
	(read_cutu_die_from_dwo): Use the new functions: lookup_addr_base,
	lookup_ranges_base, DW_STRING_IS_STR_INDEX, get_comp_dir_attr.
	(init_cutu_and_read_dies): Initialize str_offsets_base fields.
	(init_cutu_and_read_dies_no_follow): Change API to take parent compile
	unit. This is used to inherit parent's str_offsets_base and addr_base.
	Update comments.
	(init_cutu_and_read_dies_simple): Reflect API changes.
	(skip_one_die): Likewise. Support DW_FORM_rnglistx.
	(create_cus_hash_table): Reflect API changes.
	(open_and_init_dwo_file): Likewise.
	(dwarf2_get_pc_bounds): Likewise.
	(dwarf2_record_block_ranges): Likewise.
	(read_full_die_1): Change implementation to reprocess attributes that
	need str_offsets_base and addr_base.
	(partial_die_info::read): Change implementation to reprocess attributes
	that need str_offsets_base and addr_base. Update for API change of
	addr_base field.
	(read_attribute_reprocess): New method.
	(read_attribute_value): Change API to take an additional out parameter,
	need_reprocess. Initialize string_is_str_index field. No longer mark an
	error when a non-dwo compile unit has index based attributes.
	(read_attribute): Reflect API changes.
	(read_addr_index_1): Reflect API changes. Update comments.
	(dwarf2_read_addr_index_data): Reflect API changes.
	(read_str_index): Change API and implementation. This becomes a helper
	to be used by the new string index related methods.
	* dwarf2read.h (dwarf2_per_objfile): Add str_offsets field.
	* symfile.h (dwarf2_debug_sections): Likewise.
	* xcoffread.c (dwarf2_xcoff_names): Likewise.
---
 gdb/dwarf2read.c | 428 ++++++++++++++++++++++++++++++++++++-----------
 gdb/dwarf2read.h |   1 +
 gdb/symfile.h    |   1 +
 gdb/xcoffread.c  |   1 +
 4 files changed, 330 insertions(+), 101 deletions(-)
