[v2,00/21] Refactor (k)symtab reader

Message ID 20200703164651.1510825-1-maennich@google.com

Matthias Männich July 3, 2020, 4:46 p.m. UTC
The current implementation that reads the symtab and the ksymtab has grown
over time from simple symtab reading to far more complex ksymtab reading,
including taking care of details like position-relative relocations and
symbol namespaces. Yet more features are coming to the Linux kernel that
make this parsing even trickier: further changes to the ksymtab layout and
different needs for looking up symbols, caused by features like LTO (causing
RELA relocations in the ksymtab entries) and CFI (causing additional jump
table symbols). These obscure the meaning of ksymtab entries and make it
increasingly challenging for a static analysis tool like libabigail to properly
process the ksymtab values.

This added complexity also adds more and more responsibilities to the
read_context that already has a lot of different tasks to juggle. It gets
increasingly difficult to ensure that further development in the dwarf reader
can be done without subtly regressing existing functionality.

Hence, this series attempts a refactoring (one could argue: a rewrite, but a
lot of functionality is just migrated out) of the symtab reading code.

The first 2 commits set up some prerequisites, like a partial backport of
std::optional and enabling std::bind and friends.
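
As a rough illustration of what a partial std::optional backport can look
like, here is a minimal sketch. It is not the code from the series: the name
simple_optional and its interface are assumptions made for this example, and
unlike the real std::optional it requires T to be default-constructible.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for std::optional (illustration only).
// Assumption: T is default-constructible and copyable.
template <typename T>
class simple_optional
{
  bool has_value_;
  T    value_;

public:
  // An empty optional holds a default-constructed T that is never exposed.
  simple_optional() : has_value_(false), value_() {}

  // An engaged optional stores a copy of the given value.
  simple_optional(const T& v) : has_value_(true), value_(v) {}

  bool has_value() const { return has_value_; }

  // Access the value; throws if the optional is empty.
  const T&
  value() const
  {
    if (!has_value_)
      throw std::runtime_error("simple_optional has no value");
    return value_;
  }

  // Return the stored value, or the fallback if the optional is empty.
  T
  value_or(const T& fallback) const
  { return has_value_ ? value_ : fallback; }

  explicit operator bool() const { return has_value_; }
};
```

A lookup that may fail can then return simple_optional<T> instead of a
sentinel value or a null pointer.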

Commits 3 and 4 modify abg-ir's elf_symbol to be able to carry 'is_suppressed'
and 'is_in_ksymtab'.

Commits 5 and 6 implement the new symtab reader.

The abg-symtab-reader has been introduced as an instance decoupled from the
dwarf reader's read_context. This reduces the responsibilities of the dwarf
reader and separates the functionality into a new compilation unit. It contains
several components that make the main component, 'symtab', easy to access and
to query. Refer to the extensive commit message there for details. The actual
core of the symtab reading has been taken as a base, but refactored where
useful. The ksymtab reading could be simplified by processing the corresponding
__ksymtab_* entries directly from the symtab without the need to interpret the
binary ksymtab sections. That also resolves issues with wrong ksymtab reading:
Mapping from the ksymtab symbol address to the symtab entry might leave us with
a non-main symbol and hence lead to incorrect results. E.g. symbols like
strlen are implemented as __pi_strlen and are aliases to strlen in the kernel.
Only by reading the ksymtab entries can we decide which symbol to keep;
otherwise we get nondeterministic results. Furthermore, symbol whitelists might
list one or the other, leading to issues with suppressed symbols: we might just
see the wrong symbol and therefore suppress both from analysis.
In addition, detecting the format of the ksymtab requires the first entry to
be a valid elf_symbol, which is not the case if it is filtered out via
whitelist or suppression. Finally, features like CFI require name-based lookup
into the ksymtab, and LTO with clang on aarch64 might make the ksymtab contain
relocatable entries. This is additional complexity hitting the dwarf reader.
Those are the subtle issues that motivated this series.
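
To illustrate the name-based approach described above, here is a minimal
sketch. It is not the code from the series: the symbol struct and the
functions exported_names and exported_symbols are hypothetical, and the sketch
ignores symbol namespaces, CRCs and relocations entirely. It only shows how
__ksymtab_<name> entries found in the plain symtab can pick the exported name
among aliases that share an address.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal symbol record (not libabigail's elf_symbol).
struct symbol
{
  std::string   name;
  unsigned long address;
};

// Collect the names exported via the ksymtab by looking for
// "__ksymtab_<name>" entries in the regular symbol table.
std::set<std::string>
exported_names(const std::vector<symbol>& symtab)
{
  static const std::string prefix = "__ksymtab_";
  std::set<std::string> result;
  for (const symbol& s : symtab)
    if (s.name.size() > prefix.size()
        && s.name.compare(0, prefix.size(), prefix) == 0)
      result.insert(s.name.substr(prefix.size()));
  return result;
}

// Keep only the symbols whose own name was exported. Aliases at the same
// address that were not exported by name (e.g. __pi_strlen) are dropped,
// independent of the order in which the symtab lists them.
std::vector<symbol>
exported_symbols(const std::vector<symbol>& symtab)
{
  const std::set<std::string> names = exported_names(symtab);
  std::vector<symbol> result;
  for (const symbol& s : symtab)
    if (names.count(s.name) != 0)
      result.push_back(s);
  return result;
}
```

With a symtab containing __pi_strlen and strlen at the same address plus a
__ksymtab_strlen entry, exported_symbols returns only strlen, whereas an
address-based lookup could have returned either alias.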

Conceptually, the new reader works quite similarly, except for the way
suppressions are applied: Instead of discarding symbols while reading, we flag
symbols as suppressed and keep them around for lookup purposes. That resolves
issues when dealing with symbol aliases.
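
The difference between flagging and discarding can be sketched as follows.
This is an illustration under stated assumptions, not the reader from the
series: flagged_symbol, flagged_symtab and their member functions are
hypothetical names, and suppression is reduced to a plain set of names.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol record carrying a suppression flag.
struct flagged_symbol
{
  std::string   name;
  unsigned long address;
  bool          is_suppressed;
};

// Hypothetical symbol table that keeps suppressed symbols for lookups.
class flagged_symtab
{
  std::vector<flagged_symbol>               symbols_;
  std::multimap<unsigned long, std::size_t> by_address_;

public:
  // Store every symbol; suppression only sets a flag.
  void
  add(const std::string& name, unsigned long address,
      const std::set<std::string>& suppressed_names)
  {
    flagged_symbol s = {name, address, suppressed_names.count(name) != 0};
    by_address_.insert(std::make_pair(address, symbols_.size()));
    symbols_.push_back(s);
  }

  // All symbols at an address, suppressed aliases included.
  std::vector<flagged_symbol>
  lookup(unsigned long address) const
  {
    typedef std::multimap<unsigned long, std::size_t>::const_iterator iter;
    std::vector<flagged_symbol> result;
    std::pair<iter, iter> range = by_address_.equal_range(address);
    for (iter it = range.first; it != range.second; ++it)
      result.push_back(symbols_[it->second]);
    return result;
  }

  // Only the symbols that take part in the analysis.
  std::vector<flagged_symbol>
  visible() const
  {
    std::vector<flagged_symbol> result;
    for (std::size_t i = 0; i < symbols_.size(); ++i)
      if (!symbols_[i].is_suppressed)
        result.push_back(symbols_[i]);
    return result;
  }
};
```

Because a suppressed alias stays in the table, a lookup by address still sees
all aliases and can tell that one of them is not suppressed, instead of
concluding that nothing lives at that address.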

Commit 7 integrates the new symtab reader into the existing code - side by side
with the current implementation.

Commits 8 - 12 migrate more and more symtab users over to the new symtab
reader, including the ksymtab functionality in commit 12 where the old
implementation could be obsoleted.

Commits 13 and 14 re-add the ppc64 support for ELFv1 binaries.

Commits 15 and 16 remove now-obsolete functionality and the old
implementation.

Commit 17 adds additional tests for whitelisted symbols.

Commits 18 and 19 address some flaws with aliasing in combination with
suppressions (e.g. triggered from whitelists). Those are issues I was seeing in
the current implementation as well, but could solve a bit easier with the new
symtab reader.

Commit 20 adds support for MODVERSIONS in the Linux kernel (CRC values).

Commit 21 addresses a bug that leads to an assertion hit when reading XML with
whitelisting that hides an aliased symbol.

Performance testing has been done with an 'allmodconfig' kernel config. That is
the worst case for kernels and represents the 'distribution kernel' use case.
During those tests, no significant performance impact could be measured.

In addition, various Android kernels in various configurations have been tested
with this. The earlier added tests for reading symtab and ksymtab obviously
pass.

v1 -> v2:
  - Commit 16 now also drops the obsolete ksymtab enums.
  - Commit 17+ are new (alias improvements, modversions)
  - rebased on top of current master

Cheers,
Matthias

Matthias Maennich (21):
  abg-cxx-compat: add simplified version of std::optional
  abg-cxx-compat: more <functional> support: std::bind and friends
  abg-ir: elf_symbol: add is_in_ksymtab field
  abg-ir: elf_symbol: add is_suppressed field
  dwarf-reader split: create abg-symtab-reader.{h,cc} and test case
  Refactor ELF symbol table reading by adding a new symtab reader
  Integrate new symtab reader into corpus and read_context
  corpus: make get_(undefined_)?_(var|fun)_symbols use the new symtab
  corpus: make get_unreferenced_(function|variable)_symbols use the new
    symtab
  abg-reader: avoid using the (var|function)_symbol_map
  dwarf-reader: read_context: use new symtab in *_symbols_is_exported
  Switch kernel stuff over to new symtab and drop unused code
  abg-elf-helpers: migrate ppc64 specific helpers
  symtab_reader: add support for ppc64 ELFv1 binaries
  abg-corpus: remove symbol maps and their setters
  dwarf reader: drop now-unused code related to symbol table reading
  test-symtab: add tests for whitelisted functions
  symtab/dwarf-reader: allow hinting of main symbols for aliases
  dwarf-reader/writer: consider aliases when dealing with suppressions
  symtab: Add support for MODVERSIONS (CRC checksums)
  reader/symtab: Improve handling for suppressed aliases

 include/Makefile.am                           |    3 +-
 include/abg-corpus.h                          |   24 +-
 include/abg-cxx-compat.h                      |  100 +
 include/abg-dwarf-reader.h                    |    6 -
 include/abg-fwd.h                             |    8 +
 include/abg-ir.h                              |   53 +-
 include/abg-symtab-reader.h                   |  429 +++
 src/Makefile.am                               |    1 +
 src/abg-corpus-priv.h                         |   57 +-
 src/abg-corpus.cc                             |  645 ++---
 src/abg-dwarf-reader.cc                       | 2437 ++---------------
 src/abg-elf-helpers.cc                        |  186 ++
 src/abg-elf-helpers.h                         |    8 +
 src/abg-ir.cc                                 |  194 +-
 src/abg-reader.cc                             |   58 +-
 src/abg-reporter-priv.cc                      |   18 +-
 src/abg-symtab-reader.cc                      |  494 ++++
 src/abg-tools-utils.cc                        |   13 -
 src/abg-writer.cc                             |   55 +-
 tests/Makefile.am                             |    4 +
 tests/data/Makefile.am                        |   31 +-
 .../test-missing-alias-report.txt             |    0
 .../test-abidiff-exit/test-missing-alias.abi  |   12 +
 .../test-missing-alias.suppr                  |    4 +
 ...ibtirpc.so.report.txt => empty-report.txt} |    0
 .../test-abidiff/test-PR24552-report0.txt     |    3 -
 tests/data/test-abidiff/test-crc-0.xml        | 1601 +++++++++++
 tests/data/test-abidiff/test-crc-1.xml        | 1601 +++++++++++
 tests/data/test-abidiff/test-crc-2.xml        | 1601 +++++++++++
 tests/data/test-abidiff/test-crc-report.txt   |    9 +
 .../test-abidiff/test-empty-corpus-report.txt |    3 -
 .../data/test-annotate/test15-pr18892.so.abi  |  918 +++----
 ...19-pr19023-libtcmalloc_and_profiler.so.abi |   60 +-
 tests/data/test-annotate/test2.so.abi         |   12 +-
 tests/data/test-annotate/test3.so.abi         |    6 +-
 tests/data/test-diff-dwarf/test12-report.txt  |    7 +
 .../test42-PR21296-clanggcc-report0.txt       |    4 +-
 ...bb-4.3-3.20141204.fc23.x86_64-report-0.txt |    6 +-
 ...bb-4.3-3.20141204.fc23.x86_64-report-1.txt |    6 +-
 .../test23-alias-filter-4.suppr               |    4 +-
 .../test23-alias-filter-report-0.txt          |    4 +-
 .../test23-alias-filter-report-2.txt          |    4 +-
 .../PR22015-libboost_iostreams.so.abi         |   48 +-
 .../test-read-dwarf/PR22122-libftdc.so.abi    |    6 +-
 .../data/test-read-dwarf/PR25007-sdhci.ko.abi |   77 +-
 .../test-read-dwarf/test10-pr18818-gcc.so.abi |  192 +-
 .../test-read-dwarf/test11-pr18828.so.abi     |  516 ++--
 .../test-read-dwarf/test12-pr18844.so.abi     |   60 +-
 .../test-read-dwarf/test15-pr18892.so.abi     |  918 +++----
 .../test-read-dwarf/test16-pr18904.so.abi     |  964 +++----
 ...19-pr19023-libtcmalloc_and_profiler.so.abi |   60 +-
 tests/data/test-read-dwarf/test2.so.abi       |   12 +-
 tests/data/test-read-dwarf/test2.so.hash.abi  |   12 +-
 .../test22-pr19097-libstdc++.so.6.0.17.so.abi |  878 +++---
 .../test-read-dwarf/test3-alias-1.so.hash.abi |   14 +
 .../data/test-read-dwarf/test3-alias-1.suppr  |    3 +
 .../test-read-dwarf/test3-alias-2.so.hash.abi |   18 +
 .../data/test-read-dwarf/test3-alias-2.suppr  |    3 +
 .../test-read-dwarf/test3-alias-3.so.hash.abi |   14 +
 .../data/test-read-dwarf/test3-alias-3.suppr  |    3 +
 .../test-read-dwarf/test3-alias-4.so.hash.abi |    8 +
 .../data/test-read-dwarf/test3-alias-4.suppr  |    3 +
 tests/data/test-read-dwarf/test3.so.abi       |    6 +-
 tests/data/test-read-dwarf/test3.so.hash.abi  |    6 +-
 tests/data/test-read-write/test-crc.xml       |   10 +
 tests/data/test-symtab/basic/aliases.c        |   13 +
 tests/data/test-symtab/basic/aliases.so       |  Bin 0 -> 17176 bytes
 tests/data/test-symtab/basic/no_debug_info.c  |    2 +-
 tests/data/test-symtab/basic/no_debug_info.so |  Bin 15360 -> 15544 bytes
 .../one_function_one_variable_all.whitelist   |    3 +
 ...e_function_one_variable_function.whitelist |    2 +
 ...function_one_variable_irrelevant.whitelist |    2 +
 ...e_function_one_variable_variable.whitelist |    2 +
 .../test-symtab/kernel-modversions/Makefile   |   19 +
 .../kernel-modversions/one_of_each.c          |    1 +
 .../kernel-modversions/one_of_each.ko         |  Bin 0 -> 131760 bytes
 tests/test-abidiff-exit.cc                    |    9 +
 tests/test-abidiff.cc                         |   34 +-
 tests/test-cxx-compat.cc                      |   51 +
 tests/test-read-dwarf.cc                      |   32 +
 tests/test-read-write.cc                      |    6 +
 tests/test-symtab-reader.cc                   |   53 +
 tests/test-symtab.cc                          |  195 +-
 tools/abidw.cc                                |    2 -
 84 files changed, 9724 insertions(+), 5222 deletions(-)
 create mode 100644 include/abg-symtab-reader.h
 create mode 100644 src/abg-symtab-reader.cc
 create mode 100644 tests/data/test-abidiff-exit/test-missing-alias-report.txt
 create mode 100644 tests/data/test-abidiff-exit/test-missing-alias.abi
 create mode 100644 tests/data/test-abidiff-exit/test-missing-alias.suppr
 rename tests/data/test-abidiff/{test-PR18166-libtirpc.so.report.txt => empty-report.txt} (100%)
 delete mode 100644 tests/data/test-abidiff/test-PR24552-report0.txt
 create mode 100644 tests/data/test-abidiff/test-crc-0.xml
 create mode 100644 tests/data/test-abidiff/test-crc-1.xml
 create mode 100644 tests/data/test-abidiff/test-crc-2.xml
 create mode 100644 tests/data/test-abidiff/test-crc-report.txt
 delete mode 100644 tests/data/test-abidiff/test-empty-corpus-report.txt
 create mode 100644 tests/data/test-read-dwarf/test3-alias-1.so.hash.abi
 create mode 100644 tests/data/test-read-dwarf/test3-alias-1.suppr
 create mode 100644 tests/data/test-read-dwarf/test3-alias-2.so.hash.abi
 create mode 100644 tests/data/test-read-dwarf/test3-alias-2.suppr
 create mode 100644 tests/data/test-read-dwarf/test3-alias-3.so.hash.abi
 create mode 100644 tests/data/test-read-dwarf/test3-alias-3.suppr
 create mode 100644 tests/data/test-read-dwarf/test3-alias-4.so.hash.abi
 create mode 100644 tests/data/test-read-dwarf/test3-alias-4.suppr
 create mode 100644 tests/data/test-read-write/test-crc.xml
 create mode 100644 tests/data/test-symtab/basic/aliases.c
 create mode 100755 tests/data/test-symtab/basic/aliases.so
 create mode 100644 tests/data/test-symtab/basic/one_function_one_variable_all.whitelist
 create mode 100644 tests/data/test-symtab/basic/one_function_one_variable_function.whitelist
 create mode 100644 tests/data/test-symtab/basic/one_function_one_variable_irrelevant.whitelist
 create mode 100644 tests/data/test-symtab/basic/one_function_one_variable_variable.whitelist
 create mode 100644 tests/data/test-symtab/kernel-modversions/Makefile
 create mode 120000 tests/data/test-symtab/kernel-modversions/one_of_each.c
 create mode 100644 tests/data/test-symtab/kernel-modversions/one_of_each.ko
 create mode 100644 tests/test-symtab-reader.cc
  

Comments

Dodji Seketeli July 20, 2020, 2:27 p.m. UTC | #1
Hello,

Matthias Maennich <maennich@google.com> wrote:

> The current implementation that reads the symtab and the ksymtab has grown
> over time from simple symtab reading to way more complex ksymtab reading
> including taking care of little details like position relative relocations,
> symbol namespaces, etc. Yet, more features are coming to the Linux kernels that
> make this parsing even more tricky: Further changes to the ksymtab layout and
> different needs to lookup symbols caused by features like LTO (causing RELA
> relocations in the ksymtab entries) and CFI (causing additional jump table
> symbols) that are highly confusing the meaning of ksymtab entries and make it
> increasingly challenging for a static analysis tool like libabigail to properly
> process the ksymtab values.
>
> This added complexity also adds more and more responsibilities to the
> read_context that already has a lot of different tasks to juggle. It gets
> increasingly difficult to ensure, further development in the dwarf reader can
> be done without subtly regressing existing functionality.

Agreed.

From my point of view, even independently of the kernel symbol table
reading requirement, I think we need to separate the ELF
reading/handling and the DWARF handling to make the code base more
future-proof and capable of handling things that we may face in the
future, like support for other debuginfo formats or, who knows, other
binary formats.

I thus welcome changes that move us in that direction.  Thank you for
looking into this.

This is a big code drop so if you don't mind, I'll be reviewing this in
"rounds", going over a subset of patches multiple times, discussing
different aspects at once.

[...]

Cheers,