Message ID | 20200318121241.146259-1-gprocida@google.com |
---|---|
State | Committed |
Headers |
From: gprocida@google.com (Giuliano Procida) Date: Wed, 18 Mar 2020 12:12:41 +0000 Subject: [PATCH v2] dwarf-reader: Use all bits of Bloom filter words. In-Reply-To: <20200318113754.GI211970@google.com> References: <20200318113754.GI211970@google.com> Message-ID: <20200318121241.146259-1-gprocida@google.com> |
Series |
[v2] dwarf-reader: Use all bits of Bloom filter words.
|
|
Commit Message
Giuliano Procida
March 18, 2020, 12:12 p.m. UTC
Most of the bit values used for GNU hash ELF section Bloom filtering
were being ignored due to integer narrowing, reducing missing symbol
filtering efficiency considerably.
This patch fixes this.
* src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab):
Don't narrow calculated Bloom word to 8 bits before using it
to mask the fetched Bloom word.
Note on testing.
The .gnu.hash section seems to be present in all the .so but none of
the .o test files. abisym doesn't appear to find dynamic symbols (nm
-D), only normal ones, so it was a little tricky to test this.
I found a Debian .so (libpthread) with both the .gnu.hash section and
normal symbols. abisym behaves identically with my change, looking up
lots of present and non-present (as far as it's concerned) symbols. I
just extracted a full list with nm/sed and looked up each one.
389 symbols looked up, 241 present, 148 absent
8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency)
64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency)
Signed-off-by: Giuliano Procida <gprocida@google.com>
---
src/abg-dwarf-reader.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
Hello Giuliano, Giuliano Procida <gprocida@google.com> a ?crit: [...] > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc > index 3454fcf5..5556bde5 100644 > --- a/src/abg-dwarf-reader.cc > +++ b/src/abg-dwarf-reader.cc > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, > // filter, in bits. > int c = get_elf_class_size_in_bytes(elf_handle) * 8; > int n = (h1 / c) % ht.bf_nwords; > - unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); > + GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); Good catch! I wonder what made me chose a byte-wide bitmask to begin with. The docmentation of the bloom filter of the GNU Hash table at https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says: "The filter consists of maskwords words, each of which is 32-bits (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object" So it's clear, the bitmask must be of type GElf_Word. Have you seen any practical speed increase in the loading of ELF binaries due to this change? [...] > Most of the bit values used for GNU hash ELF section Bloom filtering > were being ignored due to integer narrowing, reducing missing symbol > filtering efficiency considerably. > > This patch fixes this. > > * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab): > Don't narrow calculated Bloom word to 8 bits before using it > to mask the fetched Bloom word. I have amended the patch to have the ChangeLog section above come last in the commit log message. It really needs to come last because we have a script that build a ChangeLog file from all the change log parts of the commit log and that one is shipped in the released tarball, making it compliant with the GNU tarball standards. That way, users who consume the tarball can access some history without needing to have (internet) access to the full Git history. [...] > Note on testing. > > The .gnu.hash section seems to be present in all the .so but none of > the .o test files. abisym doesn't appear to find dynamic symbols (nm > -D), only normal ones, so it was a little tricky to test this. > > I found a Debian .so (libpthread) with both the .gnu.hash section and > normal symbols. abisym behaves identically with my change, looking up > lots of present and non-present (as far as it's concerned) symbols. I > just extracted a full list with nm/sed and looked up each one. > > 389 symbols looked up, 241 present, 148 absent > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency) > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency) Thanks. So I am applying this to master. Below is the actual patch that got applied. Cheers, -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-dwarf-reader-Use-all-bits-of-Bloom-filter-words.patch Type: text/x-patch Size: 2062 bytes Desc: applied patch URL: <http://sourceware.org/pipermail/libabigail/attachments/20200319/6ab05151/attachment-0001.bin> -------------- next part --------------
Hi Dodji. On Thu, 19 Mar 2020 at 10:21, Dodji Seketeli <dodji@seketeli.org> wrote: > > Hello Giuliano, > > Giuliano Procida <gprocida@google.com> a ?crit: > > [...] > > > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc > > index 3454fcf5..5556bde5 100644 > > --- a/src/abg-dwarf-reader.cc > > +++ b/src/abg-dwarf-reader.cc > > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, > > // filter, in bits. > > int c = get_elf_class_size_in_bytes(elf_handle) * 8; > > int n = (h1 / c) % ht.bf_nwords; > > - unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); > > + GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); > > Good catch! I wonder what made me chose a byte-wide bitmask to begin > with. The docmentation of the bloom filter of the GNU Hash table at > https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says: > > "The filter consists of maskwords words, each of which is 32-bits > (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object" > > So it's clear, the bitmask must be of type GElf_Word. > > Have you seen any practical speed increase in the loading of ELF > binaries due to this change? I spotted the bug when reviewing maennich's fix for undefined behaviour (integer overflow). I'm afraid I didn't go as far as testing on a real workload as I don't think we have one that exercises this code. It might help someone else though. The whole of make check (excluding fedabipkgdiff tests) only made 3 calls to the function. I think the efficiency was 100% both before and after (2/2 absent symbols excluded by the Bloom filter). :-) > [...] > > > Most of the bit values used for GNU hash ELF section Bloom filtering > > were being ignored due to integer narrowing, reducing missing symbol > > filtering efficiency considerably. > > > > This patch fixes this. > > > > * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab): > > Don't narrow calculated Bloom word to 8 bits before using it > > to mask the fetched Bloom word. > > I have amended the patch to have the ChangeLog section above come > last in the commit log message. > > It really needs to come last because we have a script that build a > ChangeLog file from all the change log parts of the commit log and that > one is shipped in the released tarball, making it compliant with the GNU > tarball standards. That way, users who consume the tarball can access > some history without needing to have (internet) access to the full Git > history. I was happy for the note on testing to be excluded from the commit message. Would it be better to just reply to the patch to add this sort of context, or would you like to see these notes in the commit messages? Regards, Giuliano. > [...] > > > Note on testing. > > > > The .gnu.hash section seems to be present in all the .so but none of > > the .o test files. abisym doesn't appear to find dynamic symbols (nm > > -D), only normal ones, so it was a little tricky to test this. > > > > I found a Debian .so (libpthread) with both the .gnu.hash section and > > normal symbols. abisym behaves identically with my change, looking up > > lots of present and non-present (as far as it's concerned) symbols. I > > just extracted a full list with nm/sed and looked up each one. > > > > 389 symbols looked up, 241 present, 148 absent > > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency) > > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency) > > Thanks. > > So I am applying this to master. > > Below is the actual patch that got applied. > > Cheers, > > > -- > Dodji
On Thu, Mar 19, 2020 at 10:46:12AM +0000, Giuliano Procida wrote: >Hi Dodji. > >On Thu, 19 Mar 2020 at 10:21, Dodji Seketeli <dodji@seketeli.org> wrote: >> >> Hello Giuliano, >> >> Giuliano Procida <gprocida@google.com> a ?crit: >> >> [...] >> >> > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc >> > index 3454fcf5..5556bde5 100644 >> > --- a/src/abg-dwarf-reader.cc >> > +++ b/src/abg-dwarf-reader.cc >> > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, >> > // filter, in bits. >> > int c = get_elf_class_size_in_bytes(elf_handle) * 8; >> > int n = (h1 / c) % ht.bf_nwords; >> > - unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); >> > + GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); >> >> Good catch! I wonder what made me chose a byte-wide bitmask to begin >> with. The docmentation of the bloom filter of the GNU Hash table at >> https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says: >> >> "The filter consists of maskwords words, each of which is 32-bits >> (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object" >> >> So it's clear, the bitmask must be of type GElf_Word. >> >> Have you seen any practical speed increase in the loading of ELF >> binaries due to this change? > >I spotted the bug when reviewing maennich's fix for undefined >behaviour (integer overflow). I'm afraid I didn't go as far as testing >on a real workload as I don't think we have one that exercises this >code. It might help someone else though. > >The whole of make check (excluding fedabipkgdiff tests) only made 3 >calls to the function. I think the efficiency was 100% both before and >after (2/2 absent symbols excluded by the Bloom filter). :-) > >> [...] >> >> > Most of the bit values used for GNU hash ELF section Bloom filtering >> > were being ignored due to integer narrowing, reducing missing symbol >> > filtering efficiency considerably. >> > >> > This patch fixes this. >> > >> > * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab): >> > Don't narrow calculated Bloom word to 8 bits before using it >> > to mask the fetched Bloom word. >> >> I have amended the patch to have the ChangeLog section above come >> last in the commit log message. >> >> It really needs to come last because we have a script that build a >> ChangeLog file from all the change log parts of the commit log and that >> one is shipped in the released tarball, making it compliant with the GNU >> tarball standards. That way, users who consume the tarball can access >> some history without needing to have (internet) access to the full Git >> history. > >I was happy for the note on testing to be excluded from the commit >message. Would it be better to just reply to the patch to add this >sort of context, or would you like to see these notes in the commit >messages? I would prefer those information in the commit message. But note, everything below the '---' line is omitted from the commit message and is therefore a good place to put additional information not to be persisted in the final commit. Cheers, Matthias > >Regards, >Giuliano. > >> [...] >> >> > Note on testing. >> > >> > The .gnu.hash section seems to be present in all the .so but none of >> > the .o test files. abisym doesn't appear to find dynamic symbols (nm >> > -D), only normal ones, so it was a little tricky to test this. >> > >> > I found a Debian .so (libpthread) with both the .gnu.hash section and >> > normal symbols. abisym behaves identically with my change, looking up >> > lots of present and non-present (as far as it's concerned) symbols. I >> > just extracted a full list with nm/sed and looked up each one. >> > >> > 389 symbols looked up, 241 present, 148 absent >> > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency) >> > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency) >> >> Thanks. >> >> So I am applying this to master. >> >> Below is the actual patch that got applied. >> >> Cheers, >> >> >> -- >> Dodji
Matthias M?nnich <maennich@google.com> a ?crit: >> >>I was happy for the note on testing to be excluded from the commit >>message. Would it be better to just reply to the patch to add this >>sort of context, or would you like to see these notes in the commit >>messages? > > I would prefer those information in the commit message. But note, > everything below the '---' line is omitted from the commit message and > is therefore a good place to put additional information not to be > persisted in the final commit. I didn't exclude the testing note from the commit message. I just put the ChangeLog part at the end. Cheers,
diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc index 3454fcf5..5556bde5 100644 --- a/src/abg-dwarf-reader.cc +++ b/src/abg-dwarf-reader.cc @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, // filter, in bits. int c = get_elf_class_size_in_bytes(elf_handle) * 8; int n = (h1 / c) % ht.bf_nwords; - unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); + GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); // Test if the symbol is *NOT* present in this ELF file. if ((bloom_word_at(elf_handle, ht.bloom_filter, n) & bitmask) != bitmask)