[v2] dwarf-reader: Use all bits of Bloom filter words.

Message ID 20200318121241.146259-1-gprocida@google.com
State Committed
Headers
Series [v2] dwarf-reader: Use all bits of Bloom filter words. |

Commit Message

Giuliano Procida March 18, 2020, 12:12 p.m. UTC
  Most of the bit values used for GNU hash ELF section Bloom filtering
were being ignored due to integer narrowing, reducing missing symbol
filtering efficiency considerably.

This patch fixes this.

	* src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab):
	Don't narrow calculated Bloom word to 8 bits before using it
	to mask the fetched Bloom word.

Note on testing.

The .gnu.hash section seems to be present in all the .so but none of
the .o test files. abisym doesn't appear to find dynamic symbols (nm
-D), only normal ones, so it was a little tricky to test this.

I found a Debian .so (libpthread) with both the .gnu.hash section and
normal symbols. abisym behaves identically with my change, looking up
lots of present and non-present (as far as it's concerned) symbols. I
just extracted a full list with nm/sed and looked up each one.

389 symbols looked up, 241 present, 148 absent
8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency)
64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency)

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 src/abg-dwarf-reader.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Dodji Seketeli March 19, 2020, 10:20 a.m. UTC | #1
Hello Giuliano,

Giuliano Procida <gprocida@google.com> a ?crit:

[...]

> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> index 3454fcf5..5556bde5 100644
> --- a/src/abg-dwarf-reader.cc
> +++ b/src/abg-dwarf-reader.cc
> @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>    // filter, in bits.
>    int c = get_elf_class_size_in_bytes(elf_handle) * 8;
>    int n =  (h1 / c) % ht.bf_nwords;
> -  unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
> +  GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));

Good catch!  I wonder what made me chose a byte-wide bitmask to begin
with.  The docmentation of the bloom filter of the GNU Hash table at
https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says:

    "The filter consists of maskwords words, each of which is 32-bits
    (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object"

So it's clear, the bitmask must be of type GElf_Word.

Have you seen any practical speed increase in the loading of ELF
binaries due to this change?

[...]

> Most of the bit values used for GNU hash ELF section Bloom filtering
> were being ignored due to integer narrowing, reducing missing symbol
> filtering efficiency considerably.
>
> This patch fixes this.
>
> 	* src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab):
> 	Don't narrow calculated Bloom word to 8 bits before using it
> 	to mask the fetched Bloom word.

I have amended the patch to have the ChangeLog section above come
last in the commit log message.

It really needs to come last because we have a script that build a
ChangeLog file from all the change log parts of the commit log and that
one is shipped in the released tarball, making it compliant with the GNU
tarball standards.  That way, users who consume the tarball can access
some history without needing to have (internet) access to the full Git
history.

[...]

> Note on testing.
>
> The .gnu.hash section seems to be present in all the .so but none of
> the .o test files. abisym doesn't appear to find dynamic symbols (nm
> -D), only normal ones, so it was a little tricky to test this.
>
> I found a Debian .so (libpthread) with both the .gnu.hash section and
> normal symbols. abisym behaves identically with my change, looking up
> lots of present and non-present (as far as it's concerned) symbols. I
> just extracted a full list with nm/sed and looked up each one.
>
> 389 symbols looked up, 241 present, 148 absent
> 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency)
> 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency)

Thanks.

So I am applying this to master.

Below is the actual patch that got applied.

Cheers,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-dwarf-reader-Use-all-bits-of-Bloom-filter-words.patch
Type: text/x-patch
Size: 2062 bytes
Desc: applied patch
URL: <http://sourceware.org/pipermail/libabigail/attachments/20200319/6ab05151/attachment-0001.bin>
-------------- next part --------------
  
Giuliano Procida March 19, 2020, 10:46 a.m. UTC | #2
Hi Dodji.

On Thu, 19 Mar 2020 at 10:21, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> Hello Giuliano,
>
> Giuliano Procida <gprocida@google.com> a ?crit:
>
> [...]
>
> > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> > index 3454fcf5..5556bde5 100644
> > --- a/src/abg-dwarf-reader.cc
> > +++ b/src/abg-dwarf-reader.cc
> > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment*              env,
> >    // filter, in bits.
> >    int c = get_elf_class_size_in_bytes(elf_handle) * 8;
> >    int n =  (h1 / c) % ht.bf_nwords;
> > -  unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
> > +  GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
>
> Good catch!  I wonder what made me chose a byte-wide bitmask to begin
> with.  The docmentation of the bloom filter of the GNU Hash table at
> https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says:
>
>     "The filter consists of maskwords words, each of which is 32-bits
>     (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object"
>
> So it's clear, the bitmask must be of type GElf_Word.
>
> Have you seen any practical speed increase in the loading of ELF
> binaries due to this change?

I spotted the bug when reviewing maennich's fix for undefined
behaviour (integer overflow). I'm afraid I didn't go as far as testing
on a real workload as I don't think we have one that exercises this
code. It might help someone else though.

The whole of make check (excluding fedabipkgdiff tests) only made 3
calls to the function. I think the efficiency was 100% both before and
after (2/2 absent symbols excluded by the Bloom filter). :-)

> [...]
>
> > Most of the bit values used for GNU hash ELF section Bloom filtering
> > were being ignored due to integer narrowing, reducing missing symbol
> > filtering efficiency considerably.
> >
> > This patch fixes this.
> >
> >       * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab):
> >       Don't narrow calculated Bloom word to 8 bits before using it
> >       to mask the fetched Bloom word.
>
> I have amended the patch to have the ChangeLog section above come
> last in the commit log message.
>
> It really needs to come last because we have a script that build a
> ChangeLog file from all the change log parts of the commit log and that
> one is shipped in the released tarball, making it compliant with the GNU
> tarball standards.  That way, users who consume the tarball can access
> some history without needing to have (internet) access to the full Git
> history.

I was happy for the note on testing to be excluded from the commit
message. Would it be better to just reply to the patch to add this
sort of context, or would you like to see these notes in the commit
messages?

Regards,
Giuliano.

> [...]
>
> > Note on testing.
> >
> > The .gnu.hash section seems to be present in all the .so but none of
> > the .o test files. abisym doesn't appear to find dynamic symbols (nm
> > -D), only normal ones, so it was a little tricky to test this.
> >
> > I found a Debian .so (libpthread) with both the .gnu.hash section and
> > normal symbols. abisym behaves identically with my change, looking up
> > lots of present and non-present (as far as it's concerned) symbols. I
> > just extracted a full list with nm/sed and looked up each one.
> >
> > 389 symbols looked up, 241 present, 148 absent
> > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency)
> > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency)
>
> Thanks.
>
> So I am applying this to master.
>
> Below is the actual patch that got applied.
>
> Cheers,
>
>
> --
>                 Dodji
  
Matthias Männich March 19, 2020, 11:05 a.m. UTC | #3
On Thu, Mar 19, 2020 at 10:46:12AM +0000, Giuliano Procida wrote:
>Hi Dodji.
>
>On Thu, 19 Mar 2020 at 10:21, Dodji Seketeli <dodji@seketeli.org> wrote:
>>
>> Hello Giuliano,
>>
>> Giuliano Procida <gprocida@google.com> a ?crit:
>>
>> [...]
>>
>> > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>> > index 3454fcf5..5556bde5 100644
>> > --- a/src/abg-dwarf-reader.cc
>> > +++ b/src/abg-dwarf-reader.cc
>> > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment*              env,
>> >    // filter, in bits.
>> >    int c = get_elf_class_size_in_bytes(elf_handle) * 8;
>> >    int n =  (h1 / c) % ht.bf_nwords;
>> > -  unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
>> > +  GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
>>
>> Good catch!  I wonder what made me chose a byte-wide bitmask to begin
>> with.  The docmentation of the bloom filter of the GNU Hash table at
>> https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says:
>>
>>     "The filter consists of maskwords words, each of which is 32-bits
>>     (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object"
>>
>> So it's clear, the bitmask must be of type GElf_Word.
>>
>> Have you seen any practical speed increase in the loading of ELF
>> binaries due to this change?
>
>I spotted the bug when reviewing maennich's fix for undefined
>behaviour (integer overflow). I'm afraid I didn't go as far as testing
>on a real workload as I don't think we have one that exercises this
>code. It might help someone else though.
>
>The whole of make check (excluding fedabipkgdiff tests) only made 3
>calls to the function. I think the efficiency was 100% both before and
>after (2/2 absent symbols excluded by the Bloom filter). :-)
>
>> [...]
>>
>> > Most of the bit values used for GNU hash ELF section Bloom filtering
>> > were being ignored due to integer narrowing, reducing missing symbol
>> > filtering efficiency considerably.
>> >
>> > This patch fixes this.
>> >
>> >       * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab):
>> >       Don't narrow calculated Bloom word to 8 bits before using it
>> >       to mask the fetched Bloom word.
>>
>> I have amended the patch to have the ChangeLog section above come
>> last in the commit log message.
>>
>> It really needs to come last because we have a script that build a
>> ChangeLog file from all the change log parts of the commit log and that
>> one is shipped in the released tarball, making it compliant with the GNU
>> tarball standards.  That way, users who consume the tarball can access
>> some history without needing to have (internet) access to the full Git
>> history.
>
>I was happy for the note on testing to be excluded from the commit
>message. Would it be better to just reply to the patch to add this
>sort of context, or would you like to see these notes in the commit
>messages?

I would prefer those information in the commit message. But note,
everything below the '---' line is omitted from the commit message and
is therefore a good place to put additional information not to be
persisted in the final commit.

Cheers,
Matthias

>
>Regards,
>Giuliano.
>
>> [...]
>>
>> > Note on testing.
>> >
>> > The .gnu.hash section seems to be present in all the .so but none of
>> > the .o test files. abisym doesn't appear to find dynamic symbols (nm
>> > -D), only normal ones, so it was a little tricky to test this.
>> >
>> > I found a Debian .so (libpthread) with both the .gnu.hash section and
>> > normal symbols. abisym behaves identically with my change, looking up
>> > lots of present and non-present (as far as it's concerned) symbols. I
>> > just extracted a full list with nm/sed and looked up each one.
>> >
>> > 389 symbols looked up, 241 present, 148 absent
>> > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency)
>> > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency)
>>
>> Thanks.
>>
>> So I am applying this to master.
>>
>> Below is the actual patch that got applied.
>>
>> Cheers,
>>
>>
>> --
>>                 Dodji
  
Dodji Seketeli March 19, 2020, 3:17 p.m. UTC | #4
Matthias M?nnich <maennich@google.com> a ?crit:

>>
>>I was happy for the note on testing to be excluded from the commit
>>message. Would it be better to just reply to the patch to add this
>>sort of context, or would you like to see these notes in the commit
>>messages?
>
> I would prefer those information in the commit message. But note,
> everything below the '---' line is omitted from the commit message and
> is therefore a good place to put additional information not to be
> persisted in the final commit.

I didn't exclude the testing note from the commit message.  I just put
the ChangeLog part at the end.

Cheers,
  

Patch

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 3454fcf5..5556bde5 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -2025,7 +2025,7 @@  lookup_symbol_from_gnu_hash_tab(const environment*		env,
   // filter, in bits.
   int c = get_elf_class_size_in_bytes(elf_handle) * 8;
   int n =  (h1 / c) % ht.bf_nwords;
-  unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
+  GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c));
 
   // Test if the symbol is *NOT* present in this ELF file.
   if ((bloom_word_at(elf_handle, ht.bloom_filter, n) & bitmask) != bitmask)