PR32260: Improve estimate of number of strings

Message ID 84634a8b-773e-f7c6-c1fb-1b0bc5fbeab1@suse.de
State New
Series PR32260: Improve estimate of number of strings

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Michael Matz Oct. 16, 2024, 1:41 p.m. UTC
  we want to pre-size the hashtable for mergeable strings,
and so need an estimate for the number of them in one input
section.  This reduces the over-estimation a little.

	bfd/

	PR ld/32260
	* merge.c (record_section): Throttle down the overestimation
	of number of entries in a section.
---
Tested without regressions on Alan's list of targets.  Okay for master?

 bfd/merge.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)
  

Comments

Andreas Schwab Oct. 16, 2024, 2:04 p.m. UTC | #1
On Oct 16 2024, Michael Matz wrote:

> diff --git a/bfd/merge.c b/bfd/merge.c
> index c811bc57eae..2b4d2e428d4 100644
> --- a/bfd/merge.c
> +++ b/bfd/merge.c
> @@ -736,10 +736,42 @@ record_section (struct sec_merge_info *sinfo,
>  
>    /* Now populate the hash table and offset mapping.  */
>  
> +  /* We use an upper bounds of the number of strings that can be

s/bounds/bound/
  
Michael Matz Oct. 16, 2024, 2:07 p.m. UTC | #2
Hello,

On Wed, 16 Oct 2024, Andreas Schwab wrote:

> On Oct 16 2024, Michael Matz wrote:
> 
> > diff --git a/bfd/merge.c b/bfd/merge.c
> > index c811bc57eae..2b4d2e428d4 100644
> > --- a/bfd/merge.c
> > +++ b/bfd/merge.c
> > @@ -736,10 +736,42 @@ record_section (struct sec_merge_info *sinfo,
> >  
> >    /* Now populate the hash table and offset mapping.  */
> >  
> > +  /* We use an upper bounds of the number of strings that can be
> 
> s/bounds/bound/

And "bound on the ...".  Fixed in my local copy.


Thanks,
Michael.
  
Alan Modra Oct. 17, 2024, 12:42 a.m. UTC | #3
On Wed, Oct 16, 2024 at 03:41:31PM +0200, Michael Matz wrote:
> we want to pre-size the hashtable for mergeable strings,
> and so need an estimate for the number of them in one input
> section.  This reduces the over-estimation a little.
> 
> 	bfd/
> 
> 	PR ld/32260
> 	* merge.c (record_section): Throttle down the overestimation
> 	of number of entries in a section.

So this does cut down the estimate, but I think it's still wildly
oversize to use an upper bound here.  You're calculating for an input
section of unique strings (8-bit too, not 7-bit).  How often is that
going to happen in real object files?
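
(For a rough sense of scale: feeding a hypothetical 1 MiB all-strings
section through the patch's estimation loop gives 1 + 255 + 65025 + 213247
= 278528 entries, i.e. roughly size/3.8 instead of the old size/2, yet
still far more strings than a real 1 MiB string section is ever likely to
hold.)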

The other thing that occurs to me is that the merge.c code doesn't
avoid resizing sinfo->htab.  Can't we delay creation of sinfo->htab
until _bfd_merge_sections, and then make a better estimate of the size
required from the total size of sections that will be added to the
hash table?  If you used your upper bound then you'd need no resizing.

Perhaps even better would be to use a more conservative estimate
based on the total section size, then implement resizing in the normal
fashion on inserting entries.

> ---
> Tested without regressions on Alan's list of targets.  Okay for master?
> 
>  bfd/merge.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/bfd/merge.c b/bfd/merge.c
> index c811bc57eae..2b4d2e428d4 100644
> --- a/bfd/merge.c
> +++ b/bfd/merge.c
> @@ -736,10 +736,42 @@ record_section (struct sec_merge_info *sinfo,
>  
>    /* Now populate the hash table and offset mapping.  */
>  
> +  /* We use an upper bounds of the number of strings that can be
> +     possibly contained in this section.  There are a maximum of
> +     NMAX=255^n strings of length n, which then need NMAX*(n+1)
> +     bytes.  We also know that the section size fits into mapofs_type
> +     (i.e. is smaller than 4G).  */
> +  mapofs_type nestimate;
> +  if (sec->flags & SEC_STRINGS)
> +    {
> +      mapofs_type nmax, slen, remaining;
> +      nestimate = 1;
> +      nmax = 255;
> +      slen = 2;
> +      remaining = sec->size;
> +      while (remaining > 0)
> +	{
> +	  mapofs_type fits = remaining / slen;
> +	  if (fits <= nmax)
> +	    {
> +	      nestimate += fits;
> +	      break;
> +	    }
> +	  nestimate += nmax;
> +	  /* Ensure that nmax doesn't overflow.  */
> +	  BFD_ASSERT (slen < 5);
> +	  remaining -= nmax * slen;
> +	  nmax *= 255;
> +	  slen++;
> +	}
> +    }
> +  else
> +    nestimate = sec->size / sec->entsize;
> +
>    /* Presize the hash table for what we're going to add.  We overestimate
>       quite a bit, but if it turns out to be too much then other sections
>       merged into this area will make use of that as well.  */
> -  if (!sec_merge_maybe_resize (sinfo->htab, 1 + sec->size / 2))
> +  if (!sec_merge_maybe_resize (sinfo->htab, nestimate))
>      {
>        free (contents);
>        return 2;
> -- 
> 2.42.0
  
Michael Matz Oct. 17, 2024, 1:28 p.m. UTC | #4
Hello,

On Thu, 17 Oct 2024, Alan Modra wrote:

> > 	bfd/
> > 
> > 	PR ld/32260
> > 	* merge.c (record_section): Throttle down the overestimation
> > 	of number of entries in a section.
> 
> So this does cut down the estimate, but I think it's still wildly
> oversize to use an upper bound here.

Yes, it's complete bollocks^W^Wvery very conservative.  

> You're calculating for an input section of unique strings (8-bit too, 
> not 7-bit).  How often is that going to happen in real object files?

Mostly never, of course.  Except in constructed examples that would try to 
trigger this by enumerating all 5-character strings for instance. I 
wanted to get it 100% correct in these cases as well, first, and then see 
how bad it is in practice (on the grounds that the terrible 
over-estimation will usually be used up by further mergeable blobs).  
Turns out that it seems to be mostly okay, except in these insane cases 
like in the bug report :-)

So, yeah, I think I'll try to do something more fancy.  There are 
probabilistic measures of how many unique things will most likely be 
contained in blobs, and suchlike.  Thing is: when I wrote all that code 
there was really a non-negligible performance impact when trying to do the 
resizing on demand in the hot loop instead of ensuring it's not necessary.  

> The other thing that occurs to me is that the merge.c code doesn't
> avoid resizing sinfo->htab.  Can't we delay creation of sinfo->htab
> until _bfd_merge_sections,

Delaying the creation itself doesn't matter much; the resizing and really 
large allocations already happen only from within record_section and 
hence _bfd_merge_sections ...

> and then make a better estimate of the size
> required from the total size of sections that will be added to the
> hash table?

... but this is possible.  If there's a better estimate, of course.  Right 
now it conceptually works like this:

  for all input-sections S:
    estimate = foo(S.size);
    resize (estimate);
    do_stuff_without_resizing(S);  // 1

At (1) the potential over-estimates from earlier S's will be used up.  If 
we instead did the resizing of the htab completely outside the loop over 
the input blobs:

  foreach S:
    wholesize += S.size;
  resize (estimate (wholesize));
  foreach S:
    do_stuff_without_resizing(S);

then there is no longer any opportunity to use up earlier over-estimates, as 
there's just a single up-front estimate.  So the estimate() really needs to be 
good now, and doing that without reading and interpreting S's contents is hard.

I can see only one way here: do a less conservative estimate, and then 
live with the fact that there needs to be a needs-resize check in the hot 
loop.
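
The pattern would be roughly like this (an illustrative toy sketch, not the
actual merge.c table; it assumes the table starts out with a calloc'ed,
power-of-two number of buckets, uses a non-zero 64-bit key as its own hash,
and treats 0 as an empty slot):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdlib.h>

  struct toy_htab
  {
    uint64_t *slots;   /* key doubles as its own hash; 0 means empty */
    size_t nbuckets;   /* always a power of two */
    size_t count;
  };

  /* Grow once the table would be more than 3/4 full.  */
  #define TOY_NEEDS_RESIZE(count, nbuckets) ((count) * 4 > (nbuckets) * 3)

  static bool
  toy_resize (struct toy_htab *tab)
  {
    size_t newn = tab->nbuckets * 2;
    uint64_t *news = calloc (newn, sizeof *news);
    if (!news)
      return false;
    /* Re-insert every live entry into the doubled table.  */
    for (size_t i = 0; i < tab->nbuckets; i++)
      if (tab->slots[i] != 0)
        {
          size_t idx = tab->slots[i] & (newn - 1);
          while (news[idx] != 0)
            idx = (idx + 1) & (newn - 1);
          news[idx] = tab->slots[i];
        }
    free (tab->slots);
    tab->slots = news;
    tab->nbuckets = newn;
    return true;
  }

  static bool
  toy_insert (struct toy_htab *tab, uint64_t key)
  {
    /* The needs-resize check sits in the hot loop; no up-front estimate
       of the final number of entries is needed.  */
    if (TOY_NEEDS_RESIZE (tab->count + 1, tab->nbuckets)
        && !toy_resize (tab))
      return false;
    size_t idx = key & (tab->nbuckets - 1);
    while (tab->slots[idx] != 0)
      {
        if (tab->slots[idx] == key)
          return true;   /* already present */
        idx = (idx + 1) & (tab->nbuckets - 1);
      }
    tab->slots[idx] = key;
    tab->count++;
    return true;
  }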

> If you used your upper bound then you'd need no resizing.
> 
> Perhaps even better would be to use a more conservative estimate
> based on the total section size, then implement resizing in the normal
> fashion on inserting entries.

You mean "less conservative", right?  In the sense of less over-estimation 
and hence not all content fitting into the table.

Anyway, I'll try to rewrite the code a little and see how the performance 
fares.


Ciao,
Michael.
  
Alan Modra Oct. 18, 2024, 12:31 a.m. UTC | #5
On Thu, Oct 17, 2024 at 03:28:58PM +0200, Michael Matz wrote:
> On Thu, 17 Oct 2024, Alan Modra wrote:
> > If you used your upper bound then you'd need no resizing.
> > 
> > Perhaps even better would be to use a more conservative estimate
> > based on the total section size, then implement resizing in the normal
> > fashion on inserting entries.
> 
> You mean "less conservative", right?  In the sense of less over-estimation 
> and hence not all content fitting into the table.

Yes.

> Anyway, I'll try to rewrite the code a little and see how the performance 
> fares.

Thanks.  I'm happy to do a little rewriting myself, but I'll not start
doing that unless you tell me to go ahead.
  
Michael Matz Oct. 21, 2024, 4:25 p.m. UTC | #6
Hey,

On Fri, 18 Oct 2024, Alan Modra wrote:

> On Thu, Oct 17, 2024 at 03:28:58PM +0200, Michael Matz wrote:
> > On Thu, 17 Oct 2024, Alan Modra wrote:
> > > If you used your upper bound then you'd need no resizing.
> > > 
> > > Perhaps even better would be to use a more conservative estimate
> > > based on the total section size, then implement resizing in the normal
> > > fashion on inserting entries.
> > 
> > You mean "less conservative", right?  In the sense of less over-estimation 
> > and hence not all content fitting into the table.
> 
> Yes.

Actually, over the weekend I remembered the original motivation for doing 
the pre-sizing before the hot loop.  It wasn't that it was (much) faster, 
but rather that this way it's easier to do the hashing with multiple 
threads by some clever atomic accesses on a table that is guaranteed to 
not need resizes.
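
For reference, a minimal sketch of that idea (purely illustrative, not the
actual merge.c data structures; it assumes C11 atomics, a power-of-two
bucket count, non-zero keys used as their own hash, and a table pre-sized
so that it never fills up):

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  static bool
  insert_fixed (_Atomic uint64_t *slots, size_t nbuckets /* power of two */,
                uint64_t key /* non-zero */)
  {
    size_t idx = key & (nbuckets - 1);
    for (size_t probes = 0; probes < nbuckets; probes++)
      {
        uint64_t expected = 0;
        /* Claim the empty slot; on failure EXPECTED holds the value that
           some other thread already stored there.  */
        if (atomic_compare_exchange_strong (&slots[idx], &expected, key))
          return true;
        if (expected == key)
          return true;   /* another thread inserted the same key */
        idx = (idx + 1) & (nbuckets - 1);
      }
    return false;        /* full; can't happen if sized from an upper bound */
  }

Because the table never grows, no thread ever has to coordinate a resize
with the others.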

Alas, that multithreading didn't happen yet (though I still somewhat plan 
to tackle that).  So there's currently no reason to not just do the 
obvious thing: resize when necessary without any sort of over-estimation.

So, like below.  Okay if it passes regtesting?

(The only other material change is that I now regard all allocation 
errors, even small ones, as soft errors that will only disable 
deduplication on the current blob, even though it's then likely to not 
succeed on further blobs either.  But if it's really a small allocation 
that doesn't work here, then other parts of the linker will likely still 
result in a linker error.)

> > Anyway, I'll try to rewrite the code a little and see how the 
> > performance fares.
> Thanks.  I'm happy to do a little rewriting myself, but I'll not start
> doing that unless you tell me to go ahead.

Go wild!


Ciao,
Michael.

------

From ba043810892684578cc5eb213627d3be2a7e4d8c Mon Sep 17 00:00:00 2001
From: Michael Matz <matz@suse.de>
Date: Mon, 21 Oct 2024 17:58:32 +0200
Subject: stringmerge: don't presize hash table
To: binutils@sourceware.org

Originally the reason for pre-sizing was that it's easier
for a multi-threaded use of the hash table.  That hasn't materialized
yet, so there's not much sense in using the very, very conservative
estimates for pre-sizing.  Doing the resize on demand, whenever we
actually need to add a new entry, doesn't change performance.

	bfd/
	* merge.c (sec_merge_hash_insert): Resize as needed from here ...
	(record_section): ... not from here.  Don't calculate estimates,
	return bool instead of three-state, regard all errors as soft
	errors.
	(_bfd_merge_sections): Adjust.
---
 bfd/merge.c | 70 ++++++++++++++++++++++++-----------------------------
 1 file changed, 31 insertions(+), 39 deletions(-)

diff --git a/bfd/merge.c b/bfd/merge.c
index abb0269227e..cb8251cb42e 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -244,8 +244,23 @@ sec_merge_hash_insert (struct sec_merge_hash *table,
   hashp->alignment = 0;
   hashp->u.suffix = NULL;
   hashp->next = NULL;
-  // We must not need resizing, otherwise the estimation was wrong
-  BFD_ASSERT (!NEEDS_RESIZE (bfdtab->count + 1, table->nbuckets));
+
+  if (NEEDS_RESIZE (bfdtab->count + 1, table->nbuckets))
+    {
+      if (!sec_merge_maybe_resize (table, 1))
+	return NULL;
+      uint64_t *key_lens = table->key_lens;
+      unsigned int nbuckets = table->nbuckets;
+      _index = hash & (nbuckets - 1);
+      while (1)
+	{
+	  uint64_t candlen = key_lens[_index];
+	  if (!(candlen & (uint32_t)-1))
+	    break;
+	  _index = (_index + 1) & (nbuckets - 1);
+	}
+    }
+
   bfdtab->count++;
   table->key_lens[_index] = (hash << 32) | (uint32_t)len;
   table->values[_index] = hashp;
@@ -699,12 +714,11 @@ _bfd_add_merge_section (bfd *abfd, void **psinfo, asection *sec,
 }
 
 /* Record one whole input section (described by SECINFO) into the hash table
-   SINFO.  Returns 0 on hard errors (no sense in continuing link),
-   1 when section is completely recorded, and 2 when the section wasn't
-   recorded but we can continue (e.g. by simply not deduplicating this
-   section).  */
+   SINFO.  Returns true when section is completely recorded, and false when
+   it wasn't recorded but we can continue (e.g. by simply not deduplicating
+   this section).  */
 
-static int
+static bool
 record_section (struct sec_merge_info *sinfo,
 		struct sec_merge_sec_info *secinfo)
 {
@@ -736,15 +750,6 @@ record_section (struct sec_merge_info *sinfo,
 
   /* Now populate the hash table and offset mapping.  */
 
-  /* Presize the hash table for what we're going to add.  We overestimate
-     quite a bit, but if it turns out to be too much then other sections
-     merged into this area will make use of that as well.  */
-  if (!sec_merge_maybe_resize (sinfo->htab, 1 + sec->size / 2))
-    {
-      free (contents);
-      return 2;
-    }
-
   /* Walk through the contents, calculate hashes and length of all
      blobs (strings or fixed-size entries) we find and fill the
      hash and offset tables.  */
@@ -792,14 +797,12 @@ record_section (struct sec_merge_info *sinfo,
   /*printf ("ZZZ %s:%s %u entries\n", sec->owner->filename, sec->name,
 	  (unsigned)secinfo->noffsetmap);*/
 
-  return 1;
+  return true;
 
  error_return:
   free (contents);
   contents = NULL;
-  for (secinfo = sinfo->chain; secinfo; secinfo = secinfo->next)
-    *secinfo->psecinfo = NULL;
-  return 0;
+  return false;
 }
 
 /* qsort comparison function.  Won't ever return zero as all entries
@@ -987,31 +990,20 @@ _bfd_merge_sections (bfd *abfd,
       /* Record the sections into the hash table.  */
       align = 1;
       for (secinfo = sinfo->chain; secinfo; secinfo = secinfo->next)
-	if (secinfo->sec->flags & SEC_EXCLUDE)
+	if (secinfo->sec->flags & SEC_EXCLUDE
+	    || !record_section (sinfo, secinfo))
 	  {
 	    *secinfo->psecinfo = NULL;
 	    if (remove_hook)
 	      (*remove_hook) (abfd, secinfo->sec);
 	  }
-	else
+	else if (align)
 	  {
-	    int e = record_section (sinfo, secinfo);
-	    if (e == 0)
-	      return false;
-	    if (e == 2)
-	      {
-		*secinfo->psecinfo = NULL;
-		if (remove_hook)
-		  (*remove_hook) (abfd, secinfo->sec);
-	      }
-	    else if (align)
-	      {
-		unsigned int opb = bfd_octets_per_byte (abfd, secinfo->sec);
-
-		align = (bfd_size_type) 1 << secinfo->sec->alignment_power;
-		if (((secinfo->sec->size / opb) & (align - 1)) != 0)
-		  align = 0;
-	      }
+	    unsigned int opb = bfd_octets_per_byte (abfd, secinfo->sec);
+
+	    align = (bfd_size_type) 1 << secinfo->sec->alignment_power;
+	    if (((secinfo->sec->size / opb) & (align - 1)) != 0)
+	      align = 0;
 	  }
 
       if (sinfo->htab->first == NULL)
  
Alan Modra Oct. 21, 2024, 10:07 p.m. UTC | #7
On Mon, Oct 21, 2024 at 06:25:21PM +0200, Michael Matz wrote:
> So, like below.  Okay if it passes regtesting?

Yes, thanks.
  

Patch

diff --git a/bfd/merge.c b/bfd/merge.c
index c811bc57eae..2b4d2e428d4 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -736,10 +736,42 @@  record_section (struct sec_merge_info *sinfo,
 
   /* Now populate the hash table and offset mapping.  */
 
+  /* We use an upper bounds of the number of strings that can be
+     possibly contained in this section.  There are a maximum of
+     NMAX=255^n strings of length n, which then need NMAX*(n+1)
+     bytes.  We also know that the section size fits into mapofs_type
+     (i.e. is smaller than 4G).  */
+  mapofs_type nestimate;
+  if (sec->flags & SEC_STRINGS)
+    {
+      mapofs_type nmax, slen, remaining;
+      nestimate = 1;
+      nmax = 255;
+      slen = 2;
+      remaining = sec->size;
+      while (remaining > 0)
+	{
+	  mapofs_type fits = remaining / slen;
+	  if (fits <= nmax)
+	    {
+	      nestimate += fits;
+	      break;
+	    }
+	  nestimate += nmax;
+	  /* Ensure that nmax doesn't overflow.  */
+	  BFD_ASSERT (slen < 5);
+	  remaining -= nmax * slen;
+	  nmax *= 255;
+	  slen++;
+	}
+    }
+  else
+    nestimate = sec->size / sec->entsize;
+
   /* Presize the hash table for what we're going to add.  We overestimate
      quite a bit, but if it turns out to be too much then other sections
      merged into this area will make use of that as well.  */
-  if (!sec_merge_maybe_resize (sinfo->htab, 1 + sec->size / 2))
+  if (!sec_merge_maybe_resize (sinfo->htab, nestimate))
     {
       free (contents);
       return 2;