RFC: LD: Changing the size of the delta used when growing merge offset maps
Checks
| Context | Check | Description |
| --- | --- | --- |
| linaro-tcwg-bot/tcwg_binutils_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-arm | success | Test passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 | success | Test passed |
Commit Message
Hi Guys,
Running the linker under sysprof shows that the append_offsetmap()
function in merge.c is responsible for a large amount of the memory
allocated during a link, and a lot of those allocations are calls to
bfd_realloc(). Since reallocating memory takes time and represents
an inefficiency, it seems to me that we can do better than having a
fixed delta for increasing the size of the offset maps.
So I am proposing the attached patch. It creates a new static
variable called map_delta which is used for the array increments. The
value of map_delta is selected based upon whether
--reduce-memory-overheads or -O have been specified on the command
line. (I am not sure about the use of the -O flag here, but I could
not find a more relevant flag to indicate 'this is a big link').
The results are underwhelming, but they still make a small amount
of difference. For example, linking LLVM's ld.llc executable on my
machine uses 7,121,480 KB of memory by default and takes around 14.97
seconds. With the patch applied and -O added to the link command
line, this changes to 7,120,196 KB and 14.89 seconds. Not a lot, I
know, but maybe a step in the right direction.
Thoughts / comments ?
Cheers
Nick
Comments
On Mon, Mar 16, 2026 at 9:37 AM Nick Clifton <nickc@redhat.com> wrote:
>
> Hi Guys,
>
> Running the linker under sysprof shows that the append_offsetmap()
> function in merge.c is responsible for a large amount of the memory
> allocated during a link, and a lot of those allocations are calls to
> bfd_realloc(). Since reallocating memory takes time, and represents
> an inefficiency it seems to me that we can do better than having a
> fixed delta for increasing the size of the offset maps.
>
> So I am proposing the attached patch. It creates a new static
> variable called map_delta which is used for the array increments. The
> value of map_delta is selected based upon whether
> --reduce-memory-overheads or -O have been specified on the command
> line. (I am not sure about the use of the -O flag here, but I could
> not find a more relevant flag to indicate 'this is a big link').
>
> The results are underwhelming, but still do still make a small amount
> of difference. For example linking LLVM's ld.llc executable on my
> machine uses 7,121,480 Kb of memory by default and takes around 14.97
> seconds. But with the patch applied and adding -O to the link command
> line this changes to 7,120,196 Kb and 14.89 seconds. Not a lot I know
Is bfd_realloc really the bottleneck? If so, why are the execution times almost
the same? It looks like the memory usage doesn't change much either. If we can't
improve both execution time and memory usage at the same time, shouldn't we
focus on one of them?
> but maybe a step in the right direction.
>
> Thoughts / comments ?
>
> Cheers
> Nick
>
Hello,
On Mon, 16 Mar 2026, Nick Clifton wrote:
> Running the linker under sysprof shows that the append_offsetmap()
> function in merge.c is responsible for a large amount of the memory
> allocated during a link, and a lot of those allocations are calls to
> bfd_realloc(). Since reallocating memory takes time, and represents
> an inefficiency it seems to me that we can do better than having a
> fixed delta for increasing the size of the offset maps.
>
> So I am proposing the attached patch. It creates a new static
> variable called map_delta which is used for the array increments. The
> value of map_delta is selected based upon whether
> --reduce-memory-overheads or -O have been specified on the command
> line. (I am not sure about the use of the -O flag here, but I could
> not find a more relevant flag to indicate 'this is a big link').
>
> The results are underwhelming, but still do still make a small amount
> of difference. For example linking LLVM's ld.llc executable on my
> machine uses 7,121,480 Kb of memory by default and takes around 14.97
> seconds. But with the patch applied and adding -O to the link command
> line this changes to 7,120,196 Kb and 14.89 seconds. Not a lot I know
> but maybe a step in the right direction.
>
> Thoughts / comments ?
Have you tried to establish the best case possible for this scenario?
Measure the maxsize of these and realloc that size right from the start,
so that no reallocs are ever called in that run?
See also commit 21160d8a1 about a related but opposite problem: very many
tiny mergeable sections; a larger increment would have aggravated
that specific, but now fixed, problem.
Thing is: every reallocation is interspersed with a fixed number of new
offset/entry pairs for mergeable sections, and each of these pairs is only
created after parsing and hashing the section's contents. So the final
size of the offset map will be directly related to the number of hashed
blobs in it, and hence roughly related to its size. So another approach I
considered but didn't follow up on was to approximate the initial number
of entries as (say) section size divided by 16, guarded by some minimum
and maximum, or perhaps to make the increment itself somewhat dependent on
that (i.e. large input sections would get a larger increment).
The "16" above would make it so that a mergeable string section whose
average string length is 16 would come out exactly preallocated.
If we go with your patch I'd say we don't need it to be runtime-configurable.
Just use the larger value. Due to the commit above, after a mergeable
section is parsed that map will be reallocated back to its really used
size anyway, so any waste during parsing will be returned to the
allocator immediately. And that waste is strictly bounded by the
increment size, i.e. for your new value of 8192 the waste will be at most
8191 entries (total, not per input section), which translates to about 98k.
Ciao,
Michael.
Hi H.J.
>> Running the linker under sysprof shows that the append_offsetmap()
>> function in merge.c is responsible for a large amount of the memory
>> allocated during a link, and a lot of those allocations are calls to
>> bfd_realloc().
> Is bfd_realloc really the bottleneck?
Not really, no. It does take a small amount of time, of course, especially
if a new memory region needs to be used. But in the grand scheme of things
it is not actually a significant time hog.
Cheers
Nick
Hi Michael,
>> Running the linker under sysprof shows that the append_offsetmap()
>> function in merge.c is responsible for a large amount of the memory
>> allocated during a link, and a lot of those allocations are calls to
>> bfd_realloc().
> Have you tried to establish the best case possible for this scenario?
> Measure the maxsize of these and realloc that size right from the start,
> so that no reallocs are ever called in that run?
Hmm, no ...
> Thing is: every reallocation is interspersed with a fixed number of new
> offset/entry pairs for mergable sections, and each of these pairs is only
> created after parsing and hashing the sections contents. So the final
> size of the offset map will be directly related to the number of hashed
> blobs in it, and hence roughly related to its size. So another approach I
> considered but didn't follow up on, was to approximate the initial number
> of entries to (say) section-size divided by 16, guarded by some minimum
> and maximum, or to perhaps make the increment itself somewhat depending on
> that (i.e. large input sections will get a larger increment).
Actually that sounds like an interesting idea. I may have a go myself.
> If we'd go with your patch
I think that I am going to withdraw the patch. It did not achieve enough
of a gain to make it really useful, and your idea of preallocating enough
space at the start looks like it might be far more effective.
Cheers
Nick
@@ -469,6 +469,38 @@ sec_merge_init (unsigned int entsize, bool strings)
return table;
}
+/* Choose the size increment for the offset maps. A large value means
+ that we will not have to call bfd_realloc() as many times, if we are
+ dealing with large input sections, and hence the link will be faster.
+ But a large value can also waste a lot of memory if we are dealing
+ with a big number of small input sections. Which is a problem if we
+ are linking in an environment with a restricted amount of memory.
+
+ FIXME: The values below are a bit arbitrary. It would be better to
+ create a tuneable parameter that could be set by the user. */
+#define MAP_DELTA_SMALL 1024
+#define MAP_DELTA_NORM 2048
+#define MAP_DELTA_LARGE 8192
+
+/* In order to avoid the possibility of using map_delta before it
+ has been set to a value selected by choose_map_delta() we initialise
+ it to a semi-reasonable value. */
+#define MAP_DELTA_UNSET 4096
+static unsigned int map_delta = MAP_DELTA_UNSET;
+
+static unsigned int
+choose_map_delta (struct bfd_link_info *info)
+{
+ if (info->reduce_memory_overheads)
+ return MAP_DELTA_SMALL;
+
+ /* FIXME: The optimize flag is not really the right flag to test here. */
+ if (info->optimize)
+ return MAP_DELTA_LARGE;
+
+ return MAP_DELTA_NORM;
+}
+
/* Append the tuple of input-offset O corresponding
to hash table ENTRY into SECINFO, such that we later may lookup the
entry just by O. */
@@ -478,10 +510,12 @@ append_offsetmap (struct sec_merge_sec_info *secinfo,
mapofs_type o,
struct sec_merge_hash_entry *entry)
{
- if ((secinfo->noffsetmap & 2047) == 0)
+ // BFD_ASSERT (map_delta != MAP_DELTA_UNSET);
+ // BFD_ASSERT (map_delta != 0);
+ if ((secinfo->noffsetmap & (map_delta - 1)) == 0)
{
bfd_size_type amt;
- amt = (secinfo->noffsetmap + 2048);
+ amt = (secinfo->noffsetmap + map_delta);
secinfo->map_ofs = bfd_realloc (secinfo->map_ofs,
amt * sizeof(secinfo->map_ofs[0]));
if (!secinfo->map_ofs)
@@ -784,7 +818,8 @@ record_section (struct sec_merge_info *sinfo,
free (contents);
contents = NULL;
- /* We allocate the ofsmap arrays in blocks of 2048 elements.
+ /* We allocate the ofsmap arrays in blocks of map_delta elements.
+ (See choose_map_delta() above).
In case we have very many small input files/sections,
this might waste large amounts of memory, so reallocate these
arrays here to their true size. */
@@ -1066,6 +1101,9 @@ bfd_merge_sections (bfd *obfd, struct bfd_link_info *info)
if (!obfd->xvec->merge_sections)
return true;
+ if (map_delta == MAP_DELTA_UNSET)
+ map_delta = choose_map_delta (info);
+
for (ibfd = info->input_bfds; ibfd != NULL; ibfd = ibfd->link.next)
if ((ibfd->flags & DYNAMIC) == 0)
for (sec = ibfd->sections; sec != NULL; sec = sec->next)