[v4] gprof: only process line numbers for intersection of vmas and histograms
Checks
| Context |
Check |
Description |
| linaro-tcwg-bot/tcwg_binutils_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 |
success
|
Test passed
|
| linaro-tcwg-bot/tcwg_binutils_check--master-arm |
success
|
Test passed
|
Commit Message
On Sun, Mar 9, 2025 at 9:08 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> > Here is the v3 patch.
>
> I'm afraid loading the symbols on-demand may add a
> dependency on the order of records in the gmon file, so that
> [call-graph record, histogram record] may parse differently from
> [histogram record, callgraph record].
>
> >> Also, I'm not sure if it is enough for core_create_line_syms() to check
> >> only the histograms for symbols. Do you think we'll also need to lookup
> >> symbols for basic-block and call-graph addresses?
> > How can we get these symbols?
>
> Idea1:
> 1) For bb_read_rec and cg_read_rec, split the loading apart from the
> processing, and perform only the loading first, similar to hist_read_rec.
> 2) lookup symbols for all three record types, according to the addresses
> from each record type parsed.
> 3) perform processing for all three record types.
>
> Idea2:
> Build an expandable cache of address->symbol mappings.
> On cache miss, perform BFD lookup and insert in cache.
>
> -Richard
The profile data is written out with
/* write PC histogram: */
write_hist (fd, load_address);
/* write call-graph: */
write_call_graph (fd, load_address);
/* write basic-block execution counts: */
write_bb_counts (fd);
The histogram records capture low PC and high PC of program execution.
It is OK to look up only the line numbers within low PC and high PC in histogram
records.
Here is the v4 patch without the gprof.c change.
Comments
>The profile data is written out with..
I was worried about other sources of gmon files, but
checked a few other systems, and all wrote histograms
first too, so it sounds fine to me.
I've noticed that some of my profiles have discrepancies
with the change to limit to histogram range parsing,
usually of no more than 2-3 bytes. I don't think this was
anything you've caused.
H.J.,
For your patchset V4, without loading the histograms
before loading the symbols, line-by-line mode gives
misleading results - maybe based on the line-number
of the function symbols.
-Richard
> I've noticed that some of my profiles have discrepancies
> with the change to limit to histogram range parsing,
> usually of no more than 2-3 bytes.
The discrepancy occurs in line-by-line mode, when a
low histogram bound occurs within the PC range for a line.
Ex: draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc15)
vs draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc17)
When loading all address symbols, 4209fc15
points to the first instruction of the line.
When loading all histogram symbols,4209fc17 points to
the first instruction sampled by the histogram, which
happens 2B/1inst after the first instruction of the line.
This isn't a problem for me. If it's a problem for anyone,
we could reproduce the exact original printout by parsing the
address range of all functions that intersect the histograms.
On Mon, Mar 10, 2025 at 10:53 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> > I've noticed that some of my profiles have discrepancies
> > with the change to limit to histogram range parsing,
> > usually of no more than 2-3 bytes.
>
> The discrepancy occurs in line-by-line mode, when a
> low histogram bound occurs within the PC range for a line.
>
> Ex: draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc15)
> vs draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc17)
>
> When loading all address symbols, 4209fc15
> points to the first instruction of the line.
>
> When loading all histogram symbols,4209fc17 points to
> the first instruction sampled by the histogram, which
> happens 2B/1inst after the first instruction of the line.
>
> This isn't a problem for me. If it's a problem for anyone,
> we could reproduce the exact original printout by parsing the
> address range of all functions that intersect the histograms.
I sent the v5 patch to use histogram records only if they
come first.
Patch v5 looks good to me.
Thanks H.J.
On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> Patch v5 looks good to me.
>
> Thanks H.J.
I am checking in the v5 patch:
https://sourceware.org/pipermail/binutils/2025-March/139952.html
next week.
On Wed, Mar 12, 2025 at 04:49:00PM -0700, H.J. Lu wrote:
> On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
> >
> > Patch v5 looks good to me.
> >
> > Thanks H.J.
>
> I am checking in the v5 patch:
>
> https://sourceware.org/pipermail/binutils/2025-March/139952.html
>
> next week.
That's fine by me too.
On Thu, Mar 13, 2025 at 5:06 PM Alan Modra <amodra@gmail.com> wrote:
>
> On Wed, Mar 12, 2025 at 04:49:00PM -0700, H.J. Lu wrote:
> > On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
> > >
> > > Patch v5 looks good to me.
> > >
> > > Thanks H.J.
> >
> > I am checking in the v5 patch:
> >
> > https://sourceware.org/pipermail/binutils/2025-March/139952.html
> >
> > next week.
>
> That's fine by me too.
>
> --
> Alan Modra
I am checking it in now.
Thanks.
From aba2b1fba1339793c5babd0a247ba71ac7b927a6 Mon Sep 17 00:00:00 2001
From: Richard Allen <rsaxvc@gmail.com>
Date: Sun, 16 Feb 2025 16:50:05 -0600
Subject: [PATCH v4] gprof: only process line numbers for intersection of vmas
and histograms
Some programs like RTOS firmware may have a large number of symbols.
The profile information in the profile data file includes histogram
records, which capture low PC and high PC of program execution. We
can look up only the line numbers within low PC and high PC in histogram
records, which reduces processing time for such a firmware from ~2
minutes to ~2 seconds.
Signed-off-by: Richard Allen <rsaxvc@gmail.com>
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
---
gprof/corefile.c | 96 +++++++++++++++++++++++++-----------------------
1 file changed, 51 insertions(+), 45 deletions(-)
@@ -755,8 +755,9 @@ core_create_line_syms (void)
Sym prev, *sym;
const char *filename;
Sym_Table ltab;
- bfd_vma vma_high;
size_t ltab_reserved;
+ bfd_vma bfd_vma_low = core_text_sect->vma;
+ bfd_vma bfd_vma_high = bfd_vma_low + bfd_section_size (core_text_sect);
/* Create symbols for functions as usual. This is necessary in
cases where parts of a program were not compiled with -g. For
@@ -786,53 +787,58 @@ core_create_line_syms (void)
lot cleaner now. */
memset (&prev, 0, sizeof (prev));
- vma_high = core_text_sect->vma + bfd_section_size (core_text_sect);
- for (vma = core_text_sect->vma; vma < vma_high; vma += insn_boundary)
+ for (size_t i = 0; i < num_histograms; ++i)
{
- if (ltab.len >= ltab_reserved)
+ bfd_vma hist_vma_high = histograms[i].highpc;
+ bfd_vma vma_low = MAX (histograms[i].lowpc, bfd_vma_low);
+ bfd_vma vma_high = MIN (bfd_vma_high, hist_vma_high);
+ for (vma = vma_low; vma < vma_high; vma += insn_boundary)
{
- /* Reserve more space for line symbols. */
- ltab_reserved *= 2;
- ltab.base = (Sym *) xrealloc (ltab.base, ltab_reserved * sizeof (Sym));
- ltab.limit = ltab.base + ltab.len;
+ if (ltab.len >= ltab_reserved)
+ {
+ /* Reserve more space for line symbols. */
+ ltab_reserved *= 2;
+ ltab.base = xrealloc (ltab.base, ltab_reserved * sizeof (Sym));
+ ltab.limit = ltab.base + ltab.len;
+ }
+ sym_init (ltab.limit);
+
+ if (!get_src_info (vma, &filename, <ab.limit->name, <ab.limit->line_num)
+ || (prev.name && prev.line_num == ltab.limit->line_num
+ && strcmp (prev.name, ltab.limit->name) == 0
+ && filename_cmp (prev.file->name, filename) == 0))
+ continue;
+
+ /* Make name pointer a malloc'ed string. */
+ ltab.limit->name = xstrdup (ltab.limit->name);
+ ltab.limit->file = source_file_lookup_path (filename);
+
+ ltab.limit->addr = vma;
+
+ /* Set is_static based on the enclosing function, using either:
+ 1) the previous symbol, if it's from the same function, or
+ 2) a symtab lookup. */
+ if (prev.name && ltab.limit->file == prev.file
+ && strcmp (ltab.limit->name, prev.name) == 0)
+ {
+ ltab.limit->is_static = prev.is_static;
+ }
+ else
+ {
+ sym = sym_lookup(&symtab, ltab.limit->addr);
+ if (sym)
+ ltab.limit->is_static = sym->is_static;
+ }
+
+ prev = *ltab.limit;
+
+ DBG (AOUTDEBUG, printf ("[core_create_line_syms] %lu %s 0x%lx\n",
+ (unsigned long) (ltab.limit - ltab.base),
+ ltab.limit->name,
+ (unsigned long) ltab.limit->addr));
+ ++ltab.limit;
+ ++ltab.len;
}
- sym_init (ltab.limit);
-
- if (!get_src_info (vma, &filename, <ab.limit->name, <ab.limit->line_num)
- || (prev.name && prev.line_num == ltab.limit->line_num
- && strcmp (prev.name, ltab.limit->name) == 0
- && filename_cmp (prev.file->name, filename) == 0))
- continue;
-
- /* Make name pointer a malloc'ed string. */
- ltab.limit->name = xstrdup (ltab.limit->name);
- ltab.limit->file = source_file_lookup_path (filename);
-
- ltab.limit->addr = vma;
-
- /* Set is_static based on the enclosing function, using either:
- 1) the previous symbol, if it's from the same function, or
- 2) a symtab lookup. */
- if (ltab.limit->file == prev.file
- && strcmp (ltab.limit->name, prev.name) == 0)
- {
- ltab.limit->is_static = prev.is_static;
- }
- else
- {
- sym = sym_lookup(&symtab, ltab.limit->addr);
- if (sym)
- ltab.limit->is_static = sym->is_static;
- }
-
- prev = *ltab.limit;
-
- DBG (AOUTDEBUG, printf ("[core_create_line_syms] %lu %s 0x%lx\n",
- (unsigned long) (ltab.limit - ltab.base),
- ltab.limit->name,
- (unsigned long) ltab.limit->addr));
- ++ltab.limit;
- ++ltab.len;
}
/* Reserve space for function symbols and/or trim excess space. */
--
2.48.1