[v4] gprof: only process line numbers for intersection of vmas and histograms

Message ID CAMe9rOrn0USZpQxZtVwLLv04nTNi1XfELZTCp8c-b60gNcucfw@mail.gmail.com
State New
Headers
Series [v4] gprof: only process line numbers for intersection of vmas and histograms |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

H.J. Lu March 10, 2025, 2:23 p.m. UTC
  On Sun, Mar 9, 2025 at 9:08 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> > Here is the v3 patch.
>
> I'm afraid loading the symbols on-demand may add a
> dependency on the order of records in the gmon file, so that
> [call-graph record, histogram record] may parse differently from
> [histogram record, callgraph record].
>
> >> Also, I'm not sure if it is enough for core_create_line_syms() to check
> >> only the histograms for symbols. Do you think we'll also need to lookup
> >> symbols for basic-block and call-graph addresses?
> > How can we get these symbols?
>
> Idea1:
> 1) For bb_read_rec and cg_read_rec, split the loading apart from the
> processing, and perform only the loading first, similar to hist_read_rec.
> 2) lookup symbols for all three record types, according to the addresses
> from each record type parsed.
> 3) perform processing for all three record types.
>
> Idea2:
> Build an expandable cache of address->symbol mappings.
> On cache miss, perform BFD lookup and insert in cache.
>
> -Richard

The profile data is written out with

   /* write PC histogram: */
    write_hist (fd, load_address);

    /* write call-graph: */
    write_call_graph (fd, load_address);

    /* write basic-block execution counts: */
    write_bb_counts (fd);

The histogram records capture low PC and high PC of program execution.
It is OK to look up only the line numbers within low PC and high PC in histogram
records.

Here is the v4 patch without the gprof.c change.
  

Comments

Richard Allen March 11, 2025, 4:15 a.m. UTC | #1
>The profile data is written out with..

I was worried about other sources of gmon files, but
checked a few other systems, and all wrote histograms
first too, so it sounds fine to me.

I've noticed that some of my profiles have discrepancies
with the change to limit to histogram range parsing,
usually of no more than 2-3 bytes. I don't think this was
anything you've caused.
  
Richard Allen March 11, 2025, 5:32 a.m. UTC | #2
H.J.,

For your patchset V4, without loading the histograms
before loading the symbols, line-by-line mode gives
misleading results - maybe based on the line-number
of the function symbols.

-Richard
  
Richard Allen March 11, 2025, 5:53 a.m. UTC | #3
> I've noticed that some of my profiles have discrepancies
> with the change to limit to histogram range parsing,
> usually of no more than 2-3 bytes.

The discrepancy occurs in line-by-line mode, when a
low histogram bound occurs within the PC range for a line.

Ex: draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc15)
vs  draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc17)

When loading all address symbols, 4209fc15
points to the first instruction of the line.

When loading all histogram symbols,4209fc17 points to
the first instruction sampled by the histogram, which
happens 2B/1inst after the first instruction of the line.

This isn't a problem for me. If it's a problem for anyone,
we could reproduce the exact original printout by parsing the
address range of all functions that intersect the histograms.
  
H.J. Lu March 11, 2025, 1:56 p.m. UTC | #4
On Mon, Mar 10, 2025 at 10:53 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> > I've noticed that some of my profiles have discrepancies
> > with the change to limit to histogram range parsing,
> > usually of no more than 2-3 bytes.
>
> The discrepancy occurs in line-by-line mode, when a
> low histogram bound occurs within the PC range for a line.
>
> Ex: draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc15)
> vs  draw_border_generic (lv_draw_sw_rect.c:1260 @ 4209fc17)
>
> When loading all address symbols, 4209fc15
> points to the first instruction of the line.
>
> When loading all histogram symbols,4209fc17 points to
> the first instruction sampled by the histogram, which
> happens 2B/1inst after the first instruction of the line.
>
> This isn't a problem for me. If it's a problem for anyone,
> we could reproduce the exact original printout by parsing the
> address range of all functions that intersect the histograms.

I sent the v5 patch to use histogram records only if they
come first.
  
Richard Allen March 12, 2025, 3:30 a.m. UTC | #5
Patch v5 looks good to me.

Thanks H.J.
  
H.J. Lu March 12, 2025, 11:49 p.m. UTC | #6
On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
>
> Patch v5 looks good to me.
>
> Thanks H.J.

I am checking in the v5 patch:

https://sourceware.org/pipermail/binutils/2025-March/139952.html

next week.
  
Alan Modra March 14, 2025, 12:06 a.m. UTC | #7
On Wed, Mar 12, 2025 at 04:49:00PM -0700, H.J. Lu wrote:
> On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
> >
> > Patch v5 looks good to me.
> >
> > Thanks H.J.
> 
> I am checking in the v5 patch:
> 
> https://sourceware.org/pipermail/binutils/2025-March/139952.html
> 
> next week.

That's fine by me too.
  
H.J. Lu March 14, 2025, 1:59 p.m. UTC | #8
On Thu, Mar 13, 2025 at 5:06 PM Alan Modra <amodra@gmail.com> wrote:
>
> On Wed, Mar 12, 2025 at 04:49:00PM -0700, H.J. Lu wrote:
> > On Tue, Mar 11, 2025 at 8:30 PM Richard Allen <rsaxvc@gmail.com> wrote:
> > >
> > > Patch v5 looks good to me.
> > >
> > > Thanks H.J.
> >
> > I am checking in the v5 patch:
> >
> > https://sourceware.org/pipermail/binutils/2025-March/139952.html
> >
> > next week.
>
> That's fine by me too.
>
> --
> Alan Modra

I am checking it in now.

Thanks.
  

Patch

From aba2b1fba1339793c5babd0a247ba71ac7b927a6 Mon Sep 17 00:00:00 2001
From: Richard Allen <rsaxvc@gmail.com>
Date: Sun, 16 Feb 2025 16:50:05 -0600
Subject: [PATCH v4] gprof: only process line numbers for intersection of vmas
 and histograms

Some programs like RTOS firmware may have a large number of symbols.
The profile information in the profile data file includes histogram
records, which capture low PC and high PC of program execution.  We
can look up only the line numbers within low PC and high PC in histogram
records, which reduces processing time for such a firmware from ~2
minutes to ~2 seconds.

Signed-off-by: Richard Allen <rsaxvc@gmail.com>
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
---
 gprof/corefile.c | 96 +++++++++++++++++++++++++-----------------------
 1 file changed, 51 insertions(+), 45 deletions(-)

diff --git a/gprof/corefile.c b/gprof/corefile.c
index bc26bd7883e..a8970b3200d 100644
--- a/gprof/corefile.c
+++ b/gprof/corefile.c
@@ -755,8 +755,9 @@  core_create_line_syms (void)
   Sym prev, *sym;
   const char *filename;
   Sym_Table ltab;
-  bfd_vma vma_high;
   size_t ltab_reserved;
+  bfd_vma bfd_vma_low = core_text_sect->vma;
+  bfd_vma bfd_vma_high = bfd_vma_low + bfd_section_size (core_text_sect);
 
   /* Create symbols for functions as usual.  This is necessary in
      cases where parts of a program were not compiled with -g.  For
@@ -786,53 +787,58 @@  core_create_line_syms (void)
      lot cleaner now.  */
   memset (&prev, 0, sizeof (prev));
 
-  vma_high = core_text_sect->vma + bfd_section_size (core_text_sect);
-  for (vma = core_text_sect->vma; vma < vma_high; vma += insn_boundary)
+  for (size_t i = 0; i < num_histograms; ++i)
     {
-      if (ltab.len >= ltab_reserved)
+      bfd_vma hist_vma_high = histograms[i].highpc;
+      bfd_vma vma_low = MAX (histograms[i].lowpc, bfd_vma_low);
+      bfd_vma vma_high = MIN (bfd_vma_high, hist_vma_high);
+      for (vma = vma_low; vma < vma_high; vma += insn_boundary)
 	{
-	  /* Reserve more space for line symbols.  */
-	  ltab_reserved *= 2;
-	  ltab.base = (Sym *) xrealloc (ltab.base, ltab_reserved * sizeof (Sym));
-	  ltab.limit = ltab.base + ltab.len;
+	  if (ltab.len >= ltab_reserved)
+	    {
+	      /* Reserve more space for line symbols.  */
+	      ltab_reserved *= 2;
+	      ltab.base = xrealloc (ltab.base, ltab_reserved * sizeof (Sym));
+	      ltab.limit = ltab.base + ltab.len;
+	    }
+	  sym_init (ltab.limit);
+
+	  if (!get_src_info (vma, &filename, &ltab.limit->name, &ltab.limit->line_num)
+	      || (prev.name && prev.line_num == ltab.limit->line_num
+		  && strcmp (prev.name, ltab.limit->name) == 0
+		  && filename_cmp (prev.file->name, filename) == 0))
+	    continue;
+
+	  /* Make name pointer a malloc'ed string.  */
+	  ltab.limit->name = xstrdup (ltab.limit->name);
+	  ltab.limit->file = source_file_lookup_path (filename);
+
+	  ltab.limit->addr = vma;
+
+	  /* Set is_static based on the enclosing function, using either:
+	     1) the previous symbol, if it's from the same function, or
+	     2) a symtab lookup.  */
+	  if (prev.name && ltab.limit->file == prev.file
+	      && strcmp (ltab.limit->name, prev.name) == 0)
+	    {
+	      ltab.limit->is_static = prev.is_static;
+	    }
+	  else
+	    {
+	      sym = sym_lookup(&symtab, ltab.limit->addr);
+	      if (sym)
+		ltab.limit->is_static = sym->is_static;
+	    }
+
+	  prev = *ltab.limit;
+
+	  DBG (AOUTDEBUG, printf ("[core_create_line_syms] %lu %s 0x%lx\n",
+				  (unsigned long) (ltab.limit - ltab.base),
+				  ltab.limit->name,
+				  (unsigned long) ltab.limit->addr));
+	  ++ltab.limit;
+	  ++ltab.len;
 	}
-      sym_init (ltab.limit);
-
-      if (!get_src_info (vma, &filename, &ltab.limit->name, &ltab.limit->line_num)
-	  || (prev.name && prev.line_num == ltab.limit->line_num
-	      && strcmp (prev.name, ltab.limit->name) == 0
-	      && filename_cmp (prev.file->name, filename) == 0))
-	continue;
-
-      /* Make name pointer a malloc'ed string.  */
-      ltab.limit->name = xstrdup (ltab.limit->name);
-      ltab.limit->file = source_file_lookup_path (filename);
-
-      ltab.limit->addr = vma;
-
-      /* Set is_static based on the enclosing function, using either:
-	 1) the previous symbol, if it's from the same function, or
-	 2) a symtab lookup.  */
-      if (ltab.limit->file == prev.file
-	  && strcmp (ltab.limit->name, prev.name) == 0)
-	{
-	  ltab.limit->is_static = prev.is_static;
-	}
-      else
-	{
-	  sym = sym_lookup(&symtab, ltab.limit->addr);
-          if (sym)
-	    ltab.limit->is_static = sym->is_static;
-	}
-
-      prev = *ltab.limit;
-
-      DBG (AOUTDEBUG, printf ("[core_create_line_syms] %lu %s 0x%lx\n",
-			      (unsigned long) (ltab.limit - ltab.base),
-			      ltab.limit->name,
-			      (unsigned long) ltab.limit->addr));
-      ++ltab.limit;
-      ++ltab.len;
     }
 
   /* Reserve space for function symbols and/or trim excess space.  */
-- 
2.48.1