[v2] Bug 17394: we cannot put a break-point at a global function for ASM file

Message ID 542459B9.9090705@redhat.com
State New, archived
Headers

Commit Message

Keith Seitz Sept. 25, 2014, 6:06 p.m. UTC
  Hi,

On 09/17/2014 04:53 AM, Mihail-Marian Nistor wrote:
> We need to cover the following test case: the user wants to do an action
> only for the function that was defined into a selected file name.
> An example: the user wants to put a breakpoint only for functions "func"
> that was defined in the file name "file.s"
>      e.i. of gdb command line: b file.s:func
> Due to the limitation that the GAS doesn't generate debug info for
> functions/symbols, we cannot find the function information if we look only
> in file symbtabs that was collected by using the file name specified by
> the user.
> We need to look into a global default symtab if we want to find minimal
> information about functions that were written in the ASM file.
> And after that, we need to select only functions that were defined into
> the file name specified by the user.

Thanks for the detailed explanation (and *especially* the test case) -- 
those have helped me immensely to understand the problem.

I spent some time looking into this patch/problem, and I am not quite 
sure the proposed solution is the right way to attack this.

My current feeling on this is: the behavior of the linespec parser (and 
coordinating routines) should not behave necessarily any different 
between the two use cases "break foo.c:func" and "break foo.s:func". 
Yet, the former works and the later does not.

So I did some digging. Maybe you have already discounted the approach I 
am going to suggest -- if so, I would be very interested in hearing 
about it.

When a user specifies a C/C++/Ada/Java/Asm/whatever filename, the 
linespec parser starts by building a list of file symtabs for the 
specified files.

All subsequent symbol searches are limited to results in those 
files/symtabs.

The problem here is that, as you point out, gas does not emit any symbol 
information for the .s file. Thus, we have a symtab for the file ("info 
sources" shows the file), but it contains no symbols.

When find_linespec_symbols is called in linespec_parse_basic, it calls 
find_function_symbols, which uses add_matching_symbols_to_info to 
collect all matching symbols.

That function does [pardon any mangled formatting]:

   for (ix = 0; VEC_iterate (symtab_ptr, info->file_symtabs, ix, elt); ++ix)
     {
       if (elt == NULL)
	{
	  iterate_over_all_matching_symtabs (info->state, name, VAR_DOMAIN,
					     collect_symbols, info,
					     pspace, 1);
	  search_minsyms_for_name (info, name, pspace);
	}
       else if (pspace == NULL || pspace == SYMTAB_PSPACE (elt))
	{
	  /* Program spaces that are executing startup should have
	     been filtered out earlier.  */
	  gdb_assert (!SYMTAB_PSPACE (elt)->executing_startup);
	  set_current_program_space (SYMTAB_PSPACE (elt));
	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
				    collect_symbols, info);
	}
     }

This iterates over the symtabs. In the failing use case, ELT is non-NULL 
(points to the symtab for the .s file), so it calls 
iterate_over_file_blocks. Herein is where the problem exists: it is 
assumed that if NAME exists, it must exist in the given symtab -- a 
reasonable assumption for "normal" (non-asm) cases. It never searches 
minimal symbols (or in the global default symtab).

This is where I think where the fix should start. While attempting to 
convince myself that approach is both appropriate and "correct," I've 
actually written a version of your patch which does this.

It is important to note that iterating over minsyms is fairly expensive, 
so in my patch, I've opted to only search minimal symbols for NAME if 
the symtab's language is language_asm and iterate_over_file_blocks 
returns no symbols. That should, hopefully, mitigate any performance 
impact this might have.

This is especially exasperated by the need to map the address of the 
minimal symbol back to its symtab. You'll see this (expensive) added 
complexity in add_minsym.

When all is said and done, though, when find_linespec_symbols returns, 
it will have collated the appropriate symbol from the .s file -- exactly 
the same way it would have if one had typed "break file.c:func".

What do you think about this? Does this fail on any use cases you have?

As for the test case, I would very much like to see this important 
functionality tested on every platform. I haven't tried it yet myself, 
but I see that some other tests in our test suite use some minimal 
assembler program which (presumably) runs on nearly every configuration. 
See, for example, gdb.dwarf2/dw2-anonymous-func.exp.
Your test example is simple enough that it should be fairly trivial to 
fixup.

In any case, I would move the test from gdb.arch to gdb.linespec, 
collecting it together with its linespec-specific test brethren.

Keith
  

Comments

Mihail-Marian Nistor Sept. 28, 2014, 3:46 p.m. UTC | #1
Hi Keith,

Thank you for the time you spent on this patch/problem. I have rewritten the test case to be run on the targets that support dwarf2_support. I have also changed the location of test case from
gdb.arch to gdb.linespec as you recommended.

Your changes look very well, I have one little observation regarding the filtering condition when we need to get information from minimal symbol.
Scenario: We can mix objects that were compiled/assembled by different vendors and the third party assembler provides debug information and it doesn't have this limitation. An application may contain more than one file that have the same name. We cannot put a breakpoint at a function in all objects that have the same name if an object that comes from ASM file has debug info and is linked before the objects assembled with the GNU-GAS. In the new test case I have tried to emulate the third party assembler behavior in the break-asm-file1.s.

	{
+ 	  int prev_len = VEC_length (symbolp, info->result.symbols);
	  /* Program spaces that are executing startup should have
	     been filtered out earlier.  */
	  gdb_assert (!SYMTAB_PSPACE (elt)->executing_startup);
	  set_current_program_space (SYMTAB_PSPACE (elt));
	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
				    collect_symbols, info);
-	  /* If no symbols were found and this symtab is in
+	  /* If no new symbols were found in this iteration and this symtab is in
	     assembler, we might actually be looking for a
	     label for which we don't have debug info.  Check
	     for a minimal symbol in this case.  */
-       if (VEC_empty (symbolp, info->result.symbols)
+	  if ((prev_len == VEC_length (symbolp, info->result.symbols))
	      && elt->language == language_asm)
	    search_minsyms_for_name (info, name, pspace, elt);

	}

Best regards,
Mihai
-----Original Message-----
From: Keith Seitz [mailto:keiths@redhat.com] 
Sent: Thursday, September 25, 2014 9:07 PM
To: Nistor Mihail-MNISTOR1; gdb-patches@sourceware.org
Subject: Re: [PATCH v2] Bug 17394: we cannot put a break-point at a global function for ASM file

Hi,

On 09/17/2014 04:53 AM, Mihail-Marian Nistor wrote:
> We need to cover the following test case: the user wants to do an 
> action only for the function that was defined into a selected file name.
> An example: the user wants to put a breakpoint only for functions "func"
> that was defined in the file name "file.s"
>      e.i. of gdb command line: b file.s:func Due to the limitation 
> that the GAS doesn't generate debug info for functions/symbols, we 
> cannot find the function information if we look only in file symbtabs 
> that was collected by using the file name specified by the user.
> We need to look into a global default symtab if we want to find 
> minimal information about functions that were written in the ASM file.
> And after that, we need to select only functions that were defined 
> into the file name specified by the user.

Thanks for the detailed explanation (and *especially* the test case) -- those have helped me immensely to understand the problem.

I spent some time looking into this patch/problem, and I am not quite sure the proposed solution is the right way to attack this.

My current feeling on this is: the behavior of the linespec parser (and coordinating routines) should not behave necessarily any different between the two use cases "break foo.c:func" and "break foo.s:func". 
Yet, the former works and the later does not.

So I did some digging. Maybe you have already discounted the approach I am going to suggest -- if so, I would be very interested in hearing about it.

When a user specifies a C/C++/Ada/Java/Asm/whatever filename, the linespec parser starts by building a list of file symtabs for the specified files.

All subsequent symbol searches are limited to results in those files/symtabs.

The problem here is that, as you point out, gas does not emit any symbol information for the .s file. Thus, we have a symtab for the file ("info sources" shows the file), but it contains no symbols.

When find_linespec_symbols is called in linespec_parse_basic, it calls find_function_symbols, which uses add_matching_symbols_to_info to collect all matching symbols.

That function does [pardon any mangled formatting]:

   for (ix = 0; VEC_iterate (symtab_ptr, info->file_symtabs, ix, elt); ++ix)
     {
       if (elt == NULL)
	{
	  iterate_over_all_matching_symtabs (info->state, name, VAR_DOMAIN,
					     collect_symbols, info,
					     pspace, 1);
	  search_minsyms_for_name (info, name, pspace);
	}
       else if (pspace == NULL || pspace == SYMTAB_PSPACE (elt))
	{
	  /* Program spaces that are executing startup should have
	     been filtered out earlier.  */
	  gdb_assert (!SYMTAB_PSPACE (elt)->executing_startup);
	  set_current_program_space (SYMTAB_PSPACE (elt));
	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
				    collect_symbols, info);
	}
     }

This iterates over the symtabs. In the failing use case, ELT is non-NULL (points to the symtab for the .s file), so it calls iterate_over_file_blocks. Herein is where the problem exists: it is assumed that if NAME exists, it must exist in the given symtab -- a reasonable assumption for "normal" (non-asm) cases. It never searches minimal symbols (or in the global default symtab).

This is where I think where the fix should start. While attempting to convince myself that approach is both appropriate and "correct," I've actually written a version of your patch which does this.

It is important to note that iterating over minsyms is fairly expensive, so in my patch, I've opted to only search minimal symbols for NAME if the symtab's language is language_asm and iterate_over_file_blocks returns no symbols. That should, hopefully, mitigate any performance impact this might have.

This is especially exasperated by the need to map the address of the minimal symbol back to its symtab. You'll see this (expensive) added complexity in add_minsym.

When all is said and done, though, when find_linespec_symbols returns, it will have collated the appropriate symbol from the .s file -- exactly the same way it would have if one had typed "break file.c:func".

What do you think about this? Does this fail on any use cases you have?

As for the test case, I would very much like to see this important functionality tested on every platform. I haven't tried it yet myself, but I see that some other tests in our test suite use some minimal assembler program which (presumably) runs on nearly every configuration. 
See, for example, gdb.dwarf2/dw2-anonymous-func.exp.
Your test example is simple enough that it should be fairly trivial to fixup.

In any case, I would move the test from gdb.arch to gdb.linespec, collecting it together with its linespec-specific test brethren.

Keith
  
Mihail-Marian Nistor Sept. 28, 2014, 4:08 p.m. UTC | #2
Hi Keith,

I forgot to move the test location from gdb.arch to gdb.linespec.
You will find enclosed the patch with the updated test location.

Best regards,
Mihai

-----Original Message-----
From: Nistor Mihail-MNISTOR1 
Sent: Sunday, September 28, 2014 6:47 PM
To: Keith Seitz; gdb-patches@sourceware.org
Cc: Nistor Mihail-MNISTOR1
Subject: RE: [PATCH v2] Bug 17394: we cannot put a break-point at a global function for ASM file

Hi Keith,

Thank you for the time you spent on this patch/problem. I have rewritten the test case to be run on the targets that support dwarf2_support. I have also changed the location of test case from gdb.arch to gdb.linespec as you recommended.

Your changes look very well, I have one little observation regarding the filtering condition when we need to get information from minimal symbol.
Scenario: We can mix objects that were compiled/assembled by different vendors and the third party assembler provides debug information and it doesn't have this limitation. An application may contain more than one file that have the same name. We cannot put a breakpoint at a function in all objects that have the same name if an object that comes from ASM file has debug info and is linked before the objects assembled with the GNU-GAS. In the new test case I have tried to emulate the third party assembler behavior in the break-asm-file1.s.

	{
+ 	  int prev_len = VEC_length (symbolp, info->result.symbols);
	  /* Program spaces that are executing startup should have
	     been filtered out earlier.  */
	  gdb_assert (!SYMTAB_PSPACE (elt)->executing_startup);
	  set_current_program_space (SYMTAB_PSPACE (elt));
	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
				    collect_symbols, info);
-	  /* If no symbols were found and this symtab is in
+	  /* If no new symbols were found in this iteration and this symtab is 
+in
	     assembler, we might actually be looking for a
	     label for which we don't have debug info.  Check
	     for a minimal symbol in this case.  */
-       if (VEC_empty (symbolp, info->result.symbols)
+	  if ((prev_len == VEC_length (symbolp, info->result.symbols))
	      && elt->language == language_asm)
	    search_minsyms_for_name (info, name, pspace, elt);

	}

Best regards,
Mihai
-----Original Message-----
From: Keith Seitz [mailto:keiths@redhat.com]
Sent: Thursday, September 25, 2014 9:07 PM
To: Nistor Mihail-MNISTOR1; gdb-patches@sourceware.org
Subject: Re: [PATCH v2] Bug 17394: we cannot put a break-point at a global function for ASM file

Hi,

On 09/17/2014 04:53 AM, Mihail-Marian Nistor wrote:
> We need to cover the following test case: the user wants to do an 
> action only for the function that was defined into a selected file name.
> An example: the user wants to put a breakpoint only for functions "func"
> that was defined in the file name "file.s"
>      e.i. of gdb command line: b file.s:func Due to the limitation 
> that the GAS doesn't generate debug info for functions/symbols, we 
> cannot find the function information if we look only in file symbtabs 
> that was collected by using the file name specified by the user.
> We need to look into a global default symtab if we want to find 
> minimal information about functions that were written in the ASM file.
> And after that, we need to select only functions that were defined 
> into the file name specified by the user.

Thanks for the detailed explanation (and *especially* the test case) -- those have helped me immensely to understand the problem.

I spent some time looking into this patch/problem, and I am not quite sure the proposed solution is the right way to attack this.

My current feeling on this is: the behavior of the linespec parser (and coordinating routines) should not behave necessarily any different between the two use cases "break foo.c:func" and "break foo.s:func". 
Yet, the former works and the later does not.

So I did some digging. Maybe you have already discounted the approach I am going to suggest -- if so, I would be very interested in hearing about it.

When a user specifies a C/C++/Ada/Java/Asm/whatever filename, the linespec parser starts by building a list of file symtabs for the specified files.

All subsequent symbol searches are limited to results in those files/symtabs.

The problem here is that, as you point out, gas does not emit any symbol information for the .s file. Thus, we have a symtab for the file ("info sources" shows the file), but it contains no symbols.

When find_linespec_symbols is called in linespec_parse_basic, it calls find_function_symbols, which uses add_matching_symbols_to_info to collect all matching symbols.

That function does [pardon any mangled formatting]:

   for (ix = 0; VEC_iterate (symtab_ptr, info->file_symtabs, ix, elt); ++ix)
     {
       if (elt == NULL)
	{
	  iterate_over_all_matching_symtabs (info->state, name, VAR_DOMAIN,
					     collect_symbols, info,
					     pspace, 1);
	  search_minsyms_for_name (info, name, pspace);
	}
       else if (pspace == NULL || pspace == SYMTAB_PSPACE (elt))
	{
	  /* Program spaces that are executing startup should have
	     been filtered out earlier.  */
	  gdb_assert (!SYMTAB_PSPACE (elt)->executing_startup);
	  set_current_program_space (SYMTAB_PSPACE (elt));
	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
				    collect_symbols, info);
	}
     }

This iterates over the symtabs. In the failing use case, ELT is non-NULL (points to the symtab for the .s file), so it calls iterate_over_file_blocks. Herein is where the problem exists: it is assumed that if NAME exists, it must exist in the given symtab -- a reasonable assumption for "normal" (non-asm) cases. It never searches minimal symbols (or in the global default symtab).

This is where I think where the fix should start. While attempting to convince myself that approach is both appropriate and "correct," I've actually written a version of your patch which does this.

It is important to note that iterating over minsyms is fairly expensive, so in my patch, I've opted to only search minimal symbols for NAME if the symtab's language is language_asm and iterate_over_file_blocks returns no symbols. That should, hopefully, mitigate any performance impact this might have.

This is especially exasperated by the need to map the address of the minimal symbol back to its symtab. You'll see this (expensive) added complexity in add_minsym.

When all is said and done, though, when find_linespec_symbols returns, it will have collated the appropriate symbol from the .s file -- exactly the same way it would have if one had typed "break file.c:func".

What do you think about this? Does this fail on any use cases you have?

As for the test case, I would very much like to see this important functionality tested on every platform. I haven't tried it yet myself, but I see that some other tests in our test suite use some minimal assembler program which (presumably) runs on nearly every configuration. 
See, for example, gdb.dwarf2/dw2-anonymous-func.exp.
Your test example is simple enough that it should be fairly trivial to fixup.

In any case, I would move the test from gdb.arch to gdb.linespec, collecting it together with its linespec-specific test brethren.

Keith
  

Patch

---
 gdb/linespec.c | 95 +++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 71 insertions(+), 24 deletions(-)

diff --git a/gdb/linespec.c b/gdb/linespec.c
index 8a2c8e3..702ffb3 100644
--- a/gdb/linespec.c
+++ b/gdb/linespec.c
@@ -3446,6 +3446,9 @@  struct collect_minsyms
   /* The objfile we're examining.  */
   struct objfile *objfile;
 
+  /* Only search the given symtab, or NULL to search for all symbols.  */
+  struct symtab *symtab;
+
   /* The funfirstline setting from the initial call.  */
   int funfirstline;
 
@@ -3505,6 +3508,24 @@  add_minsym (struct minimal_symbol *minsym, void *d)
   mo.minsym = minsym;
   mo.objfile = info->objfile;
 
+  if (info->symtab != NULL)
+    {
+      CORE_ADDR pc;
+      struct symtab_and_line sal;
+      struct gdbarch *gdbarch = get_objfile_arch (info->objfile);
+
+      sal = find_pc_sect_line (MSYMBOL_VALUE_ADDRESS (info->objfile, minsym),
+			       NULL, 0);
+      sal.section = MSYMBOL_OBJ_SECTION (info->objfile, minsym);
+      pc
+	= gdbarch_convert_from_func_ptr_addr (gdbarch, sal.pc, &current_target);
+      if (pc != sal.pc)
+	sal = find_pc_sect_line (pc, NULL, 0);
+
+      if (info->symtab != sal.symtab)
+	return;
+    }
+
   /* Exclude data symbols when looking for breakpoint locations.   */
   if (!info->list_mode)
     switch (minsym->type)
@@ -3531,40 +3552,59 @@  add_minsym (struct minimal_symbol *minsym, void *d)
   VEC_safe_push (bound_minimal_symbol_d, info->msyms, &mo);
 }
 
-/* Search minimal symbols in all objfiles for NAME.  If SEARCH_PSPACE
+/* Search for minimal symbols called NAME.  If SEARCH_PSPACE
    is not NULL, the search is restricted to just that program
-   space.  */
+   space.
+
+   If SYMTAB is NULL, search all objfiles, otherwise
+   restrict results to the given SYMTAB.  */
 
 static void
 search_minsyms_for_name (struct collect_info *info, const char *name,
-			 struct program_space *search_pspace)
+			 struct program_space *search_pspace,
+			 struct symtab *symtab)
 {
-  struct objfile *objfile;
-  struct program_space *pspace;
+  struct collect_minsyms local;
+  struct cleanup *cleanup;
 
-  ALL_PSPACES (pspace)
-  {
-    struct collect_minsyms local;
-    struct cleanup *cleanup;
+  memset (&local, 0, sizeof (local));
+  local.funfirstline = info->state->funfirstline;
+  local.list_mode = info->state->list_mode;
+  local.symtab = symtab;
 
-    if (search_pspace != NULL && search_pspace != pspace)
-      continue;
-    if (pspace->executing_startup)
-      continue;
+  cleanup = make_cleanup (VEC_cleanup (bound_minimal_symbol_d), &local.msyms);
 
-    set_current_program_space (pspace);
+  if (symtab == NULL)
+    {
+      struct program_space *pspace;
+
+      ALL_PSPACES (pspace)
+      {
+	struct objfile *objfile;
 
-    memset (&local, 0, sizeof (local));
-    local.funfirstline = info->state->funfirstline;
-    local.list_mode = info->state->list_mode;
+	if (search_pspace != NULL && search_pspace != pspace)
+	  continue;
+	if (pspace->executing_startup)
+	  continue;
 
-    cleanup = make_cleanup (VEC_cleanup (bound_minimal_symbol_d),
-			    &local.msyms);
+	set_current_program_space (pspace);
 
-    ALL_OBJFILES (objfile)
+	ALL_OBJFILES (objfile)
+	{
+	  local.objfile = objfile;
+	  iterate_over_minimal_symbols (objfile, name, add_minsym, &local);
+	}
+      }
+    }
+  else
     {
-      local.objfile = objfile;
-      iterate_over_minimal_symbols (objfile, name, add_minsym, &local);
+      if (search_pspace == NULL || SYMTAB_PSPACE (symtab) == search_pspace)
+	{
+	  set_current_program_space (SYMTAB_PSPACE (symtab));
+	  local.objfile = symtab->objfile;
+	  iterate_over_minimal_symbols (local.objfile, name, add_minsym,
+					&local);
+	}
     }
 
     if (!VEC_empty (bound_minimal_symbol_d, local.msyms))
@@ -3597,7 +3637,6 @@  search_minsyms_for_name (struct collect_info *info, const char *name,
       }
 
     do_cleanups (cleanup);
-  }
 }
 
 /* A helper function to add all symbols matching NAME to INFO.  If
@@ -3619,7 +3658,7 @@  add_matching_symbols_to_info (const char *name,
 	  iterate_over_all_matching_symtabs (info->state, name, VAR_DOMAIN,
 					     collect_symbols, info,
 					     pspace, 1);
-	  search_minsyms_for_name (info, name, pspace);
+	  search_minsyms_for_name (info, name, pspace, NULL);
 	}
       else if (pspace == NULL || pspace == SYMTAB_PSPACE (elt))
 	{
@@ -3629,6 +3668,14 @@  add_matching_symbols_to_info (const char *name,
 	  set_current_program_space (SYMTAB_PSPACE (elt));
 	  iterate_over_file_blocks (elt, name, VAR_DOMAIN,
 				    collect_symbols, info);
+
+	  /* If no symbols were found and this symtab is in
+	     assembler, we might actually be looking for a
+	     label for which we don't have debug info.  Check
+	     for a minimal symbol in this case.  */
+	  if (VEC_empty (symbolp, info->result.symbols)
+	      && elt->language == language_asm)
+	    search_minsyms_for_name (info, name, pspace, elt);
 	}
     }
 }
-- 
1.9.3