Fix an issue with the gdb step-over aka. "n" command

Message ID AM0PR08MB3714EAB5F7DE39836DA77AB7E4520@AM0PR08MB3714.eurprd08.prod.outlook.com
State New, archived
Headers

Commit Message

Bernd Edlinger Dec. 19, 2019, 10:53 p.m. UTC
  On 12/15/19 9:39 AM, Bernd Edlinger wrote:
> On 12/15/19 2:25 AM, Simon Marchi wrote:
>> On 2019-11-24 7:17 a.m., Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this fixes an issue with the gdb step-over aka. "n" command.
>>>
>>> Apologies, the motivation for this patch was from sub-optimal
>>> debug experience using some gcc code involving inlined functions,
>>> and initially I tried to convince gcc folks that it is in fact a
>>> gcc bug, but...
>>>
>>> It can be seen when you debug an optimized stage-3 cc1
>>> it does not affect -O0 code, though.
>>>
>>> Note: you can use "gcc -S hello.c -wrapper gdb,--args" to invoke cc1 with
>>> debugger attached.
>>>
>>> This example debug session will explain the effect.
>>>
>>> (gdb) b get_alias_set
>>> Breakpoint 5 at 0xa099f0: file ../../gcc-trunk/gcc/alias.c, line 837.
>>> (gdb) r
>>> Breakpoint 5, get_alias_set (t=t@entry=0x7ffff7ff7ab0) at ../../gcc-trunk/gcc/alias.c:837
>>> 837	  if (t == error_mark_node
>>> (gdb) n
>>> 839		  && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
>>> (gdb) n
>>> 3382	  return __t;  <-- now we have a problem: wrong line info here
>>> (gdb) bt
>>> #0  get_alias_set (t=t@entry=0x7ffff7ff7ab0) at ../../gcc-trunk/gcc/tree.h:3382
>>> #1  0x0000000000b25dfe in set_mem_attributes_minus_bitpos (ref=0x7ffff746f990, t=0x7ffff7ff7ab0, objectp=1, bitpos=...)
>>>     at ../../gcc-trunk/gcc/emit-rtl.c:1957
>>> #2  0x0000000001137a55 in make_decl_rtl (decl=0x7ffff7ff7ab0) at ../../gcc-trunk/gcc/varasm.c:1518
>>> #3  0x000000000113b6e8 in assemble_variable (decl=0x7ffff7ff7ab0, top_level=<optimized out>, at_end=<optimized out>, 
>>>     dont_output_data=0) at ../../gcc-trunk/gcc/varasm.c:2246
>>> #4  0x000000000113f0ea in varpool_node::assemble_decl (this=0x7ffff745b000) at ../../gcc-trunk/gcc/varpool.c:584
>>> #5  0x000000000113fa17 in varpool_node::assemble_decl (this=0x7ffff745b000) at ../../gcc-trunk/gcc/varpool.c:750
>>
>> I have a hard time understanding what is going wrong and what we should see
>> instead.  I think it would help if you showed what's $pc at every place where
>> you are stopping, as well as the output for readelf --debug-dump=decodedline
>> for those regions.  It would also help if you provided the same session without
>> and with your patch applied, so we could see the difference.
>>
> 
> Hi Simon,
> 
> the issue is there is a line-info from the inline function right at the end of the
> inline superblock without any code just variable tacking info there.
> 
> Maybe it helps to get the background if you look at my attempt of fixing this as
> a gcc bug:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01459.html
> 
> and Alexandre Oliva's response here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01771.html
> 
> 
> So this is a problem in the design of the dwarf line info which
> has stmt-type line info at the end of a inlined subroutine.
> 
> Those fall outside the inline block, therefore the step over stops
> at the line where the inline function ends, and for gdb it just appears
> as if this line was in the calling subroutine, which happens since
> the line info does not have any connection to the inlined subroutine
> range info.
> 
> Contents of the .debug_info section:
> 
>  <2><4f686>: Abbrev Number: 12 (DW_TAG_inlined_subroutine)
>     <4f687>   DW_AT_abstract_origin: <0x53d4e>
>     <4f68b>   DW_AT_entry_pc    : 0x7280
>     <4f693>   DW_AT_GNU_entry_view: 1
>     <4f695>   DW_AT_ranges      : 0xb480
>     <4f699>   DW_AT_call_file   : 8  <- alias.c
>     <4f69a>   DW_AT_call_line   : 839
>     <4f69c>   DW_AT_call_column : 8
>     <4f69d>   DW_AT_sibling     : <0x4f717>
> 
>  The File Name Table (offset 0x253):
>   8     2       0       0       alias.c
>   10    2       0       0       tree.h
> 
> Contents of the .debug_ranges section:
> 
>     0000b480 0000000000007280 0000000000007291 
>     0000b480 0000000000002764 000000000000277e 
>     0000b480 <End of list>
> 
> The problem is at pc=0x7291 in the Line Number Section:
> 
>  Line Number Statements:
> 
>   [0x00008826]  Special opcode 61: advance Address by 4 to 0x7284 and Line by 0 to 3380
>   [0x00008827]  Set is_stmt to 1
>   [0x00008828]  Special opcode 189: advance Address by 13 to 0x7291 and Line by 2 to 3382 (*)
>   [0x00008829]  Set is_stmt to 0 (**)
>   [0x0000882a]  Copy (view 1)
>   [0x0000882b]  Set File Name to entry 8 in the File Name Table <- back to alias.c
>   [0x0000882d]  Set column to 8
>   [0x0000882f]  Advance Line by -2543 to 839
>   [0x00008832]  Copy (view 2)
>   [0x00008833]  Set column to 27
>   [0x00008835]  Special opcode 61: advance Address by 4 to 0x7295 and Line by 0 to 839
>   [0x00008836]  Set column to 3
>   [0x00008838]  Set is_stmt to 1 <-- next line info counts: alias.c:847
>   [0x00008839]  Special opcode 153: advance Address by 10 to 0x729f and Line by 8 to 847
>   [0x0000883a]  Set column to 7
> 
> (*) this line is tree.h:3382, but the program counter is *not* within the subroutine,
> but exactly at the first instruction *after* the subroutine according to the debug_ranges.
> 
> What makes it worse, is that (**) makes gdb ignore the new location info alias.c:839,
> which means, normally the n command would have continued to pc=0x729f, which is at alias.c:847.
> 
> 
> What this patch does, is a heuristic, that means when the last satement line number (*)
> contains no code, and is followed by a non-statment line number in another file, then
> pretend the non-statement (**) was actually a stmt-type line number.  By adding the
> end of sequence marker here this code in buildsym.c cancels the last line number in the
> inline file:
> 
>   /* Normally, we treat lines as unsorted.  But the end of sequence
>      marker is special.  We sort line markers at the same PC by line
>      number, so end of sequence markers (which have line == 0) appear
>      first.  This is right if the marker ends the previous function,
>      and there is no padding before the next function.  But it is
>      wrong if the previous line was empty and we are now marking a
>      switch to a different subfile.  We must leave the end of sequence
>      marker at the end of this group of lines, not sort the empty line
>      to after the marker.  The easiest way to accomplish this is to
>      delete any empty lines from our table, if they are followed by
>      end of sequence markers.  All we lose is the ability to set
>      breakpoints at some lines which contain no instructions
>      anyway.  */
>   if (line == 0 && subfile->line_vector->nitems > 0)
>     {
>       e = subfile->line_vector->item + subfile->line_vector->nitems - 1;
>       while (subfile->line_vector->nitems > 0 && e->pc == pc)
>         {
>           e--;
>           subfile->line_vector->nitems--;
>         }
>     }
> 
> 
> (That's where I discovered the line number 65535 issue BTW)
> 
> 
> Bernd.
> 

Does this explanation make sense?

Maybe it helps, but I have been able to come up with a test case.
It uses C++ and needs to be compiled with -Og or higher, with recent
gcc versions, the next command steps multiple times into the header
file, where the inline function is, that we ought to step over.
Since I already know that in optimized build stepping may or may
not stop in the opening brace, dependent on the gcc version,
I merged the { with the first statement, to make the test case
more stable although normally that would be avoided, but having any
test case at all might be worth it.

Would that be okay as a test case, it appears to work with
gcc 4.8 (where the test case works also unpatched gdb)
and gcc-Version 10.0.0 20191123, where passed only with patched gdb)


rom a9bf22888b4655d104c7956894a96a542bcbdd57 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Thu, 19 Dec 2019 23:41:37 +0100
Subject: [PATCH] Add a test case for step over inline functions

gdb/testsuite:
2019-12-19  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * gdb.cp/next-inline.exp: New file.
        * gdb.cp/next-inline.cc: New file.
        * gdb.cp/next-inline.h: New file.
---
 gdb/testsuite/gdb.cp/next-inline.cc  | 34 +++++++++++++++++++++++++++++++
 gdb/testsuite/gdb.cp/next-inline.exp | 39 ++++++++++++++++++++++++++++++++++++
 gdb/testsuite/gdb.cp/next-inline.h   | 34 +++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+)
 create mode 100644 gdb/testsuite/gdb.cp/next-inline.cc
 create mode 100644 gdb/testsuite/gdb.cp/next-inline.exp
 create mode 100644 gdb/testsuite/gdb.cp/next-inline.h
  

Comments

Simon Marchi Dec. 20, 2019, 6:13 a.m. UTC | #1
Hi Bernd,

On 2019-12-19 5:53 p.m., Bernd Edlinger wrote:
> Does this explanation make sense?

Yes.  Well I think so, I have to admit this is a bit over my head,
there are a lot of pieces to put together to have a good understanding
of the problem.  I just did a first read, I'll sleep on it and come
back to it later.

Thanks for the small reproducer, this is extremely valuable.  I think it
will be a good idea to integrate it as a test case in the test suite.

In your patch to dwarf2read.c, I was a bit surprised to see:

  m_last_subfile != m_cu->get_builder ()->get_current_subfile ()

So your fix only works if the inlined subroutine comes from another file?  If
I move the tree_check function in next-inline.cc, the fix no longer applies,
and we get the broken behavior.  From your previous email, I understand that
this is expected.  I guess that if both are in the same file, we can't detect
this situation using the same technique.

I also read about location views, since that's what Alexandre referred to.  It
sounds like it's a magic solution that will allow GDB to do the right thing in
this kind of situation.  If that's indeed the case, then it might be good to start
exploring this path.  I'd like to have a better understanding of how this will help
GDB make a smarter "next", and what kind of effort is needed to make GDB use it.  My
understanding is that location views allow having an address mapped to multiple
source locations.  For example, here's the problematic address in the next-inline
test case I've compiled:

  ./next-inline.h:[++]
  next-inline.h                                 28              0x1175               x
  next-inline.h                                 30              0x1175       1       x

  ./next-inline.cc:[++]
  next-inline.cc                                22              0x1175       2

Today, when I ask GDB "which source line does this address correspond to", it gives me
one answer.  Does this mean that GDB will now say that 0x1175 corresponds to

- next-inline.h:28
- next-inline.h:30
- next-inline.cc:22

all at the same time?  Is one of these source locations more important than the others?
If execution happens to stop exactly at this address, which location do we present to
the user?

And to come back the problem at hand, how does this help GDB make a smarter "next"?

Btw, I stumbled on a bug with the TUI source display.  It might be caused by this patch,
or it might be that this patch uncovers it.

When I do these actions:

- Start GDB with the next-inline test file (from this patch)
- Enable the TUI
- Type "start"
- Type "s"
- Type "n" twice

The TUI source display wrongfully jumps to the header file, line 24.
When I type "frame", it says I'm stopped at next-line.cc:24.  So it
is showing the right line number of the wrong file.

Thanks,

Simon
  
Bernd Edlinger Dec. 20, 2019, 7:57 p.m. UTC | #2
On 12/20/19 7:13 AM, Simon Marchi wrote:
> Hi Bernd,
> 
> On 2019-12-19 5:53 p.m., Bernd Edlinger wrote:
>> Does this explanation make sense?
> 
> Yes.  Well I think so, I have to admit this is a bit over my head,
> there are a lot of pieces to put together to have a good understanding
> of the problem.  I just did a first read, I'll sleep on it and come
> back to it later.
> 
> Thanks for the small reproducer, this is extremely valuable.  I think it
> will be a good idea to integrate it as a test case in the test suite.
> 
> In your patch to dwarf2read.c, I was a bit surprised to see:
> 
>   m_last_subfile != m_cu->get_builder ()->get_current_subfile ()
> 
> So your fix only works if the inlined subroutine comes from another file?  If
> I move the tree_check function in next-inline.cc, the fix no longer applies,
> and we get the broken behavior.  From your previous email, I understand that
> this is expected.  I guess that if both are in the same file, we can't detect
> this situation using the same technique.

Yes, when the inline function is not in a header file that will not help.
But it solves 90% of the problem with a simple and obvious heuristic.

To attack the rest of the problem we would need to know the PCs where inlined
subroutines and each corresponding range infos do end, but that data is
only available long after the line info is parsed.

> 
> I also read about location views, since that's what Alexandre referred to.  It
> sounds like it's a magic solution that will allow GDB to do the right thing in
> this kind of situation.  If that's indeed the case, then it might be good to start
> exploring this path.  I'd like to have a better understanding of how this will help
> GDB make a smarter "next", and what kind of effort is needed to make GDB use it.  My
> understanding is that location views allow having an address mapped to multiple
> source locations.  For example, here's the problematic address in the next-inline
> test case I've compiled:
> 
>   ./next-inline.h:[++]
>   next-inline.h                                 28              0x1175               x
>   next-inline.h                                 30              0x1175       1       x
> 
>   ./next-inline.cc:[++]
>   next-inline.cc                                22              0x1175       2
> 

I think the main problem here, is that from the line numbers alone it is not clear which
of these location infos is in the subroutine and which is in the caller.
The only link is the program address which is ambiguous at the end of the inline
block.

So my impression is that we need a connection between the location views and the
inlined subroutine info.

> Today, when I ask GDB "which source line does this address correspond to", it gives me
> one answer.  Does this mean that GDB will now say that 0x1175 corresponds to
> 
> - next-inline.h:28
> - next-inline.h:30
> - next-inline.cc:22
> 
> all at the same time?  Is one of these source locations more important than the others?
> If execution happens to stop exactly at this address, which location do we present to
> the user?
> 

No idea. That will likely be confusing.

> And to come back the problem at hand, how does this help GDB make a smarter "next"?
> 
> Btw, I stumbled on a bug with the TUI source display.  It might be caused by this patch,
> or it might be that this patch uncovers it.
> 
> When I do these actions:
> 
> - Start GDB with the next-inline test file (from this patch)
> - Enable the TUI
> - Type "start"
> - Type "s"
> - Type "n" twice
> 
> The TUI source display wrongfully jumps to the header file, line 24.
> When I type "frame", it says I'm stopped at next-line.cc:24.  So it
> is showing the right line number of the wrong file.
> 

Ah, yes.  That is already preexistent.
Consider the attached idea for a test case.
I have no idea yet how to make a working test case
out if it.
But I can fix the tui bug, it is quite easy.
I will send a patch for that in a moment.


Thanks
Bernd.
  

Patch

diff --git a/gdb/testsuite/gdb.cp/next-inline.cc b/gdb/testsuite/gdb.cp/next-inline.cc
new file mode 100644
index 0000000..dcf5ab9
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/next-inline.cc
@@ -0,0 +1,34 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2019 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "next-inline.h"
+
+int __attribute__((noinline, noclone))
+get_alias_set (tree *t)
+{ if (t != NULL
+      && TREE_TYPE (t).z != 1
+      && TREE_TYPE (t).z != 2
+      && TREE_TYPE (t).z != 3)
+    return 0;
+  return 1;
+}
+
+tree xx;
+int main()
+{ get_alias_set(&xx);
+  abort();
+}
diff --git a/gdb/testsuite/gdb.cp/next-inline.exp b/gdb/testsuite/gdb.cp/next-inline.exp
new file mode 100644
index 0000000..6badc8c
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/next-inline.exp
@@ -0,0 +1,39 @@ 
+#   Copyright 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" "next-inline" \
+			  {next-inline.cc} \
+			  {debug nowarnings optimize=-O2}] } {
+    return -1
+}
+
+if ![runto_main] {
+    fail "can't run to main"
+    return
+}
+
+gdb_test "bt" "\\s*\\#0\\s+main.*" "in main"
+gdb_test "step" ".*" "step into get_alias_set"
+gdb_test "bt" "\\s*\\#0\\s+\[^\r\]*next-inline\\.cc:.*" "not in inline"
+gdb_test "next" ".*" "next step 1"
+gdb_test "bt" "\\s*\\#0\\s+\[^\r\]*next-inline\\.cc:.*" "not in inline 1"
+gdb_test "next" ".*" "next step 2"
+gdb_test "bt" "\\s*\\#0\\s+\[^\r\]*next-inline\\.cc:.*" "not in inline 2"
+gdb_test "next" ".*" "next step 3"
+gdb_test "bt" "\\s*\\#0\\s+\[^\r\]*next-inline\\.cc:.*" "not in inline 3"
+gdb_test "next" ".*" "next step 4"
+gdb_test "bt" "\\s*\\#0\\s+\[^\r\]*next-inline\\.cc:.*" "not in inline 4"
diff --git a/gdb/testsuite/gdb.cp/next-inline.h b/gdb/testsuite/gdb.cp/next-inline.h
new file mode 100644
index 0000000..99fb1b2
--- /dev/null
+++ b/gdb/testsuite/gdb.cp/next-inline.h
@@ -0,0 +1,34 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2019 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+
+struct tree{
+  volatile int x;
+  volatile int z;
+};
+
+#define TREE_TYPE(NODE) (*tree_check (NODE, 0))
+
+inline tree *
+tree_check (tree *t, int i)
+{
+  if (t->x != i)
+    abort();
+  tree *x = t;
+  return x;
+}