diff mbox

Fix wrong assertions

Message ID 87vbg1eg08.fsf@igel.home
State New
Headers show

Commit Message

Andreas Schwab May 9, 2015, 6:56 p.m. UTC
Both callers and callees are lengths, and the sum of them can validly
equal the total length.

Andreas.

	PR symtab/18392
	* dwarf2-frame-tailcall.c (pretended_chain_levels): Correct
	assertion.
	* dwarf2loc.c (chain_candidate): Likewise.
---
 gdb/dwarf2-frame-tailcall.c | 2 +-
 gdb/dwarf2loc.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Jan Kratochvil May 13, 2015, 2:01 p.m. UTC | #1
On Sat, 09 May 2015 20:56:55 +0200, Andreas Schwab wrote:
> Both callers and callees are lengths, and the sum of them can validly
> equal the total length.

That '<' and not '<=' was there intentional.  Personally I think it needs more
investigation why that can happen.  The idea was that if two solutions exist
neither can be perfect so there have to be some ambiguous enties so there will
be '<' and not '<=' (to fit the ambiguous entries between).

But creating artifical reproducers is a bit difficult and you haven't provided
a reproducer so I cannot say anything much specific.

Personally I do not mind, it is up to the maintainers whether the goal is just
to remove the assertion or verify there aren't some other bugs causing it.


Jan
Andreas Schwab May 13, 2015, 2:35 p.m. UTC | #2
Jan Kratochvil <jan.kratochvil@redhat.com> writes:

> But creating artifical reproducers is a bit difficult and you haven't provided
> a reproducer so I cannot say anything much specific.

Break on reload1.c:2917 (which is "return replace_equiv_address_nv (x,
new_rtx);") and try to step into it.

Andreas.
Yao Qi May 29, 2015, 9:31 a.m. UTC | #3
Jan Kratochvil <jan.kratochvil@redhat.com> writes:

> That '<' and not '<=' was there intentional.  Personally I think it needs more
> investigation why that can happen.  The idea was that if two solutions exist
> neither can be perfect so there have to be some ambiguous enties so there will
> be '<' and not '<=' (to fit the ambiguous entries between).
>
> But creating artifical reproducers is a bit difficult and you haven't provided
> a reproducer so I cannot say anything much specific.
>
> Personally I do not mind, it is up to the maintainers whether the goal is just
> to remove the assertion or verify there aren't some other bugs causing it.

I can reproduce this internal error via Jan's test case, and spend some
time investigating it, but still unable to fully understand the code.

Jan, since you wrote this piece of code, please help me to understand
it.

(gdb) set debug entry-values 1
(gdb) bt
tailcall: initial: 0x40052e(a)
tailcall: compare: 0x400527(a) 0x40052e(a)
tailcall: reduced: | 0x40052e(a)
gdb/git/gdb/dwarf2loc.c:834: internal-error: chain_candidate: Assertion `result->callers + result->callees < result->length' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

I don't know why we need do intersection in chain_candidate, as the
comments say:

/* Intersect RESULTP with CHAIN to keep RESULTP unambiguous, keep in RESULTP
   only top callers and bottom callees which are present in both.  GDBARCH is
   used only for ENTRY_VALUES_DEBUG.  RESULTP is NULL after return if there are
   no remaining possibilities to provide unambiguous non-trivial result.
   RESULTP should point to NULL on the first (initialization) call.  Caller is
   responsible for xfree of any RESULTP data.  */

What do you mean by "ambiguous" here?  Is it ambiguous if we can get
more than one call chain path from caller_pc to callee_pc?  For example,
main tail calls a, a tail call b and c, b and c tail call d, when GDB
unwinds from d, there are two chains, main -> a -> b -> d, and main -> a
-> c -> d.  Are they ambiguous by your definition?

Further, what is "partially ambiguous result" in the comments below?

/* Determined tail calls for constructing virtual tail call frames.  */

struct call_site_chain
  {
    /* Initially CALLERS == CALLEES == LENGTH.  For partially ambiguous result
       CALLERS + CALLEES < LENGTH.  */
    int callers, callees, length;

    /* Variably sized array with LENGTH elements.  Later [0..CALLERS-1] contain
       top (GDB "prev") sites and [LENGTH-CALLEES..LENGTH-1] contain bottom
       (GDB "next") sites.  One is interested primarily in the PC field.  */
    struct call_site *call_site[1];
  };

I am confused by the usage of the variable-sized array call_site,
elements from 0 to CALLERS-1 are top sites, and elements from
LENGTH-CALLEES to LENGTH-1 are bottom sites, so I conclude that
CALLERS-1 < LENGTH-CALLEES, then CALLERS + CALLEES < LENGTH + 1,
then CALLERS + CALLEES =< LENGTH.  Is it right?

I still have other questions, but I'd like to stop here, otherwise this
mail will be too long.  Your answers to these questions may help to
answer the rest of my questions.
Jan Kratochvil May 29, 2015, 11:31 a.m. UTC | #4
On Fri, 29 May 2015 11:31:18 +0200, Yao Qi wrote:
> spend some
> time investigating it, but still unable to fully understand the code.

I admit the code comments are not too great, I did not notice that when
writing them.


> (gdb) set debug entry-values 1
> (gdb) bt
> tailcall: initial: 0x40052e(a)
> tailcall: compare: 0x400527(a) 0x40052e(a)
> tailcall: reduced: | 0x40052e(a)
> gdb/git/gdb/dwarf2loc.c:834: internal-error: chain_candidate: Assertion `result->callers + result->callees < result->length' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) y
> 
> I don't know why we need do intersection in chain_candidate, as the
> comments say:
> 
> /* Intersect RESULTP with CHAIN to keep RESULTP unambiguous, keep in RESULTP
>    only top callers and bottom callees which are present in both.  GDBARCH is
>    used only for ENTRY_VALUES_DEBUG.  RESULTP is NULL after return if there are
>    no remaining possibilities to provide unambiguous non-trivial result.
>    RESULTP should point to NULL on the first (initialization) call.  Caller is
>    responsible for xfree of any RESULTP data.  */
> 
> What do you mean by "ambiguous" here?  Is it ambiguous if we can get
> more than one call chain path from caller_pc to callee_pc?

Yes.


> For example,
> main tail calls a, a tail call b and c, b and c tail call d, when GDB
> unwinds from d, there are two chains, main -> a -> b -> d, and main -> a
> -> c -> d.  Are they ambiguous by your definition?

Those two chains are ambigous as it could be also the other chain.

Chain intersecting those two chains is:
	main -> a -> <???> -> d


> Further, what is "partially ambiguous result" in the comments below?

The terminology seems bogus there.

"partially ambiguous" was meant the chain:
	main -> a -> <???> -> d
An intersection of all possible chains.


> /* Determined tail calls for constructing virtual tail call frames.  */
> 
> struct call_site_chain
>   {
>     /* Initially CALLERS == CALLEES == LENGTH.  For partially ambiguous result
>        CALLERS + CALLEES < LENGTH.  */
>     int callers, callees, length;
> 
>     /* Variably sized array with LENGTH elements.  Later [0..CALLERS-1] contain
>        top (GDB "prev") sites and [LENGTH-CALLEES..LENGTH-1] contain bottom
>        (GDB "next") sites.  One is interested primarily in the PC field.  */
>     struct call_site *call_site[1];
>   };
> 
> I am confused by the usage of the variable-sized array call_site,
> elements from 0 to CALLERS-1 are top sites, and elements from
> LENGTH-CALLEES to LENGTH-1 are bottom sites, so I conclude that
> CALLERS-1 < LENGTH-CALLEES, then CALLERS + CALLEES < LENGTH + 1,
> then CALLERS + CALLEES =< LENGTH.  Is it right?

Yes, that is right.  Initially there is some chain (let's say the longest one
but that doe snot matter).  Consequently its elements from the middle are
being removed and there remains only some few unambiguous top and bottom ones.

The original idea why the comparison should be sharp ("<") was that if there
are multiple chains like (0xaddr show jmp instruction address):
	main(0x100) -> a(0x200) -> d(0x400)
	main(0x100) -> a(0x200) -> c(0x300) -> d(0x400)
then - such situation cannot exist - if two jmp instructions in "a" have the
same address they must also jump to the same address (*).

(*) jump to a computed address would be never considered for the DWARF
    tail-call records.

So there could be:
	main(0x100) -> a(0x200) -> d(0x400)
	main(0x100) -> a(0x270) -> c(0x300) -> d(0x400)
But then "a" frame itself is ambiguous and it must not be displayed.

I did not realize that there can be self-tail-call:
	main(0x100) -> a(0x200) -> d(0x400)
	main(0x100) -> a(0x280) -> a(0x200) -> d(0x400)
which intersects to:
	main(0x100) -> <???>? -> a(0x200) -> d(0x400)
And so if the first chain was chosen the
	main(0x100) -> a(0x200) -> d(0x400)
then the final intersection has callers+callees==length.

Originally the patchset tried to display the "ambiguous" part <???> in
backtrace creating a bogus frame there but GDB had too many problems with such
a frame.  So currently no such frame is created although still backtrace could
annotate it somehow there are "ambiguous" frames between these two frames.


Jan
Yao Qi May 29, 2015, 1:43 p.m. UTC | #5
Jan Kratochvil <jan.kratochvil@redhat.com> writes:

Hi, Jan,
thanks for your explanations... they are very helpful.

>> Further, what is "partially ambiguous result" in the comments below?
>
> The terminology seems bogus there.
>
> "partially ambiguous" was meant the chain:
> 	main -> a -> <???> -> d
> An intersection of all possible chains.
>

Sounds like "partially ambiguous" is equivalent to "ambiguous".

>
>> /* Determined tail calls for constructing virtual tail call frames.  */
>> 
>> struct call_site_chain
>>   {
>>     /* Initially CALLERS == CALLEES == LENGTH.  For partially ambiguous result
>>        CALLERS + CALLEES < LENGTH.  */
>>     int callers, callees, length;
>> 
>>     /* Variably sized array with LENGTH elements.  Later [0..CALLERS-1] contain
>>        top (GDB "prev") sites and [LENGTH-CALLEES..LENGTH-1] contain bottom
>>        (GDB "next") sites.  One is interested primarily in the PC field.  */
>>     struct call_site *call_site[1];
>>   };
>> 
>> I am confused by the usage of the variable-sized array call_site,
>> elements from 0 to CALLERS-1 are top sites, and elements from
>> LENGTH-CALLEES to LENGTH-1 are bottom sites, so I conclude that
>> CALLERS-1 < LENGTH-CALLEES, then CALLERS + CALLEES < LENGTH + 1,
>> then CALLERS + CALLEES =< LENGTH.  Is it right?
>
> Yes, that is right.  Initially there is some chain (let's say the longest one

If that is right, the assert below is too strict, isn't?

  /* See call_site_find_chain_1 why there is no way to reach the bottom callee
     PC again.  In such case there must be two different code paths to reach
     it, therefore some of the former determined intermediate PCs must differ
     and the unambiguous chain gets shortened.  */
  gdb_assert (result->callers + result->callees < result->length);

> but that doe snot matter).  Consequently its elements from the middle are
> being removed and there remains only some few unambiguous top and
> bottom ones.

If there is no call sites removed from the chain during the intersection,
CALLERS + CALLEES == LENGTH, right?  in function chain_candidate,
result->length is set by the length of a chain.  If this chain is the
shortest one, CALLERS + CALLEES == LENGTH otherwise,
CALLERS + CALLEES < LENGTH.  Is it right?  If so, we need to relax the
condition in the assert and update the comments.

>
> The original idea why the comparison should be sharp ("<") was that if there
> are multiple chains like (0xaddr show jmp instruction address):
> 	main(0x100) -> a(0x200) -> d(0x400)
> 	main(0x100) -> a(0x200) -> c(0x300) -> d(0x400)
> then - such situation cannot exist - if two jmp instructions in "a" have the
> same address they must also jump to the same address (*).
>
> (*) jump to a computed address would be never considered for the DWARF
>     tail-call records.
>
> So there could be:
> 	main(0x100) -> a(0x200) -> d(0x400)
> 	main(0x100) -> a(0x270) -> c(0x300) -> d(0x400)
> But then "a" frame itself is ambiguous and it must not be displayed.
>
> I did not realize that there can be self-tail-call:
> 	main(0x100) -> a(0x200) -> d(0x400)
> 	main(0x100) -> a(0x280) -> a(0x200) -> d(0x400)
> which intersects to:
> 	main(0x100) -> <???>? -> a(0x200) -> d(0x400)
> And so if the first chain was chosen the
> 	main(0x100) -> a(0x200) -> d(0x400)
> then the final intersection has callers+callees==length.

What are the definitions of CALLERS, CALLEES, top and bottom? given this example?
diff mbox

Patch

diff --git a/gdb/dwarf2-frame-tailcall.c b/gdb/dwarf2-frame-tailcall.c
index b412a5b..f964ab2 100644
--- a/gdb/dwarf2-frame-tailcall.c
+++ b/gdb/dwarf2-frame-tailcall.c
@@ -197,7 +197,7 @@  pretended_chain_levels (struct call_site_chain *chain)
     return chain->length;
 
   chain_levels = chain->callers + chain->callees;
-  gdb_assert (chain_levels < chain->length);
+  gdb_assert (chain_levels <= chain->length);
 
   return chain_levels;
 }
diff --git a/gdb/dwarf2loc.c b/gdb/dwarf2loc.c
index d811106..29b265b 100644
--- a/gdb/dwarf2loc.c
+++ b/gdb/dwarf2loc.c
@@ -827,7 +827,7 @@  chain_candidate (struct gdbarch *gdbarch, struct call_site_chain **resultp,
      PC again.  In such case there must be two different code paths to reach
      it, therefore some of the former determined intermediate PCs must differ
      and the unambiguous chain gets shortened.  */
-  gdb_assert (result->callers + result->callees < result->length);
+  gdb_assert (result->callers + result->callees <= result->length);
 }
 
 /* Create and return call_site_chain for CALLER_PC and CALLEE_PC.  All the