Fix an out of bounds array access in, find_epilogue_using_linetable

Message ID AS8P193MB1285A29FFB90BE6B6D2E892BE4382@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
State New
Headers
Series Fix an out of bounds array access in, find_epilogue_using_linetable |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Bernd Edlinger March 31, 2024, 10:45 a.m. UTC
  This causes random test failures like these:

FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $fn_fba
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: check frame-id matches
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: bt 2
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: up
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $sp_value == $::main_sp
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $::main_fba
FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: [string equal $fid $::main_fid]

Here the read happens below the first element of the line
table, and the test failure depends on the value that is
read from there.

Theoretically it is also possible that std::lower_bound
returns a pointer exactly at the upper bound of the line table,
also here the read value is undefined, that happens in this test:

FAIL: gdb.dwarf2/dw2-epilogue-begin.exp: confirm watchpoint doesn't trigger

Fixes: 528b729be1a2 ("gdb/dwarf2: Add support for DW_LNS_set_epilogue_begin in line-table")
---
 gdb/symtab.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
  

Comments

Tom de Vries April 1, 2024, 9:07 a.m. UTC | #1
On 3/31/24 12:45, Bernd Edlinger wrote:
> This causes random test failures like these:
> 

Hi Bernd,

thanks for the analysis and the patch.

I think with "this" you mean $subject, which was not immediately clear 
to me when reading it on the mailing list where there's a whole bunch of 
text inbetween, so please consider spelling it out or using $subject or 
some such.

> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $fn_fba
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: check frame-id matches
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: bt 2
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: up
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $sp_value == $::main_sp
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $::main_fba
> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: [string equal $fid $::main_fid]
> 
> Here the read happens below the first element of the line
> table, and the test failure depends on the value that is
> read from there.
> 

Using this description, I managed to minimally rewrite the loop into a 
form where I could easily add an assert, and then add an assert that 
reliably detects the problem:
...
       while (1)
         {
           gdb_assert (it >= &linetable->item[0]);
           if (it->unrelocated_pc () >= unrel_start)
             ;
           else
             break;
           if (it->epilogue_begin)
             return {it->pc (objfile)};
           it --;
         }
...

> Theoretically it is also possible that std::lower_bound
> returns a pointer exactly at the upper bound of the line table,
> also here the read value is undefined, that happens in this test:
> 
> FAIL: gdb.dwarf2/dw2-epilogue-begin.exp: confirm watchpoint doesn't trigger
> 

Ack, thanks for finding and reporting that.

> Fixes: 528b729be1a2 ("gdb/dwarf2: Add support for DW_LNS_set_epilogue_begin in line-table")
> ---
>   gdb/symtab.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gdb/symtab.c b/gdb/symtab.c
> index 86603dfebc3..2fc8e819932 100644
> --- a/gdb/symtab.c
> +++ b/gdb/symtab.c
> @@ -4177,11 +4177,10 @@ find_epilogue_using_linetable (CORE_ADDR func_addr)
>   	   return lte.unrelocated_pc () < pc;
>   	 });
>   
> -      while (it->unrelocated_pc () >= unrel_start)
> +      while (it > linetable->item && (--it)->unrelocated_pc () >= unrel_start)
>         {
>   	if (it->epilogue_begin)
>   	  return {it->pc (objfile)};
> -	it --;
>         }
>       }
>     return {};

I think this makes the while condition fairly convoluted.

Also, AFAIU the problem you mention related to "pointer exactly at the 
upper bound of the line table", may or may not happen, and the new --it 
placement that seems intended to address that is executed 
unconditionally, so wouldn't this sometimes skip an item?

I think things can be solved simpler by:
- using a trivial for loop that expresses the boundary of the line
   table, and
- moving the while condition into the loop, while
- handling the exceptional case explictly, outside the loop:
...
       if (it == &linetable->item[linetable->nitems])
         it--;

       for (; it >= &linetable->item[0]; it--)
         {
           if (it->unrelocated_pc () < unrel_start)
             break;
           if (it->epilogue_begin)
             return {it->pc (objfile)};
         }
...

WDYT?

Thanks,
- Tom
  
Bernd Edlinger April 1, 2024, 4:41 p.m. UTC | #2
On 4/1/24 11:07, Tom de Vries wrote:
> On 3/31/24 12:45, Bernd Edlinger wrote:
>> This causes random test failures like these:
>>
> 
> Hi Bernd,
> 
> thanks for the analysis and the patch.
> 
> I think with "this" you mean $subject, which was not immediately clear to me when reading it on the mailing list where there's a whole bunch of text inbetween, so please consider spelling it out or using $subject or some such.
> 

ok will do.

>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $fn_fba
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: check frame-id matches
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: bt 2
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: up
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $sp_value == $::main_sp
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $::main_fba
>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: [string equal $fid $::main_fid]
>>
>> Here the read happens below the first element of the line
>> table, and the test failure depends on the value that is
>> read from there.
>>
> 
> Using this description, I managed to minimally rewrite the loop into a form where I could easily add an assert, and then add an assert that reliably detects the problem:
> ...
>       while (1)
>         {
>           gdb_assert (it >= &linetable->item[0]);
>           if (it->unrelocated_pc () >= unrel_start)
>             ;
>           else
>             break;
>           if (it->epilogue_begin)
>             return {it->pc (objfile)};
>           it --;
>         }
> ...
> 
>> Theoretically it is also possible that std::lower_bound
>> returns a pointer exactly at the upper bound of the line table,
>> also here the read value is undefined, that happens in this test:
>>
>> FAIL: gdb.dwarf2/dw2-epilogue-begin.exp: confirm watchpoint doesn't trigger
>>
> 
> Ack, thanks for finding and reporting that.
> 
>> Fixes: 528b729be1a2 ("gdb/dwarf2: Add support for DW_LNS_set_epilogue_begin in line-table")
>> ---
>>   gdb/symtab.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/gdb/symtab.c b/gdb/symtab.c
>> index 86603dfebc3..2fc8e819932 100644
>> --- a/gdb/symtab.c
>> +++ b/gdb/symtab.c
>> @@ -4177,11 +4177,10 @@ find_epilogue_using_linetable (CORE_ADDR func_addr)
>>          return lte.unrelocated_pc () < pc;
>>        });
>>   -      while (it->unrelocated_pc () >= unrel_start)
>> +      while (it > linetable->item && (--it)->unrelocated_pc () >= unrel_start)
>>         {
>>       if (it->epilogue_begin)
>>         return {it->pc (objfile)};
>> -    it --;
>>         }
>>       }
>>     return {};
> 
> I think this makes the while condition fairly convoluted.
> 
> Also, AFAIU the problem you mention related to "pointer exactly at the upper bound of the line table", may or may not happen, and the new --it placement that seems intended to address that is executed unconditionally, so wouldn't this sometimes skip an item?
> 

No, because the upper limit is exactly where the next function starts,
so it will evaluate the first line table entry of the following function,
probably harmless, since that should not be the epilogue of the function.

> I think things can be solved simpler by:
> - using a trivial for loop that expresses the boundary of the line
>   table, and
> - moving the while condition into the loop, while
> - handling the exceptional case explictly, outside the loop:
> ...
>       if (it == &linetable->item[linetable->nitems])
>         it--;
> 

no, that is also wrong if linetable->nitems == 0, `it` is
&linetable->item[-1], and the comparing that value to
&linetable->item[0] yields an undefined boolean value.
And as I said above looking at the line table entry
that belongs to the next function is asking for trouble.

>       for (; it >= &linetable->item[0]; it--)
>         {
>           if (it->unrelocated_pc () < unrel_start)
>             break;
>           if (it->epilogue_begin)
>             return {it->pc (objfile)};
>         }
> ...
> 
> WDYT?
> 

Even decrementing `it` below the first element of an array
is undefined behavior, avoiding that is the main reason
why I wrote the if condition in this way.

I would agree to add a comment here which explains how the
loop works.

OK?

Thanks,
Bernd.

> Thanks,
> - Tom
  
Tom de Vries April 5, 2024, 3:11 p.m. UTC | #3
On 4/1/24 18:41, Bernd Edlinger wrote:
> On 4/1/24 11:07, Tom de Vries wrote:
>> On 3/31/24 12:45, Bernd Edlinger wrote:
>>> This causes random test failures like these:
>>>
>>
>> Hi Bernd,
>>
>> thanks for the analysis and the patch.
>>
>> I think with "this" you mean $subject, which was not immediately clear to me when reading it on the mailing list where there's a whole bunch of text inbetween, so please consider spelling it out or using $subject or some such.
>>
> 
> ok will do.
> 
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $fn_fba
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: check frame-id matches
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: bt 2
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: up
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $sp_value == $::main_sp
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: $fba_value == $::main_fba
>>> FAIL: gdb.base/unwind-on-each-insn-amd64.exp: foo: instruction 6: [string equal $fid $::main_fid]
>>>
>>> Here the read happens below the first element of the line
>>> table, and the test failure depends on the value that is
>>> read from there.
>>>
>>
>> Using this description, I managed to minimally rewrite the loop into a form where I could easily add an assert, and then add an assert that reliably detects the problem:
>> ...
>>        while (1)
>>          {
>>            gdb_assert (it >= &linetable->item[0]);
>>            if (it->unrelocated_pc () >= unrel_start)
>>              ;
>>            else
>>              break;
>>            if (it->epilogue_begin)
>>              return {it->pc (objfile)};
>>            it --;
>>          }
>> ...
>>
>>> Theoretically it is also possible that std::lower_bound
>>> returns a pointer exactly at the upper bound of the line table,
>>> also here the read value is undefined, that happens in this test:
>>>
>>> FAIL: gdb.dwarf2/dw2-epilogue-begin.exp: confirm watchpoint doesn't trigger
>>>
>>
>> Ack, thanks for finding and reporting that.
>>
>>> Fixes: 528b729be1a2 ("gdb/dwarf2: Add support for DW_LNS_set_epilogue_begin in line-table")
>>> ---
>>>    gdb/symtab.c | 3 +--
>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/gdb/symtab.c b/gdb/symtab.c
>>> index 86603dfebc3..2fc8e819932 100644
>>> --- a/gdb/symtab.c
>>> +++ b/gdb/symtab.c
>>> @@ -4177,11 +4177,10 @@ find_epilogue_using_linetable (CORE_ADDR func_addr)
>>>           return lte.unrelocated_pc () < pc;
>>>         });
>>>    -      while (it->unrelocated_pc () >= unrel_start)
>>> +      while (it > linetable->item && (--it)->unrelocated_pc () >= unrel_start)
>>>          {
>>>        if (it->epilogue_begin)
>>>          return {it->pc (objfile)};
>>> -    it --;
>>>          }
>>>        }
>>>      return {};
>>
>> I think this makes the while condition fairly convoluted.
>>
>> Also, AFAIU the problem you mention related to "pointer exactly at the upper bound of the line table", may or may not happen, and the new --it placement that seems intended to address that is executed unconditionally, so wouldn't this sometimes skip an item?
>>
> 
> No, because the upper limit is exactly where the next function starts,
> so it will evaluate the first line table entry of the following function,
> probably harmless, since that should not be the epilogue of the function.
> 

Looking at the "pointer exactly at the upper bound of the line table" 
problem a bit more, I think it can happen that it->epilogue_begin is 
evaluated for an end_sequence entry.  My guess is that this is harmless, 
but it's better to avoid it.

Atm it's hard to reason about what happens because we try to handle 
several initial scenarios in the loop.  It's best to untangle this:
- handle exceptional situations before the loop, and
- reserve the loop to handle the steady state: iterating over
   proper line-table entries of the current function.

>> I think things can be solved simpler by:
>> - using a trivial for loop that expresses the boundary of the line
>>    table, and
>> - moving the while condition into the loop, while
>> - handling the exceptional case explictly, outside the loop:
>> ...
>>        if (it == &linetable->item[linetable->nitems])
>>          it--;
>>
> 
> no, that is also wrong if linetable->nitems == 0, `it` is
> &linetable->item[-1], and the comparing that value to
> &linetable->item[0] yields an undefined boolean value.

I see, I forgot about that, thanks for pointing that out.  It's good to 
make explicit in the implementation that and where we're avoid this.

Btw, if "linetable->nitems == 0" indeed can happen, it's best handled 
using an early-out.

> And as I said above looking at the line table entry
> that belongs to the next function is asking for trouble.
> 
>>        for (; it >= &linetable->item[0]; it--)
>>          {
>>            if (it->unrelocated_pc () < unrel_start)
>>              break;
>>            if (it->epilogue_begin)
>>              return {it->pc (objfile)};
>>          }
>> ...
>>
>> WDYT?
>>
> 
> Even decrementing `it` below the first element of an array
> is undefined behavior, avoiding that is the main reason
> why I wrote the if condition in this way.
> 
> I would agree to add a comment here which explains how the
> loop works.
> 
> OK?
> 

I've written a version that avoids the undefined behaviour decrement, 
clearly advertises this, handles cornercases explicitly and adds and 
reorganizes comments, as well as adds some asserts.  I've submitted it 
as a v3, which also adds an additional test-case.

Thanks,
- Tom
  
Bernd Edlinger April 5, 2024, 6:13 p.m. UTC | #4
On 4/5/24 17:11, Tom de Vries wrote:
> 
> Looking at the "pointer exactly at the upper bound of the line table" problem a bit more, I think it can happen that it->epilogue_begin is evaluated for an end_sequence entry.  My guess is that this is harmless, but it's better to avoid it.
> 
> Atm it's hard to reason about what happens because we try to handle several initial scenarios in the loop.  It's best to untangle this:
> - handle exceptional situations before the loop, and
> - reserve the loop to handle the steady state: iterating over
>   proper line-table entries of the current function.
> 
>>> I think things can be solved simpler by:
>>> - using a trivial for loop that expresses the boundary of the line
>>>    table, and
>>> - moving the while condition into the loop, while
>>> - handling the exceptional case explictly, outside the loop:
>>> ...
>>>        if (it == &linetable->item[linetable->nitems])
>>>          it--;
>>>
>>
>> no, that is also wrong if linetable->nitems == 0, `it` is
>> &linetable->item[-1], and the comparing that value to
>> &linetable->item[0] yields an undefined boolean value.
> 
> I see, I forgot about that, thanks for pointing that out.  It's good to make explicit in the implementation that and where we're avoid this.
> 
> Btw, if "linetable->nitems == 0" indeed can happen, it's best handled using an early-out.
> 
>> And as I said above looking at the line table entry
>> that belongs to the next function is asking for trouble.
>>
>> Even decrementing `it` below the first element of an array
>> is undefined behavior, avoiding that is the main reason
>> why I wrote the if condition in this way.
>>
>> I would agree to add a comment here which explains how the
>> loop works.
>>
>> OK?
>>
> 
> I've written a version that avoids the undefined behaviour decrement, clearly advertises this, handles cornercases explicitly and adds and reorganizes comments, as well as adds some asserts.  I've submitted it as a v3, which also adds an additional test-case.
> 

Ah okay,
I forgot about the end_sequence, that should be at the upper bound of the function range,
but it should not be possible, that it has an epilogue-begin bit set in this line table
item, see dwarf_finish_line, there is only a IS_STMT flag, no LEF_EPILOGUE_BEGIN:

dwarf_finish_line (struct gdbarch *gdbarch, struct subfile *subfile,
                   unrelocated_addr address, struct dwarf2_cu *cu,
                   bool end_sequence) 
{           
  if (subfile == NULL)
    return;

  if (dwarf_line_debug)
    {
      gdb_printf (gdb_stdlog,
                  "Finishing current line, file %s, address %s\n",
                  lbasename (subfile->name.c_str ()),
                  paddress (gdbarch, (CORE_ADDR) address));
    }

  dwarf_record_line_1 (gdbarch, subfile, end_sequence ? 0 : -1, address,
                       LEF_IS_STMT, cu);
}

So it should not be necessary to check end-sequence for epilogue-begin bit.
 
But even if an epilogue-begin line table entry would be at the same address,
that can only happen due to an invalid debug info: That would by my reasoning
be impossible, because that would imply a zero byte epilogue and that
cannot be correct because the epilogue consists of at least a return instruction.
Therefore I'd suggest to skip that line table entry, because even if it has the
epilogue-begin bit set, it can only be invalid and it is better ignored.


Thanks,
Bernd.



> Thanks,
> - Tom
>
  

Patch

diff --git a/gdb/symtab.c b/gdb/symtab.c
index 86603dfebc3..2fc8e819932 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -4177,11 +4177,10 @@  find_epilogue_using_linetable (CORE_ADDR func_addr)
 	   return lte.unrelocated_pc () < pc;
 	 });
 
-      while (it->unrelocated_pc () >= unrel_start)
+      while (it > linetable->item && (--it)->unrelocated_pc () >= unrel_start)
       {
 	if (it->epilogue_begin)
 	  return {it->pc (objfile)};
-	it --;
       }
     }
   return {};