[v2,PR,gdb/29272] Make sure a copy_insn_closure is available when we have a match in copy_insn_closure_by_addr

Message ID 20221102143341.2807182-1-luis.machado@arm.com
State New
Series [v2,PR,gdb/29272] Make sure a copy_insn_closure is available when we have a match in copy_insn_closure_by_addr

Commit Message

Luis Machado Nov. 2, 2022, 2:33 p.m. UTC
  v2: Add try/catch block

PR29272 mentions that a particular test used to work with GDB 10 but
fails from GDB 11 onwards. I tracked the regression down to the
displaced stepping improvements in commit
187b041e2514827b9d86190ed2471c4c7a352874.

In particular, one of the corner cases using copy_insn_closure_by_addr got
silently broken. It is hard to spot because there are no good tests
for it, and the situation is quite specific to the Arm target.

Essentially, after the displaced stepping improvements we could still
invoke copy_insn_closure_by_addr to get the pointer to a
copy_insn_closure, but it always returned nullptr due to the order of
the statements in displaced_step_buffers::prepare.

The way it is now, we first write the address of the displaced step buffer
to PC and then save the copy_insn_closure pointer.

The problem is that writing to PC for the Arm target requires figuring
out whether the new PC is in Thumb mode or not.

With no copy_insn_closure data, the logic to determine the Thumb mode
during displaced stepping doesn't work, and gives random results that
are difficult to track down (SIGILL, SIGSEGV, etc.).

Fix this by moving the PC write in displaced_step_buffers::prepare after
the point where the copy_insn_closure is saved and, for safety, add an
assertion to displaced_step_buffers::copy_insn_closure_by_addr so GDB
stops as soon as it sees this invalid situation. If this gets broken
again in the future, it will be easier to spot.

Guard the code in a try/catch block to handle the case where we can't
write the PC, so as to not leave partial state in the displaced step
machinery.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29272
---
 gdb/displaced-stepping.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
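
The ordering problem described in the commit message can be illustrated
with a minimal, self-contained sketch. Everything in it (toy_closure,
toy_buffer, toy_closure_by_addr, toy_write_pc and the two prepare_*
functions) is a hypothetical stand-in, not GDB code. It only assumes that
the architecture-specific PC write looks up the closure by address, as the
message describes for Arm.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-ins; none of these are GDB's real types.
struct toy_closure
{
  bool is_thumb;
};

struct toy_buffer
{
  std::uint64_t addr;
  std::unique_ptr<toy_closure> closure;
};

// Models the lookup: return the closure saved for ADDR, or nullptr.
const toy_closure *
toy_closure_by_addr (const toy_buffer &buf, std::uint64_t addr)
{
  return addr == buf.addr ? buf.closure.get () : nullptr;
}

// Models an arch-specific PC write that needs the closure to decide the
// execution mode.  Returns true if the mode could be determined.
bool
toy_write_pc (const toy_buffer &buf, std::uint64_t new_pc)
{
  return toy_closure_by_addr (buf, new_pc) != nullptr;
}

// Old order: write the PC first, save the closure afterwards.
bool
prepare_old_order ()
{
  toy_buffer buf { 0x1000, nullptr };
  auto closure = std::make_unique<toy_closure> (toy_closure { true });
  bool mode_known = toy_write_pc (buf, buf.addr);
  buf.closure = std::move (closure);
  return mode_known;
}

// New order: save the closure first, then write the PC.
bool
prepare_new_order ()
{
  toy_buffer buf { 0x1000, nullptr };
  auto closure = std::make_unique<toy_closure> (toy_closure { true });
  buf.closure = std::move (closure);
  return toy_write_pc (buf, buf.addr);
}
```

In this toy model prepare_old_order cannot determine the mode because the
lookup runs before the closure is stored, while prepare_new_order can.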
  

Comments

Simon Marchi Nov. 2, 2022, 5:44 p.m. UTC | #1
On 11/2/22 10:33, Luis Machado via Gdb-patches wrote:
> diff --git a/gdb/displaced-stepping.c b/gdb/displaced-stepping.c
> index eac2c5dab94..3b5376cf31b 100644
> --- a/gdb/displaced-stepping.c
> +++ b/gdb/displaced-stepping.c
> @@ -139,15 +139,31 @@ displaced_step_buffers::prepare (thread_info *thread, CORE_ADDR &displaced_pc)
>        return DISPLACED_STEP_PREPARE_STATUS_CANT;
>      }
>  
> -  /* Resume execution at the copy.  */
> -  regcache_write_pc (regcache, buffer->addr);
> -
>    /* This marks the buffer as being in use.  */
>    buffer->current_thread = thread;
>  
>    /* Save this, now that we know everything went fine.  */
>    buffer->copy_insn_closure = std::move (copy_insn_closure);
>  
> +  /* Adjust the PC so it points to the displaced step buffer address that will
> +     be used.  This needs to be done after we save the copy_insn_closure, as
> +     some architectures (Arm, for one) need that information so they can adjust
> +     other data as needed.  In particular, Arm needs to know if the instruction
> +     being executed in the displaced step buffer is thumb or not.  Without that
> +     information, things will be very wrong in a random way.  */
> +  try
> +    {
> +      regcache_write_pc (regcache, buffer->addr);
> +    }
> +  catch (const gdb_exception_error &except)
> +    {
> +      /* Reset the displaced step buffer state if we failed to write PC.
> +	 Otherwise we will prevent this buffer from being used, as it will
> +	 always have a thread in buffer->current_thread.  */
> +      buffer->current_thread = nullptr;
> +      copy_insn_closure = std::move (buffer->copy_insn_closure);

The intention would be clearer by just doing:

  buffer->copy_insn_closure.reset ();

> +      return DISPLACED_STEP_PREPARE_STATUS_CANT;

I think we should just let the exception escape,
DISPLACED_STEP_PREPARE_STATUS_CANT isn't meant to convey an error.
Would this work, using make_scope_exit?

  /* Reset the displaced step buffer state if we failed to write PC.
     Otherwise we will prevent this buffer from being used, as it will
     always have a thread in buffer->current_thread.  */
  auto reset_buffer = make_scope_exit
    ([buffer] ()
       {
         buffer->current_thread = nullptr;
         buffer->copy_insn_closure.reset ();
       });

  /* Adjust the PC so it points to the displaced step buffer address that will
     be used.  This needs to be done after we save the copy_insn_closure, as
     some architectures (Arm, for one) need that information so they can adjust
     other data as needed.  In particular, Arm needs to know if the instruction
     being executed in the displaced step buffer is thumb or not.  Without that
     information, things will be very wrong in a random way.  */
  regcache_write_pc (regcache, buffer->addr);

  reset_buffer.release ();

Simon
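
As a rough illustration of the scope-exit pattern suggested above, here is
a self-contained sketch. The guard class toy_scope_exit, its factory
make_toy_scope_exit, and the toy_buffer, toy_write_pc and toy_prepare names
are all hypothetical and written only for this example; GDB's own
make_scope_exit may differ in detail.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Minimal stand-in for a scope-exit guard (hypothetical, not GDB's).
template<typename F>
class toy_scope_exit
{
public:
  explicit toy_scope_exit (F f) : m_f (std::move (f)) {}
  toy_scope_exit (toy_scope_exit &&other)
    : m_f (std::move (other.m_f)), m_armed (other.m_armed)
  { other.m_armed = false; }
  toy_scope_exit (const toy_scope_exit &) = delete;
  toy_scope_exit &operator= (const toy_scope_exit &) = delete;

  // Run the cleanup unless release () was called.
  ~toy_scope_exit () { if (m_armed) m_f (); }

  void release () { m_armed = false; }

private:
  F m_f;
  bool m_armed = true;
};

template<typename F>
toy_scope_exit<F>
make_toy_scope_exit (F f)
{
  return toy_scope_exit<F> (std::move (f));
}

struct toy_buffer
{
  int current_thread = 0;   // 0 means "not in use".
  bool has_closure = false;
};

// Hypothetical PC write that throws when FAIL is set.
void
toy_write_pc (bool fail)
{
  if (fail)
    throw std::runtime_error ("cannot write PC");
}

void
toy_prepare (toy_buffer &buf, bool pc_write_fails)
{
  buf.current_thread = 42;
  buf.has_closure = true;

  // Undo the two assignments above if anything below throws.
  auto reset_buffer = make_toy_scope_exit ([&buf] ()
    {
      buf.current_thread = 0;
      buf.has_closure = false;
    });

  toy_write_pc (pc_write_fails);

  // Success: keep the buffer marked as in use.
  reset_buffer.release ();
}

// Helper for the example: true if the buffer ended up free again.
bool
prepare_leaves_buffer_free (bool pc_write_fails)
{
  toy_buffer buf;
  try
    {
      toy_prepare (buf, pc_write_fails);
    }
  catch (const std::runtime_error &)
    {
      // The exception escaped toy_prepare; the guard already ran.
    }
  return buf.current_thread == 0 && !buf.has_closure;
}
```

The point of the guard is that the exception still propagates to the
caller, but the buffer is not left claimed by a thread that never started
the displaced step.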
  
Luis Machado Nov. 2, 2022, 6:06 p.m. UTC | #2
On 11/2/22 17:44, Simon Marchi wrote:
> On 11/2/22 10:33, Luis Machado via Gdb-patches wrote:
>> +  catch (const gdb_exception_error &except)
>> +    {
>> +      /* Reset the displaced step buffer state if we failed to write PC.
>> +	 Otherwise we will prevent this buffer from being used, as it will
>> +	 always have a thread in buffer->current_thread.  */
>> +      buffer->current_thread = nullptr;
>> +      copy_insn_closure = std::move (buffer->copy_insn_closure);
> 
> The intention would be clearer by just doing:
> 
>    buffer->copy_insn_closure.reset ()
> 
>> +      return DISPLACED_STEP_PREPARE_STATUS_CANT;
> 
> I think we should just let the exception escape,
> DISPLACED_STEP_PREPARE_STATUS_CANT isn't meant to convey an error.

Wouldn't letting it escape completely abort the single-stepping operation? I was expecting a return of
DISPLACED_STEP_PREPARE_STATUS_CANT to have a fallback of stepping in-place. Isn't that the case?

> Would this work, using make_scope_exit?

Ah, possibly. Let me try that. Thanks for the suggestion.

  
Simon Marchi Nov. 2, 2022, 6:22 p.m. UTC | #3
On 11/2/22 14:06, Luis Machado wrote:
> On 11/2/22 17:44, Simon Marchi wrote:
>> On 11/2/22 10:33, Luis Machado via Gdb-patches wrote:
>>> +  catch (const gdb_exception_error &except)
>>> +    {
>>> +      /* Reset the displaced step buffer state if we failed to write PC.
>>> +     Otherwise we will prevent this buffer from being used, as it will
>>> +     always have a thread in buffer->current_thread.  */
>>> +      buffer->current_thread = nullptr;
>>> +      copy_insn_closure = std::move (buffer->copy_insn_closure);
>>
>> The intention would be clearer by just doing:
>>
>>    buffer->copy_insn_closure.reset ()
>>
>>> +      return DISPLACED_STEP_PREPARE_STATUS_CANT;
>>
>> I think we should just let the exception escape,
>> DISPLACED_STEP_PREPARE_STATUS_CANT isn't meant to convey an error.
> 
> Wouldn't letting it escape completely abort the single-stepping operation? I was expecting a return of
> DISPLACED_STEP_PREPARE_STATUS_CANT to have a fallback of stepping in-place. Isn't that the case?

Yeah, but I think that's what we want.  Failing to write the PC is an
"abort mission" kind of failure, IMO.  Something is very broken.

DISPLACED_STEP_PREPARE_STATUS_CANT is not equivalent to an error, it's "we have
successfully analyzed the instruction and concluded it can't be
displaced-stepped".  If we wanted to return a status code, I would suggest
to introduce a new one (e.g. DISPLACED_STEP_PREPARE_STATUS_ERROR).  But
I think the exception is fine, this is how other kinds of failure that
happen when resuming are reported, like when we fail to insert
breakpoints.  We arguably are not very good at handling those
gracefully, but that's the problem of this code here.

Simon
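
The distinction drawn above, between a "can't displaced-step" status and a
real failure, can be sketched as follows. This is only a toy model under
stated assumptions: toy_prepare_status, toy_prepare and toy_resume are
hypothetical names, and the caller catches the exception here purely to
make the three outcomes visible, whereas in GDB the exception would
propagate further up.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes; TOY_CANT is a successful analysis result,
// not an error.
enum toy_prepare_status
{
  TOY_OK,
  TOY_CANT
};

toy_prepare_status
toy_prepare (bool insn_supported, bool pc_writable)
{
  // The instruction was analyzed and cannot be displaced-stepped.
  if (!insn_supported)
    return TOY_CANT;

  // Something is badly broken: report it as an exception, not a status.
  if (!pc_writable)
    throw std::runtime_error ("failed to write PC");

  return TOY_OK;
}

// Toy caller showing what each outcome leads to.
std::string
toy_resume (bool insn_supported, bool pc_writable)
{
  try
    {
      switch (toy_prepare (insn_supported, pc_writable))
        {
        case TOY_OK:
          return "displaced step";
        case TOY_CANT:
          return "step in place";
        }
    }
  catch (const std::runtime_error &)
    {
      return "aborted";
    }
  return "unreachable";
}
```

With this split, a caller never mistakes a broken PC write for an
instruction that merely has to be stepped in place.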
  
Luis Machado Nov. 2, 2022, 7:15 p.m. UTC | #4
On 11/2/22 18:22, Simon Marchi wrote:
> On 11/2/22 14:06, Luis Machado wrote:
>> On 11/2/22 17:44, Simon Marchi wrote:
>>> On 11/2/22 10:33, Luis Machado via Gdb-patches wrote:
>>>> +  catch (const gdb_exception_error &except)
>>>> +    {
>>>> +      /* Reset the displaced step buffer state if we failed to write PC.
>>>> +     Otherwise we will prevent this buffer from being used, as it will
>>>> +     always have a thread in buffer->current_thread.  */
>>>> +      buffer->current_thread = nullptr;
>>>> +      copy_insn_closure = std::move (buffer->copy_insn_closure);
>>>
>>> The intention would be clearer by just doing:
>>>
>>>     buffer->copy_insn_closure.reset ()
>>>
>>>> +      return DISPLACED_STEP_PREPARE_STATUS_CANT;
>>>
>>> I think we should just let the exception escape,
>>> DISPLACED_STEP_PREPARE_STATUS_CANT isn't meant to convey an error.
>>
>> Wouldn't letting it escape completely abort the single-stepping operation? I was expecting a return of
>> DISPLACED_STEP_PREPARE_STATUS_CANT to have a fallback of stepping in-place. Isn't that the case?
> 
> Yeah, but I think that's what we want.  Failing to write the PC is an
> "abort mission" kind of failure, IMO.  Something is very broken.
> 
> DISPLACED_STEP_PREPARE_STATUS_CANT is not equivalent to an error, it's "we have
> successfully analyzed the instruction and concluded it can't be
> displaced-stepped".  If we wanted to return a status code, I would suggest
> to introduce a new one (e.g. DISPLACED_STEP_PREPARE_STATUS_ERROR).  But
> I think the exception is fine, this is how other kinds of failure that
> happen when resuming are reported, like when we fail to insert
> breakpoints.  We arguably are not very good at handling those
> gracefully, but that's the problem of this code here.

Yeah. That's a reasonable point. I was hoping to salvage something from this bad situation and at least
let the user complete a single step.

Let me get a v3 going.

> 
> Simon
  

Patch

diff --git a/gdb/displaced-stepping.c b/gdb/displaced-stepping.c
index eac2c5dab94..3b5376cf31b 100644
--- a/gdb/displaced-stepping.c
+++ b/gdb/displaced-stepping.c
@@ -139,15 +139,31 @@  displaced_step_buffers::prepare (thread_info *thread, CORE_ADDR &displaced_pc)
       return DISPLACED_STEP_PREPARE_STATUS_CANT;
     }
 
-  /* Resume execution at the copy.  */
-  regcache_write_pc (regcache, buffer->addr);
-
   /* This marks the buffer as being in use.  */
   buffer->current_thread = thread;
 
   /* Save this, now that we know everything went fine.  */
   buffer->copy_insn_closure = std::move (copy_insn_closure);
 
+  /* Adjust the PC so it points to the displaced step buffer address that will
+     be used.  This needs to be done after we save the copy_insn_closure, as
+     some architectures (Arm, for one) need that information so they can adjust
+     other data as needed.  In particular, Arm needs to know if the instruction
+     being executed in the displaced step buffer is thumb or not.  Without that
+     information, things will be very wrong in a random way.  */
+  try
+    {
+      regcache_write_pc (regcache, buffer->addr);
+    }
+  catch (const gdb_exception_error &except)
+    {
+      /* Reset the displaced step buffer state if we failed to write PC.
+	 Otherwise we will prevent this buffer from being used, as it will
+	 always have a thread in buffer->current_thread.  */
+      buffer->current_thread = nullptr;
+      copy_insn_closure = std::move (buffer->copy_insn_closure);
+      return DISPLACED_STEP_PREPARE_STATUS_CANT;
+    }
   /* Tell infrun not to try preparing a displaced step again for this inferior if
      all buffers are taken.  */
   thread->inf->displaced_step_state.unavailable = true;
@@ -264,7 +280,11 @@  displaced_step_buffers::copy_insn_closure_by_addr (CORE_ADDR addr)
   for (const displaced_step_buffer &buffer : m_buffers)
     {
       if (addr == buffer.addr)
+      {
+	/* The closure information should always be available. */
+	gdb_assert (buffer.copy_insn_closure.get () != nullptr);
 	return buffer.copy_insn_closure.get ();
+      }
     }
 
   return nullptr;