[v2,2/2,gdb/dap] Fix stray KeyboardInterrupt after cancel
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 | success | Testing passed
Commit Message
When running test-case gdb.dap/pause.exp 100 times in a loop, it passes
100/100.
But if we remove the two "sleep 0.2" from the test-case, we run into
(copied from dap.log and edited for readability):
...
Traceback (most recent call last):
  File "startup.py", line 251, in message
    def message():
KeyboardInterrupt
Quit
...
This happens as follows.
CancellationHandler.cancel calls gdb.interrupt to cancel a request in flight.
The idea is that this interrupt triggers while in fn here in message (a nested
function of send_gdb_with_response):
...
    def message():
        try:
            val = fn()
            result_q.put(val)
        except (Exception, KeyboardInterrupt) as e:
            result_q.put(e)
...
but instead it triggers outside the try/except.
Fix this by:
- in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
and adding a variable in_flight_gdb_thread to be able to distinguish when
a request is in flight in the dap thread or the gdb thread.
- adding a wrapper Cancellable to deal with cancelling the wrapped
event
- using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
event
- in CancellationHandler.cancel, only call gdb.interrupt if
req == self.in_flight_gdb_thread.
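As a rough illustration of the last item, the routing decision in cancel can be sketched outside gdb as follows. This is a minimal sketch, not the patch itself: the class name SketchCanceller and the interrupt callback (standing in for gdb.interrupt) are assumptions made so the example runs on its own.

```python
import heapq
import threading


class SketchCanceller:
    """Hypothetical stand-in for CancellationHandler (not the gdb code)."""

    def __init__(self, interrupt):
        self.lock = threading.Lock()
        # Request seen by the DAP thread, or None.
        self.in_flight_dap_thread = None
        # Request currently running in the gdb thread, or None.
        self.in_flight_gdb_thread = None
        # Min-heap of cancellations that could not be delivered yet.
        self.reqs = []
        # Assumption: a callback that stands in for gdb.interrupt.
        self.interrupt = interrupt

    def cancel(self, req):
        with self.lock:
            if req == self.in_flight_gdb_thread:
                # Only interrupt while the gdb thread is running the request.
                self.interrupt()
            else:
                # Otherwise queue the cancellation for later.
                heapq.heappush(self.reqs, req)


interrupts = []
canceller = SketchCanceller(lambda: interrupts.append("interrupt"))

canceller.in_flight_dap_thread = 1  # DAP thread has started request 1
canceller.cancel(1)                 # not yet in the gdb thread: queued

canceller.in_flight_gdb_thread = 1  # gdb thread is now running request 1
canceller.cancel(1)                 # delivered as an interrupt
```

In this sketch the first cancel is queued and only the second one reaches the interrupt callback, which is the distinction the two in_flight variables are meant to provide.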
This makes the test-case pass 100/100, even with the extra stressor of
"taskset -c 0", which makes the failure more likely without the patch.
Tested on aarch64-linux.
PR dap/31275
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31275
---
gdb/python/lib/gdb/dap/server.py | 88 ++++++++++++++++++++++++--------
1 file changed, 67 insertions(+), 21 deletions(-)
Comments
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
Tom> Fix this by:
Tom> - in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
Tom> and adding a variable in_flight_gdb_thread to be able to distinguish when
Tom> a request is in flight in the dap thread or the gdb thread.
Tom> - adding a wrapper Cancellable to deal with cancelling the wrapped
Tom> event
Tom> - using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
Tom> event
Tom> - in CancellationHandler.cancel, only call gdb.interrupt if
Tom> req == self.in_flight_gdb_thread.
Thanks. I like this patch a lot.
I have one question about it.
Tom> +    @contextmanager
Tom> +    def interruptable_region(self, req):
Tom> +        with self.lock:
Tom> +            # If the request is cancelled, don't execute the region.
Tom> +            while len(self.reqs) > 0 and self.reqs[0] <= req:
Tom> +                if heapq.heappop(self.reqs) == req:
Tom> +                    raise KeyboardInterrupt()
Tom> +            # Request is being handled by the gdb thread.
Tom> +            self.in_flight_gdb_thread = req
(see below)
Tom> +        try:
Tom> +            # Execute region. This may be interrupted by gdb.interrupt.
Tom> +            yield None
Tom> +        finally:
Tom> +            # Request has been handled by the gdb thread,
Tom> +            with self.lock:
Tom> +                self.in_flight_gdb_thread = None
Tom> +
I think it's possible for the interrupt to occur at the marked spot
above -- in_flight_gdb_thread has been set and the lock has been
released.
However, in this case it seems like in_flight_gdb_thread won't be reset?
I'm not really sure if this matters. However, wouldn't moving the 'with'
block inside the 'try' definitely fix this?
Tom
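For illustration, the suggested restructuring (the 'with' block moved inside the 'try') could look roughly like the sketch below. This is only a guess at the shape, not the code from any later revision of the patch; the class name RegionSketch is hypothetical, and the interrupt is simulated by raising KeyboardInterrupt inside the region.

```python
import heapq
import threading
from contextlib import contextmanager


class RegionSketch:
    """Hypothetical, stand-alone version of the context manager."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_flight_gdb_thread = None
        self.reqs = []  # min-heap of queued cancellations

    @contextmanager
    def interruptable_region(self, req):
        try:
            with self.lock:
                # If the request is cancelled, don't execute the region.
                while len(self.reqs) > 0 and self.reqs[0] <= req:
                    if heapq.heappop(self.reqs) == req:
                        raise KeyboardInterrupt()
                # Request is being handled by the gdb thread.
                self.in_flight_gdb_thread = req
            # An interrupt arriving here is now covered by the 'finally'.
            yield None
        finally:
            # Always reset, however the region was left.
            with self.lock:
                self.in_flight_gdb_thread = None


sketch = RegionSketch()
seen = []

# Case 1: an interrupt (simulated) arrives while the region runs.
try:
    with sketch.interruptable_region(7):
        seen.append(sketch.in_flight_gdb_thread)
        raise KeyboardInterrupt()
except KeyboardInterrupt:
    seen.append(sketch.in_flight_gdb_thread)

# Case 2: the request was cancelled before the region started.
heapq.heappush(sketch.reqs, 8)
try:
    with sketch.interruptable_region(8):
        seen.append("ran")
except KeyboardInterrupt:
    seen.append("cancelled")
```

With this layout the assignment to in_flight_gdb_thread and the yield sit under the same 'finally', so the field is cleared on every exit path. The lock is released before the 'finally' clause runs, so re-acquiring it there does not deadlock.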
On 2/28/24 21:12, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>
> Tom> Fix this by:
> Tom> - in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
> Tom> and adding a variable in_flight_gdb_thread to be able to distinguish when
> Tom> a request is in flight in the dap thread or the gdb thread.
> Tom> - adding a wrapper Cancellable to deal with cancelling the wrapped
> Tom> event
> Tom> - using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
> Tom> event
> Tom> - in CancellationHandler.cancel, only call gdb.interrupt if
> Tom> req == self.in_flight_gdb_thread.
>
> Thanks. I like this patch a lot.
>
> I have one question about it.
>
> Tom> +    @contextmanager
> Tom> +    def interruptable_region(self, req):
> Tom> +        with self.lock:
> Tom> +            # If the request is cancelled, don't execute the region.
> Tom> +            while len(self.reqs) > 0 and self.reqs[0] <= req:
> Tom> +                if heapq.heappop(self.reqs) == req:
> Tom> +                    raise KeyboardInterrupt()
> Tom> +            # Request is being handled by the gdb thread.
> Tom> +            self.in_flight_gdb_thread = req
>
> (see below)
>
> Tom> +        try:
> Tom> +            # Execute region. This may be interrupted by gdb.interrupt.
> Tom> +            yield None
> Tom> +        finally:
> Tom> +            # Request has been handled by the gdb thread,
> Tom> +            with self.lock:
> Tom> +                self.in_flight_gdb_thread = None
> Tom> +
>
> I think it's possible for the interrupt to occur at the marked spot
> above -- in_flight_gdb_thread has been set and the lock has been
> released.
>
> However, in this case it seems like in_flight_gdb_thread won't be reset?
>
> I'm not really sure if this matters. However, wouldn't moving the 'with'
> block inside the 'try' definitely fix this?
>
Done in v3.
Thanks,
- Tom
> Tom
@@ -19,6 +19,7 @@ import heapq
import inspect
import json
import threading
+from contextlib import contextmanager
from .io import start_json_writer, read_json
from .startup import (
@@ -59,24 +60,19 @@ class CancellationHandler:
# Methods on this class acquire this lock before proceeding.
self.lock = threading.Lock()
# The request currently being handled, or None.
- self.in_flight = None
+ self.in_flight_dap_thread = None
+ self.in_flight_gdb_thread = None
self.reqs = []
def starting(self, req):
- """Call at the start of the given request.
-
- Throws the appropriate exception if the request should be
- immediately cancelled."""
+ """Call at the start of the given request."""
with self.lock:
- self.in_flight = req
- while len(self.reqs) > 0 and self.reqs[0] <= req:
- if heapq.heappop(self.reqs) == req:
- raise KeyboardInterrupt()
+ self.in_flight_dap_thread = req
def done(self, req):
"""Indicate that the request is done."""
with self.lock:
- self.in_flight = None
+ self.in_flight_dap_thread = None
def cancel(self, req):
"""Call to cancel a request.
@@ -85,7 +81,7 @@ class CancellationHandler:
If the request is in flight, it is interrupted.
If the request has not yet been seen, the cancellation is queued."""
with self.lock:
- if req == self.in_flight:
+ if req == self.in_flight_gdb_thread:
gdb.interrupt()
else:
# We don't actually ignore the request here, but in
@@ -96,6 +92,29 @@ class CancellationHandler:
# to try to check for this.
heapq.heappush(self.reqs, req)
+    @contextmanager
+    def interruptable_region(self, req):
+        """Return a new context manager that sets in_flight_gdb_thread to
+        REQ."""
+        if req is None:
+            # No request is handled in the region, just execute the region.
+            yield None
+            return
+        with self.lock:
+            # If the request is cancelled, don't execute the region.
+            while len(self.reqs) > 0 and self.reqs[0] <= req:
+                if heapq.heappop(self.reqs) == req:
+                    raise KeyboardInterrupt()
+            # Request is being handled by the gdb thread.
+            self.in_flight_gdb_thread = req
+        try:
+            # Execute region. This may be interrupted by gdb.interrupt.
+            yield None
+        finally:
+            # Request has been handled by the gdb thread,
+            with self.lock:
+                self.in_flight_gdb_thread = None
+
class Server:
"""The DAP server class."""
@@ -433,13 +452,45 @@ class Invoker(object):
exec_and_log(self.cmd)
+class Cancellable(object):
+
+    def __init__(self, fn, result_q=None):
+        self.fn = fn
+        self.result_q = result_q
+        with _server.canceller.lock:
+            self.req = _server.canceller.in_flight_dap_thread
+
+    # This is invoked in the gdb thread to run self.fn.
+    @in_gdb_thread
+    def __call__(self):
+        try:
+            with _server.canceller.interruptable_region(self.req):
+                val = self.fn()
+                if self.result_q is not None:
+                    self.result_q.put(val)
+        except (Exception, KeyboardInterrupt) as e:
+            if self.result_q is not None:
+                # Pass result or exception to caller.
+                self.result_q.put(e)
+            elif isinstance(e, KeyboardInterrupt):
+                # Fn was cancelled.
+                pass
+            else:
+                # Exception happened. Ignore and log it.
+                err_string = "%s, %s" % (e, type(e))
+                thread_log("caught exception: " + err_string)
+                log_stack()
+
+
def send_gdb(cmd):
"""Send CMD to the gdb thread.
CMD can be either a function or a string.
If it is a string, it is passed to gdb.execute."""
if isinstance(cmd, str):
cmd = Invoker(cmd)
- gdb.post_event(cmd)
+
+ # Post the event and don't wait for the result.
+ gdb.post_event(Cancellable(cmd))
def send_gdb_with_response(fn):
@@ -451,17 +502,12 @@ def send_gdb_with_response(fn):
"""
if isinstance(fn, str):
fn = Invoker(fn)
- result_q = DAPQueue()
-
- def message():
- try:
- val = fn()
- result_q.put(val)
- except (Exception, KeyboardInterrupt) as e:
- result_q.put(e)
- send_gdb(message)
+ # Post the event and wait for the result in result_q.
+ result_q = DAPQueue()
+ gdb.post_event(Cancellable(fn, result_q))
val = result_q.get()
+
if isinstance(val, (Exception, KeyboardInterrupt)):
raise val
return val