[2/2,gdb/dap] Fix stray KeyboardInterrupt after cancel
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm | fail | Testing failed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 | fail | Testing failed
Commit Message
When running test-case gdb.dap/pause.exp 100 times in a loop, it passes
100/100.
But if we remove the two "sleep 0.2" from the test-case, we run into
(copied from dap.log and edited for readability):
...
Traceback (most recent call last):
  File "startup.py", line 251, in message
    def message():
KeyboardInterrupt
Quit
...
This happens as follows.
CancellationHandler.cancel calls gdb.interrupt to cancel a request in flight.
The idea is that this interrupt triggers while in fn here in message (a nested
function of send_gdb_with_response):
...
    def message():
        try:
            val = fn()
            result_q.put(val)
        except (Exception, KeyboardInterrupt) as e:
            result_q.put(e)
...
but instead it triggers outside the try/except.
Fix this by:
- in CancellationHandler, adding a variable interruptable to track whether
  we're in the try/except,
- in CancellationHandler.cancel, setting a variable cancel_pending instead
  of calling gdb.interrupt if not in the try/except, and
- handling cancel_pending in a few places.
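The interplay of the two flags can be sketched outside gdb. This is a toy model only, not the patched server.py: CancelModel is a hypothetical name, and the interrupt callback passed to it is an assumed stand-in for gdb.interrupt.

```python
import threading


class CancelModel:
    """Toy model of the interruptable/cancel_pending flags (not gdb code)."""

    def __init__(self, interrupt):
        self.lock = threading.Lock()
        self.interruptable = False
        self.cancel_pending = False
        # Hypothetical stand-in for gdb.interrupt.
        self.interrupt = interrupt

    def cancel(self):
        with self.lock:
            if self.interruptable:
                # Inside the try/except: safe to interrupt right away.
                self.interrupt()
            else:
                # Outside the try/except: remember the cancel for later.
                self.cancel_pending = True

    def enter_interruptable(self):
        with self.lock:
            if self.cancel_pending:
                # A cancel arrived early; consume it and refuse entry.
                self.cancel_pending = False
                return False
            self.interruptable = True
            return True

    def exit_interruptable(self):
        with self.lock:
            self.interruptable = False


interrupts = []
model = CancelModel(lambda: interrupts.append("interrupt"))

# Cancel arrives before the try/except is entered: nothing is interrupted,
# and the next attempt to enter the interruptable state is refused.
model.cancel()
early_interrupts = list(interrupts)
entered_after_early_cancel = model.enter_interruptable()

# Cancel arrives while interruptable: the interrupt is delivered.
entered = model.enter_interruptable()
model.cancel()
model.exit_interruptable()
```

In the model, an early cancel is never turned into an interrupt; it only causes the next enter_interruptable call to return False, which the caller can convert into a KeyboardInterrupt inside the try/except.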
This makes the test-case pass 100/100, also when adding the extra stressor of
"taskset -c 0", which makes the fail more likely without the patch.
I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
those.
Tested on aarch64-linux.
PR dap/31275
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31275
---
gdb/python/lib/gdb/dap/server.py | 42 ++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
Comments
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
Tom> The idea is that this interrupt triggers while in fn here in message (a nested
Tom> function of send_gdb_with_response):
Tom> ...
Tom>     def message():
Tom>         try:
Tom>             val = fn()
Tom>             result_q.put(val)
Tom>         except (Exception, KeyboardInterrupt) as e:
Tom>             result_q.put(e)
Tom> ...
Tom> but instead it triggers outside the try/except.
Thanks for the clear explanation.
Tom> I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
Tom> those.
Those are there to try to make the test actually interrupt the command
in question -- removing them would make it more likely to just interrupt
the surrounding dispatch machinery, which isn't interesting.
Tom> @@ -435,12 +466,19 @@ def send_gdb_with_response(fn):
It seems to me that send_gdb also has this issue -- that is, the
cancellation could arrive just in time to cause a KeyboardInterrupt.
Also I wonder how multi-step / DAP-thread commands like 'continue' are
handled.
Tom>      def message():
Tom>          try:
Tom> -            val = fn()
Tom> +            if not _server.canceller.enter_interruptable():
Tom> +                raise KeyboardInterrupt()
Tom> +            try:
Tom> +                val = fn()
Tom> +            finally:
Tom> +                _server.canceller.exit_interruptable()
This kind of thing is better with a context manager.
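A rough sketch of what a context-manager form might look like follows. It is an illustration under assumptions, not the code used in v2: interruptable_region and ToyCanceller are hypothetical names, and ToyCanceller only mimics the enter_interruptable / exit_interruptable methods from the patch.

```python
import contextlib
import threading


class ToyCanceller:
    """Hypothetical stand-in exposing the two methods from the patch."""

    def __init__(self):
        self.lock = threading.Lock()
        self.interruptable = False
        self.cancel_pending = False

    def enter_interruptable(self):
        with self.lock:
            if self.cancel_pending:
                self.cancel_pending = False
                return False
            self.interruptable = True
            return True

    def exit_interruptable(self):
        with self.lock:
            self.interruptable = False


@contextlib.contextmanager
def interruptable_region(canceller):
    """Enter the interruptable state, or raise if a cancel is pending."""
    if not canceller.enter_interruptable():
        raise KeyboardInterrupt()
    try:
        yield
    finally:
        # Always leave the interruptable state, even on an exception.
        canceller.exit_interruptable()


canceller = ToyCanceller()

# Normal case: the flag is set inside the block and cleared afterwards.
with interruptable_region(canceller):
    inside = canceller.interruptable
after = canceller.interruptable

# Pending cancel: the block body never runs.
canceller.cancel_pending = True
try:
    with interruptable_region(canceller):
        refused = False
except KeyboardInterrupt:
    refused = True
```

With such a helper, the body of message could shrink to a single with statement around val = fn(), keeping the enter/exit pairing in one place.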
Tom>      send_gdb(message)
Tom> +    if _server.canceller.is_cancel_pending():
Tom> +        result_q.put(KeyboardInterrupt())
Tom>      val = result_q.get()
I don't understand how this addition has an effect. It seems to me that
since a Queue is FIFO, and since the response Queue is only used for a
single value, that this addition will just be ignored.
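A small standalone illustration of the FIFO point, assuming the value from message has already been enqueued when the extra put happens (the string below is a placeholder, not real server output):

```python
import queue

result_q = queue.Queue()

# First item: the result that message() would have put.
result_q.put("value from message()")
# Second item: the extra KeyboardInterrupt added by the patch.
result_q.put(KeyboardInterrupt())

# Only one get() is ever issued, so only the first item is seen.
val = result_q.get()
leftover = result_q.qsize()
```

Here val is the first item and the KeyboardInterrupt stays behind in the queue, unread.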
FWIW, cancellation is inherently racy. I'm not opposed to trying to
make this more reliable but I think it may be pretty hard.
Maybe some kind of installable exception handler that is run when
post_event fails would be one way. Not sure.
Tom
On 2/23/24 18:27, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>
> Tom> The idea is that this interrupt triggers while in fn here in message (a nested
> Tom> function of send_gdb_with_response):
> Tom> ...
> Tom>     def message():
> Tom>         try:
> Tom>             val = fn()
> Tom>             result_q.put(val)
> Tom>         except (Exception, KeyboardInterrupt) as e:
> Tom>             result_q.put(e)
> Tom> ...
> Tom> but instead it triggers outside the try/except.
>
> Thanks for the clear explanation.
>
> Tom> I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
> Tom> those.
>
> Those are there to try to make the test actually interrupt the command
> in question -- removing them would make it more likely to just interrupt
> the surrounding dispatch machinery, which isn't interesting.
>
Posted v2 here (
https://sourceware.org/pipermail/gdb-patches/2024-February/206821.html ).
Ack, dropped that bit in the commit message in v2.
> Tom> @@ -435,12 +466,19 @@ def send_gdb_with_response(fn):
>
> It seems to me that send_gdb also has this issue -- that is, the
> cancellation could arrive just in time to cause a KeyboardInterrupt.
>
Fixed in v2 (though I'm not sure if there's a test-case exercising this).
> Also I wonder how multi-step / DAP-thread commands like 'continue' are
> handled.
>
I think that should be handled in v2.
> Tom>      def message():
> Tom>          try:
> Tom> -            val = fn()
> Tom> +            if not _server.canceller.enter_interruptable():
> Tom> +                raise KeyboardInterrupt()
> Tom> +            try:
> Tom> +                val = fn()
> Tom> +            finally:
> Tom> +                _server.canceller.exit_interruptable()
>
> This kind of thing is better with a context manager.
>
Indeed, used that in v2.
> Tom>      send_gdb(message)
> Tom> +    if _server.canceller.is_cancel_pending():
> Tom> +        result_q.put(KeyboardInterrupt())
> Tom>      val = result_q.get()
>
> I don't understand how this addition has an effect. It seems to me that
> since a Queue is FIFO, and since the response Queue is only used for a
> single value, that this addition will just be ignored.
>
Agreed, it's unnecessary. Dropped in v2.
> FWIW, cancellation is inherently racy. I'm not opposed to trying to
> make this more reliable but I think it may be pretty hard.
>
> Maybe some kind of installable exception handler that is run when
> post_event fails would be one way. Not sure.
Not sure either. Anyway I hope v2 is an improvement.
Thanks,
- Tom
@@ -62,6 +62,8 @@ class CancellationHandler:
         # The request currently being handled, or None.
         self.in_flight = None
         self.reqs = []
+        self.interruptable = False
+        self.cancel_pending = False

     def starting(self, req):
         """Call at the start of the given request.
@@ -87,7 +89,10 @@ class CancellationHandler:
         If the request has not yet been seen, the cancellation is queued."""
         with self.lock:
             if req == self.in_flight:
-                gdb.interrupt()
+                if self.interruptable:
+                    gdb.interrupt()
+                else:
+                    self.cancel_pending = True
             else:
                 # We don't actually ignore the request here, but in
                 # the 'starting' method. This way we don't have to
@@ -97,6 +102,29 @@ class CancellationHandler:
                 # to try to check for this.
                 heapq.heappush(self.reqs, req)

+    def is_cancel_pending(self):
+        """Is a cancellation pending. This can only be true when not in the
+        interruptable state."""
+        with self.lock:
+            if not self.cancel_pending:
+                return False
+            self.cancel_pending = False
+            return True
+
+    def enter_interruptable(self):
+        """Enter the interruptable state."""
+        with self.lock:
+            if self.cancel_pending:
+                self.cancel_pending = False
+                return False
+            self.interruptable = True
+            return True
+
+    def exit_interruptable(self):
+        """Exit the interruptable state."""
+        with self.lock:
+            self.interruptable = False
+

 class Server:
     """The DAP server class."""
@@ -159,6 +187,9 @@ class Server:
             result["success"] = False
             result["message"] = str(e)
         self.canceller.done(req)
+        if self.canceller.is_cancel_pending():
+            result["success"] = False
+            result["message"] = "cancelled"
         return result

     # Read inferior output and sends OutputEvents to the client. It
@@ -435,12 +466,19 @@ def send_gdb_with_response(fn):
     def message():
         try:
-            val = fn()
+            if not _server.canceller.enter_interruptable():
+                raise KeyboardInterrupt()
+            try:
+                val = fn()
+            finally:
+                _server.canceller.exit_interruptable()
             result_q.put(val)
         except (Exception, KeyboardInterrupt) as e:
             result_q.put(e)

     send_gdb(message)
+    if _server.canceller.is_cancel_pending():
+        result_q.put(KeyboardInterrupt())
     val = result_q.get()
     if isinstance(val, (Exception, KeyboardInterrupt)):
         raise val