[v2,2/2,gdb/dap] Fix stray KeyboardInterrupt after cancel

Message ID 20240226115825.29705-2-tdevries@suse.de
State Superseded
Series [v2,1/2,gdb/dap] Move send_gdb and send_gdb_with_response to server module

Checks

Context                                          Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64   success  Testing passed

Commit Message

Tom de Vries Feb. 26, 2024, 11:58 a.m. UTC
When running test-case gdb.dap/pause.exp 100 times in a loop, it passes
100/100.

But if we remove the two "sleep 0.2" calls from the test-case, we run into
the following (copied from dap.log and edited for readability):
...
Traceback (most recent call last):
  File "startup.py", line 251, in message
    def message():

KeyboardInterrupt
Quit
...

This happens as follows.

CancellationHandler.cancel calls gdb.interrupt to cancel a request in flight.

The intent is for this interrupt to trigger while fn is running inside
message (a nested function of send_gdb_with_response):
...
    def message():
        try:
            val = fn()
            result_q.put(val)
        except (Exception, KeyboardInterrupt) as e:
            result_q.put(e)
...
but instead it triggers outside the try/except, so nothing catches the
resulting KeyboardInterrupt.

Fix this by:
- in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
  and adding a variable in_flight_gdb_thread to be able to distinguish when
  a request is in flight in the dap thread or the gdb thread.
- adding a wrapper Cancellable to deal with cancelling the wrapped
  event
- using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
  event
- in CancellationHandler.cancel, only call gdb.interrupt if
  req == self.in_flight_gdb_thread.
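
The dispatch rule in the last item can be modelled outside gdb.  This is
only a sketch: CancelModel and its interrupt argument are hypothetical
stand-ins (interrupt plays the role of gdb.interrupt), and the real class
in server.py does more than this.

```python
import heapq
import threading


class CancelModel:
    """Toy model of the cancel() dispatch; not the gdb class itself."""

    def __init__(self, interrupt):
        # 'interrupt' is a hypothetical stand-in for gdb.interrupt.
        self.interrupt = interrupt
        self.lock = threading.Lock()
        self.in_flight_dap_thread = None
        self.in_flight_gdb_thread = None
        self.reqs = []  # min-heap of queued cancellations

    def cancel(self, req):
        with self.lock:
            if req == self.in_flight_gdb_thread:
                # The gdb thread is running this request: interrupt it.
                self.interrupt()
            else:
                # Not (yet) running in the gdb thread: queue the cancel.
                heapq.heappush(self.reqs, req)


calls = []
model = CancelModel(lambda: calls.append("interrupt"))

# Request 3 has only been seen by the DAP thread: the cancel is queued.
model.in_flight_dap_thread = 3
model.cancel(3)

# Request 3 is now running in the gdb thread: the cancel interrupts.
model.in_flight_gdb_thread = 3
model.cancel(3)
```

In this model the first cancel only queues the request, and the interrupt
callback runs exactly once, for the second cancel.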

This makes the test-case pass 100/100, also when adding the extra stressor of
"taskset -c 0", which makes the failure more likely without the patch.

Tested on aarch64-linux.

PR dap/31275
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31275
---
 gdb/python/lib/gdb/dap/server.py | 88 ++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 21 deletions(-)
  

Comments

Tom Tromey Feb. 28, 2024, 8:12 p.m. UTC | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> Fix this by:
Tom> - in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
Tom>   and adding a variable in_flight_gdb_thread to be able to distinguish when
Tom>   a request is in flight in the dap thread or the gdb thread.
Tom> - adding a wrapper Cancellable to deal with cancelling the wrapped
Tom>   event
Tom> - using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
Tom>   event
Tom> - in CancellationHandler.cancel, only call gdb.interrupt if
Tom>   req == self.in_flight_gdb_thread.

Thanks.  I like this patch a lot.

I have one question about it.

Tom> +    @contextmanager
Tom> +    def interruptable_region(self, req):
Tom> +        with self.lock:
Tom> +            # If the request is cancelled, don't execute the region.
Tom> +            while len(self.reqs) > 0 and self.reqs[0] <= req:
Tom> +                if heapq.heappop(self.reqs) == req:
Tom> +                    raise KeyboardInterrupt()
Tom> +            # Request is being handled by the gdb thread.
Tom> +            self.in_flight_gdb_thread = req

(see below)

Tom> +        try:
Tom> +            # Execute region.  This may be interrupted by gdb.interrupt.
Tom> +            yield None
Tom> +        finally:
Tom> +            # Request has been handled by the gdb thread,
Tom> +            with self.lock:
Tom> +                self.in_flight_gdb_thread = None
Tom> +

I think it's possible for the interrupt to occur at the marked spot
above -- in_flight_gdb_thread has been set and the lock has been
released.

However, in this case it seems like in_flight_gdb_thread won't be reset?

I'm not really sure if this matters.  However, wouldn't moving the 'with'
block inside the 'try' definitely fix this?
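
A minimal sketch of that restructuring, outside gdb and under stated
assumptions: RegionModel is a hypothetical stand-in for
CancellationHandler, and after_set is a hypothetical hook that simulates an
interrupt arriving right after the lock is released.  It is not the v3
code.

```python
import heapq
import threading
from contextlib import contextmanager


class RegionModel:
    """Toy model of interruptable_region with the lock block inside 'try'."""

    def __init__(self, after_set=None):
        self.lock = threading.Lock()
        self.reqs = []  # min-heap of queued cancellations
        self.in_flight_gdb_thread = None
        # Hypothetical hook, called just after the lock is released.
        self.after_set = after_set

    @contextmanager
    def interruptable_region(self, req):
        try:
            with self.lock:
                # If the request is cancelled, don't execute the region.
                while self.reqs and self.reqs[0] <= req:
                    if heapq.heappop(self.reqs) == req:
                        raise KeyboardInterrupt()
                self.in_flight_gdb_thread = req
            if self.after_set is not None:
                # Simulated interrupt at the spot discussed above.
                self.after_set()
            yield None
        finally:
            # Runs on every exit path that enters the 'try'.
            with self.lock:
                self.in_flight_gdb_thread = None


def raise_interrupt():
    raise KeyboardInterrupt()


model = RegionModel(after_set=raise_interrupt)
body_ran = False
try:
    with model.interruptable_region(7):
        body_ran = True
except KeyboardInterrupt:
    pass
```

In this model the region body never runs and in_flight_gdb_thread is None
again afterwards, because the 'finally' clause now covers the assignment.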

Tom
  
Tom de Vries Feb. 29, 2024, 2:07 p.m. UTC | #2
On 2/28/24 21:12, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
> 
> Tom> Fix this by:
> Tom> - in CancellationHandler, renaming variable in_flight to in_flight_dap_thread,
> Tom>   and adding a variable in_flight_gdb_thread to be able to distinguish when
> Tom>   a request is in flight in the dap thread or the gdb thread.
> Tom> - adding a wrapper Cancellable to deal with cancelling the wrapped
> Tom>   event
> Tom> - using Cancellable in send_gdb and send_gdb_with_response to wrap the posted
> Tom>   event
> Tom> - in CancellationHandler.cancel, only call gdb.interrupt if
> Tom>   req == self.in_flight_gdb_thread.
> 
> Thanks.  I like this patch a lot.
> 
> I have one question about it.
> 
> Tom> +    @contextmanager
> Tom> +    def interruptable_region(self, req):
> Tom> +        with self.lock:
> Tom> +            # If the request is cancelled, don't execute the region.
> Tom> +            while len(self.reqs) > 0 and self.reqs[0] <= req:
> Tom> +                if heapq.heappop(self.reqs) == req:
> Tom> +                    raise KeyboardInterrupt()
> Tom> +            # Request is being handled by the gdb thread.
> Tom> +            self.in_flight_gdb_thread = req
> 
> (see below)
> 
> Tom> +        try:
> Tom> +            # Execute region.  This may be interrupted by gdb.interrupt.
> Tom> +            yield None
> Tom> +        finally:
> Tom> +            # Request has been handled by the gdb thread,
> Tom> +            with self.lock:
> Tom> +                self.in_flight_gdb_thread = None
> Tom> +
> 
> I think it's possible for the interrupt to occur at the marked spot
> above -- in_flight_gdb_thread has been set and the lock has been
> released.
> 
> However, in this case it seems like in_flight_gdb_thread won't be reset?
> 
> I'm not really sure if this matters.  However, wouldn't moving the 'with'
> block inside the 'try' definitely fix this?
> 

Done in v3.

Thanks,
- Tom


> Tom
  

Patch

diff --git a/gdb/python/lib/gdb/dap/server.py b/gdb/python/lib/gdb/dap/server.py
index e957b886fb8..59f486f5729 100644
--- a/gdb/python/lib/gdb/dap/server.py
+++ b/gdb/python/lib/gdb/dap/server.py
@@ -19,6 +19,7 @@  import heapq
 import inspect
 import json
 import threading
+from contextlib import contextmanager
 
 from .io import start_json_writer, read_json
 from .startup import (
@@ -59,24 +60,19 @@  class CancellationHandler:
         # Methods on this class acquire this lock before proceeding.
         self.lock = threading.Lock()
         # The request currently being handled, or None.
-        self.in_flight = None
+        self.in_flight_dap_thread = None
+        self.in_flight_gdb_thread = None
         self.reqs = []
 
     def starting(self, req):
-        """Call at the start of the given request.
-
-        Throws the appropriate exception if the request should be
-        immediately cancelled."""
+        """Call at the start of the given request."""
         with self.lock:
-            self.in_flight = req
-            while len(self.reqs) > 0 and self.reqs[0] <= req:
-                if heapq.heappop(self.reqs) == req:
-                    raise KeyboardInterrupt()
+            self.in_flight_dap_thread = req
 
     def done(self, req):
         """Indicate that the request is done."""
         with self.lock:
-            self.in_flight = None
+            self.in_flight_dap_thread = None
 
     def cancel(self, req):
         """Call to cancel a request.
@@ -85,7 +81,7 @@  class CancellationHandler:
         If the request is in flight, it is interrupted.
         If the request has not yet been seen, the cancellation is queued."""
         with self.lock:
-            if req == self.in_flight:
+            if req == self.in_flight_gdb_thread:
                 gdb.interrupt()
             else:
                 # We don't actually ignore the request here, but in
@@ -96,6 +92,29 @@  class CancellationHandler:
                 # to try to check for this.
                 heapq.heappush(self.reqs, req)
 
+    @contextmanager
+    def interruptable_region(self, req):
+        """Return a new context manager that sets in_flight_gdb_thread to
+        REQ."""
+        if req is None:
+            # No request is handled in the region, just execute the region.
+            yield None
+            return
+        with self.lock:
+            # If the request is cancelled, don't execute the region.
+            while len(self.reqs) > 0 and self.reqs[0] <= req:
+                if heapq.heappop(self.reqs) == req:
+                    raise KeyboardInterrupt()
+            # Request is being handled by the gdb thread.
+            self.in_flight_gdb_thread = req
+        try:
+            # Execute region.  This may be interrupted by gdb.interrupt.
+            yield None
+        finally:
+            # Request has been handled by the gdb thread.
+            with self.lock:
+                self.in_flight_gdb_thread = None
+
 
 class Server:
     """The DAP server class."""
@@ -433,13 +452,45 @@  class Invoker(object):
         exec_and_log(self.cmd)
 
 
+class Cancellable(object):
+
+    def __init__(self, fn, result_q=None):
+        self.fn = fn
+        self.result_q = result_q
+        with _server.canceller.lock:
+            self.req = _server.canceller.in_flight_dap_thread
+
+    # This is invoked in the gdb thread to run self.fn.
+    @in_gdb_thread
+    def __call__(self):
+        try:
+            with _server.canceller.interruptable_region(self.req):
+                val = self.fn()
+                if self.result_q is not None:
+                    self.result_q.put(val)
+        except (Exception, KeyboardInterrupt) as e:
+            if self.result_q is not None:
+                # Pass result or exception to caller.
+                self.result_q.put(e)
+            elif isinstance(e, KeyboardInterrupt):
+                # Fn was cancelled.
+                pass
+            else:
+                # Exception happened.  Ignore and log it.
+                err_string = "%s, %s" % (e, type(e))
+                thread_log("caught exception: " + err_string)
+                log_stack()
+
+
 def send_gdb(cmd):
     """Send CMD to the gdb thread.
     CMD can be either a function or a string.
     If it is a string, it is passed to gdb.execute."""
     if isinstance(cmd, str):
         cmd = Invoker(cmd)
-    gdb.post_event(cmd)
+
+    # Post the event and don't wait for the result.
+    gdb.post_event(Cancellable(cmd))
 
 
 def send_gdb_with_response(fn):
@@ -451,17 +502,12 @@  def send_gdb_with_response(fn):
     """
     if isinstance(fn, str):
         fn = Invoker(fn)
-    result_q = DAPQueue()
-
-    def message():
-        try:
-            val = fn()
-            result_q.put(val)
-        except (Exception, KeyboardInterrupt) as e:
-            result_q.put(e)
 
-    send_gdb(message)
+    # Post the event and wait for the result in result_q.
+    result_q = DAPQueue()
+    gdb.post_event(Cancellable(fn, result_q))
     val = result_q.get()
+
     if isinstance(val, (Exception, KeyboardInterrupt)):
         raise val
     return val
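
The result_q hand-off used by send_gdb_with_response can be tried outside
gdb.  This is a sketch under stated assumptions: events, worker, Job and
call_with_response are hypothetical names, a plain queue.Queue drained by
one thread stands in for gdb.post_event, and no cancellation handling is
modelled.

```python
import queue
import threading

# Hypothetical stand-in for gdb's event queue, drained by one worker thread.
events = queue.Queue()


def worker():
    # Run posted callables until a None sentinel arrives.
    while True:
        event = events.get()
        if event is None:
            break
        event()


class Job:
    """Toy counterpart of Cancellable, without the cancellation logic."""

    def __init__(self, fn, result_q=None):
        self.fn = fn
        self.result_q = result_q

    def __call__(self):
        try:
            val = self.fn()
            if self.result_q is not None:
                self.result_q.put(val)
        except (Exception, KeyboardInterrupt) as e:
            # Pass the exception to the waiting caller, if there is one.
            if self.result_q is not None:
                self.result_q.put(e)


def call_with_response(fn):
    # Post the job, block on its result, and re-raise in the caller.
    result_q = queue.Queue()
    events.put(Job(fn, result_q))
    val = result_q.get()
    if isinstance(val, (Exception, KeyboardInterrupt)):
        raise val
    return val


def cancelled():
    raise KeyboardInterrupt()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

answer = call_with_response(lambda: 6 * 7)
try:
    call_with_response(cancelled)
    outcome = "returned"
except KeyboardInterrupt:
    outcome = "interrupted"

events.put(None)
thread.join()
```

Here the first call returns 42, and the KeyboardInterrupt raised in the
worker thread is re-raised in the calling thread.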