[2/2,gdb/dap] Fix stray KeyboardInterrupt after cancel

Message ID 20240223165128.32211-2-tdevries@suse.de
State Superseded
Series [1/2,gdb/dap] Move send_gdb_with_response to server module

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm fail Testing failed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 fail Testing failed

Commit Message

Tom de Vries Feb. 23, 2024, 4:51 p.m. UTC
  When running test-case gdb.dap/pause.exp 100 times in a loop, it passes
100/100.

But if we remove the two "sleep 0.2" from the test-case, we run into
(copied from dap.log and edited for readability):
...
Traceback (most recent call last):
  File "startup.py", line 251, in message
    def message():

KeyboardInterrupt
Quit
...

This happens as follows.

CancellationHandler.cancel calls gdb.interrupt to cancel a request in flight.

The idea is that this interrupt triggers while in fn here in message (a nested
function of send_gdb_with_response):
...
    def message():
        try:
            val = fn()
            result_q.put(val)
        except (Exception, KeyboardInterrupt) as e:
            result_q.put(e)
...
but instead it triggers outside the try/except.

Fix this by:
- in CancellationHandler, adding a variable interruptable to track whether
  we're in the try/except,
- in CancellationHandler.cancel, setting a variable cancel_pending instead
  of calling gdb.interrupt if not in the try/except, and
- handling cancel_pending in a few places.
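
The state tracking described in the list above can be sketched outside of GDB roughly as follows. This is a minimal illustration, not the patch itself: the class name CancelTracker and the injected interrupt callable are hypothetical stand-ins (gdb.interrupt only exists inside GDB), while interruptable, cancel_pending, enter_interruptable and exit_interruptable mirror the names used in the patch.

```python
import threading


class CancelTracker:
    """Hypothetical stand-in for the state added to CancellationHandler."""

    def __init__(self, interrupt):
        # 'interrupt' is an assumption: it stands in for gdb.interrupt,
        # which is only available when running inside GDB.
        self._interrupt = interrupt
        self._lock = threading.Lock()
        self.interruptable = False
        self.cancel_pending = False

    def cancel(self):
        with self._lock:
            if self.interruptable:
                # Inside the try/except: safe to interrupt right away.
                self._interrupt()
            else:
                # Outside the try/except: only record the cancellation.
                self.cancel_pending = True

    def enter_interruptable(self):
        with self._lock:
            if self.cancel_pending:
                # A cancel arrived early; consume it and refuse to enter.
                self.cancel_pending = False
                return False
            self.interruptable = True
            return True

    def exit_interruptable(self):
        with self._lock:
            self.interruptable = False


calls = []
tracker = CancelTracker(lambda: calls.append("interrupt"))

tracker.cancel()                       # too early: recorded, not delivered
early = tracker.enter_interruptable()  # False, caller would raise KeyboardInterrupt
ok = tracker.enter_interruptable()     # True, pending flag was consumed
tracker.cancel()                       # now delivered via the interrupt callable
tracker.exit_interruptable()
```

The point of the sketch is that a cancel arriving before the protected region never reaches the interrupt callable; it is turned into a flag that the next enter_interruptable call reports.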

This makes the test-case pass 100/100, also when adding the extra stressor of
"taskset -c 0", which makes the fail more likely without the patch.

I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
those.

Tested on aarch64-linux.

PR dap/31275
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31275
---
 gdb/python/lib/gdb/dap/server.py | 42 ++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)
  

Comments

Tom Tromey Feb. 23, 2024, 5:27 p.m. UTC | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> The idea is that this interrupt triggers while in fn here in message (a nested
Tom> function of send_gdb_with_response):
Tom> ...
Tom>     def message():
Tom>         try:
Tom>             val = fn()
Tom>             result_q.put(val)
Tom>         except (Exception, KeyboardInterrupt) as e:
Tom>             result_q.put(e)
Tom> ...
Tom> but instead it triggers outside the try/except.

Thanks for the clear explanation.

Tom> I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
Tom> those.

Those are there to try to make the test actually interrupt the command
in question -- removing them would make it more likely to just interrupt
the surrounding dispatch machinery, which isn't interesting.

Tom> @@ -435,12 +466,19 @@ def send_gdb_with_response(fn):
 
It seems to me that send_gdb also has this issue -- that is, the
cancellation could arrive just in time to cause a KeyboardInterrupt.

Also I wonder how multi-step / DAP-thread commands like 'continue' are
handled.

Tom>      def message():
Tom>          try:
Tom> -            val = fn()
Tom> +            if not _server.canceller.enter_interruptable():
Tom> +                raise KeyboardInterrupt()
Tom> +            try:
Tom> +                val = fn()
Tom> +            finally:
Tom> +                _server.canceller.exit_interruptable()

This kind of thing is better with a context manager.
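
For illustration, one way the enter/exit pair could be wrapped in a context manager is sketched below. The names interruptable_region, StubCanceller and run are hypothetical and are not taken from the patch or from v2; the stub only mimics the two methods the patch adds so that the example runs outside GDB.

```python
import contextlib
import threading


class StubCanceller:
    """Hypothetical stand-in exposing the two methods added by the patch."""

    def __init__(self):
        self.lock = threading.Lock()
        self.interruptable = False
        self.cancel_pending = False

    def enter_interruptable(self):
        with self.lock:
            if self.cancel_pending:
                self.cancel_pending = False
                return False
            self.interruptable = True
            return True

    def exit_interruptable(self):
        with self.lock:
            self.interruptable = False


@contextlib.contextmanager
def interruptable_region(canceller):
    # Refuse to enter if a cancellation was already recorded.
    if not canceller.enter_interruptable():
        raise KeyboardInterrupt()
    try:
        yield
    finally:
        # Always leave the interruptable state, even if the body raises.
        canceller.exit_interruptable()


def run(canceller, fn):
    # Mirrors the shape of message(): return the value or the exception.
    try:
        with interruptable_region(canceller):
            return fn()
    except (Exception, KeyboardInterrupt) as e:
        return e
```

With this shape the caller cannot forget the exit call, and the early-cancel case surfaces as a KeyboardInterrupt raised on entering the with block.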

Tom>      send_gdb(message)
Tom> +    if _server.canceller.is_cancel_pending():
Tom> +        result_q.put(KeyboardInterrupt())
Tom>      val = result_q.get()

I don't understand how this addition has an effect.  It seems to me that
since a Queue is FIFO, and since the response Queue is only used for a
single value, that this addition will just be ignored.
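
A small standalone illustration of the FIFO point, assuming the result from message() has already been enqueued by the time the extra put runs (in the real server the ordering depends on thread timing, which this sketch does not model):

```python
import queue

result_q = queue.Queue()

# Assumed ordering: the value produced by message() is enqueued first.
result_q.put("value from fn")
# The extra put added by the patch then lands behind it.
result_q.put(KeyboardInterrupt())

# send_gdb_with_response reads exactly one item from the queue.
first = result_q.get()
leftover = result_q.qsize()
```

Under that assumption the single get returns the original value and the queued KeyboardInterrupt is never read.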

FWIW, cancellation is inherently racy.  I'm not opposed to trying to
make this more reliable but I think it may be pretty hard.

Maybe some kind of installable exception handler that is run when
post_event fails would be one way.  Not sure.

Tom
  
Tom de Vries Feb. 26, 2024, 12:24 p.m. UTC | #2
On 2/23/24 18:27, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
> 
> Tom> The idea is that this interrupt triggers while in fn here in message (a nested
> Tom> function of send_gdb_with_response):
> Tom> ...
> Tom>     def message():
> Tom>         try:
> Tom>             val = fn()
> Tom>             result_q.put(val)
> Tom>         except (Exception, KeyboardInterrupt) as e:
> Tom>             result_q.put(e)
> Tom> ...
> Tom> but instead it triggers outside the try/except.
> 
> Thanks for the clear explanation.
> 
> Tom> I left the two "sleep 0.2" in the test-case, I didn't see a reason to remove
> Tom> those.
> 
> Those are there to try to make the test actually interrupt the command
> in question -- removing them would make it more likely to just interrupt
> the surrounding dispatch machinery, which isn't interesting.
> 

Posted v2 here
(https://sourceware.org/pipermail/gdb-patches/2024-February/206821.html).

Ack, dropped that bit in the commit message in v2.

> Tom> @@ -435,12 +466,19 @@ def send_gdb_with_response(fn):
>   
> It seems to me that send_gdb also has this issue -- that is, the
> cancellation could arrive just in time to cause a KeyboardInterrupt.
> 

Fixed in v2 (though I'm not sure if there's a test-case exercising this).

> Also I wonder how multi-step / DAP-thread commands like 'continue' are
> handled.
> 

I think that should be handled in v2.

> Tom>      def message():
> Tom>          try:
> Tom> -            val = fn()
> Tom> +            if not _server.canceller.enter_interruptable():
> Tom> +                raise KeyboardInterrupt()
> Tom> +            try:
> Tom> +                val = fn()
> Tom> +            finally:
> Tom> +                _server.canceller.exit_interruptable()
> 
> This kind of thing is better with a context manager.
> 

Indeed, used that in v2.

> Tom>      send_gdb(message)
> Tom> +    if _server.canceller.is_cancel_pending():
> Tom> +        result_q.put(KeyboardInterrupt())
> Tom>      val = result_q.get()
> 
> I don't understand how this addition has an effect.  It seems to me that
> since a Queue is FIFO, and since the response Queue is only used for a
> single value, that this addition will just be ignored.
> 

Agreed, it's unnecessary.  Dropped in v2.

> FWIW, cancellation is inherently racy.  I'm not opposed to trying to
> make this more reliable but I think it may be pretty hard.
> 
> Maybe some kind of installable exception handler that is run when
> post_event fails would be one way.  Not sure.

Not sure either.  Anyway I hope v2 is an improvement.

Thanks,
- Tom
  

Patch

diff --git a/gdb/python/lib/gdb/dap/server.py b/gdb/python/lib/gdb/dap/server.py
index ecec41cc321..0b27d23da1e 100644
--- a/gdb/python/lib/gdb/dap/server.py
+++ b/gdb/python/lib/gdb/dap/server.py
@@ -62,6 +62,8 @@  class CancellationHandler:
         # The request currently being handled, or None.
         self.in_flight = None
         self.reqs = []
+        self.interruptable = False
+        self.cancel_pending = False
 
     def starting(self, req):
         """Call at the start of the given request.
@@ -87,7 +89,10 @@  class CancellationHandler:
         If the request has not yet been seen, the cancellation is queued."""
         with self.lock:
             if req == self.in_flight:
-                gdb.interrupt()
+                if self.interruptable:
+                    gdb.interrupt()
+                else:
+                    self.cancel_pending = True
             else:
                 # We don't actually ignore the request here, but in
                 # the 'starting' method.  This way we don't have to
@@ -97,6 +102,29 @@  class CancellationHandler:
                 # to try to check for this.
                 heapq.heappush(self.reqs, req)
 
+    def is_cancel_pending(self):
+        """Is a cancellation pending.  This can only be true when not in the
+        interruptable state."""
+        with self.lock:
+            if not self.cancel_pending:
+                return False
+            self.cancel_pending = False
+            return True
+
+    def enter_interruptable(self):
+        """Enter the interruptable state."""
+        with self.lock:
+            if self.cancel_pending:
+                self.cancel_pending = False
+                return False
+            self.interruptable = True
+            return True
+
+    def exit_interruptable(self):
+        """Exit the interruptable state."""
+        with self.lock:
+            self.interruptable = False
+
 
 class Server:
     """The DAP server class."""
@@ -159,6 +187,9 @@  class Server:
             result["success"] = False
             result["message"] = str(e)
         self.canceller.done(req)
+        if self.canceller.is_cancel_pending():
+            result["success"] = False
+            result["message"] = "cancelled"
         return result
 
     # Read inferior output and sends OutputEvents to the client.  It
@@ -435,12 +466,19 @@  def send_gdb_with_response(fn):
 
     def message():
         try:
-            val = fn()
+            if not _server.canceller.enter_interruptable():
+                raise KeyboardInterrupt()
+            try:
+                val = fn()
+            finally:
+                _server.canceller.exit_interruptable()
             result_q.put(val)
         except (Exception, KeyboardInterrupt) as e:
             result_q.put(e)
 
     send_gdb(message)
+    if _server.canceller.is_cancel_pending():
+        result_q.put(KeyboardInterrupt())
     val = result_q.get()
     if isinstance(val, (Exception, KeyboardInterrupt)):
         raise val