[RFC] Release the GIL while running a gdb command or expression

Message ID 807f284a-e227-37ed-c197-170a7f2abe40@redhat.com
State Under Review, archived
Headers

Commit Message

Phil Muldoon Oct. 9, 2018, 9:51 a.m. UTC
  On 15/09/2018 05:07, Tom Tromey wrote:
> PR python/23615 points out that gdb.execute_gdb_command does not
> release the Python GIL.  This means that, while the gdb command is
> running, other Python threads do not run.
> 
> This patch solves the problem by introducing a new RAII class that can
> be used to temporarily release and then re-acquire the GIL, then puts
> this into the appropriate places in execute_gdb_command and
> gdbpy_parse_and_eval.
> 
> The main issue with this patch is that I could not think of a non-racy
> way to test it.  Any ideas?

We've had a similar patch in the Fedora RPM for a while. It's been on
my list to upstream for a bit.  Initially I was a bit reluctant
because I hadn't audited all the Python reentry points in GDB to make
sure we reacquired the GIL before interacting with Python. I realize
now this was not a real concern (reviews would catch this and if one
did slip by it would be fixed as a bug.) 

The only real difference in the patch approach was to make the GIL
release an option over a default.  I've included the relevant patch
snippet here:


Cheers,

Phil
  

Comments

Tom Tromey Oct. 9, 2018, 7:42 p.m. UTC | #1
>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:

Phil> We've had a similar patch in the Fedora RPM for a while. It's been on
Phil> my list to upstream for a bit.  Initially I was a bit reluctant
Phil> because I hadn't audited all the Python reentry points in GDB to make
Phil> sure we reacquired the GIL before interacting with Python. I realize
Phil> now this was not a real concern (reviews would catch this and if one
Phil> did slip by it would be fixed as a bug.) 

Yes, I think it would result in a crash.

Phil> The only real difference in the patch approach was to make the GIL
Phil> release an option over a default.

I don't think this is necessary, mostly because I can't think of when it
would be desirable not to release the GIL; but also because when writing
Python one doesn't normally have to worry about the GIL -- it's not
really a Python-visible feature, nor should it be, since implementations
like PyPY don't have it.

Phil> As for a test, we also have a test included. It does not appear to be
Phil> racy for our purposes. I also include it for consideration. This
Phil> snippet is just for initial consideration and will make any changes
Phil> needed to include it in the patch.

I can try it.  If it works, whose name should I put on the ChangeLog?

Tom
  
Phil Muldoon Oct. 10, 2018, 8:33 a.m. UTC | #2
On 09/10/2018 20:42, Tom Tromey wrote:
>>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:
> 
> Phil> We've had a similar patch in the Fedora RPM for a while. It's been on
> Phil> my list to upstream for a bit.  Initially I was a bit reluctant
> Phil> because I hadn't audited all the Python reentry points in GDB to make
> Phil> sure we reacquired the GIL before interacting with Python. I realize
> Phil> now this was not a real concern (reviews would catch this and if one
> Phil> did slip by it would be fixed as a bug.) 
> 
> Yes, I think it would result in a crash.
> 
> Phil> The only real difference in the patch approach was to make the GIL
> Phil> release an option over a default.
> 
> I don't think this is necessary, mostly because I can't think of when it
> would be desirable not to release the GIL; but also because when writing
> Python one doesn't normally have to worry about the GIL -- it's not
> really a Python-visible feature, nor should it be, since implementations
> like PyPY don't have it.

It's not so much an implementation detail that should be exposed to
the user but rather a change in behavior around gdb.execute. Given
that now, with this patch, we always release the Python GIL during the
execution of a GDB command via gdb.execute, any other Python thread
that was previously blocked by the GIL is now unblocked, and it may
appear to those threads that the Python thread that initiated the
gdb.execute has returned from that command when it may not have (this
is especially so in cases where a GDB command takes seconds to
complete a command). Also, any other Python threads that wish to
interact with GDB will have to wait until the GDB event loop returns
to a state where it is accepting input (at least I think this is
true). This may break some scripts out there. Are these scripts
relying on what we now classify as a bug or is there is a reasonable
expectation, on the users' behalf, that a script could rely on GDB's
previous GIL blocking behavior?  I'm not advocating we should have a
release_gil parameter, I'm just unsure of the expectations of users
and scripts out there, and that if we don't provide a mechanism to
optionally block the GIL, it will cause disruption to any established
scripts out there.

I suppose the solution is to either provide a GIL blocking parameter
or to thoroughly document this new behavior in the manual. What do
you think?

> Phil> As for a test, we also have a test included. It does not appear to be
> Phil> racy for our purposes. I also include it for consideration. This
> Phil> snippet is just for initial consideration and will make any changes
> Phil> needed to include it in the patch.
> 
> I can try it.  If it works, whose name should I put on the ChangeLog?

I wrote the GIL patch for Fedora so I suppose mine. The test itself may
need some modification in that it can be optimized and it may also
need some format tweaks.

Cheers

Phil
  
Tom Tromey Oct. 10, 2018, 2:07 p.m. UTC | #3
>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:

>> I don't think this is necessary, mostly because I can't think of when it
>> would be desirable not to release the GIL; but also because when writing
>> Python one doesn't normally have to worry about the GIL -- it's not
>> really a Python-visible feature, nor should it be, since implementations
>> like PyPY don't have it.

Phil> It's not so much an implementation detail that should be exposed to
Phil> the user but rather a change in behavior around gdb.execute. Given
Phil> that now, with this patch, we always release the Python GIL during the
Phil> execution of a GDB command via gdb.execute, any other Python thread
Phil> that was previously blocked by the GIL is now unblocked, and it may
Phil> appear to those threads that the Python thread that initiated the
Phil> gdb.execute has returned from that command when it may not have (this
Phil> is especially so in cases where a GDB command takes seconds to
Phil> complete a command). Also, any other Python threads that wish to
Phil> interact with GDB will have to wait until the GDB event loop returns
Phil> to a state where it is accepting input (at least I think this is
Phil> true).

Actually we forbid using gdb APIs from threads other than the gdb
thread.  From python.texi:

    @value{GDBN} is not thread-safe.  If your Python program uses multiple
    threads, you must be careful to only call @value{GDBN}-specific
    functions in the @value{GDBN} thread.  @code{post_event} ensures
    this.

Phil> This may break some scripts out there. Are these scripts
Phil> relying on what we now classify as a bug or is there is a reasonable
Phil> expectation, on the users' behalf, that a script could rely on GDB's
Phil> previous GIL blocking behavior?  I'm not advocating we should have a
Phil> release_gil parameter, I'm just unsure of the expectations of users
Phil> and scripts out there, and that if we don't provide a mechanism to
Phil> optionally block the GIL, it will cause disruption to any established
Phil> scripts out there.

Phil> I suppose the solution is to either provide a GIL blocking parameter
Phil> or to thoroughly document this new behavior in the manual. What do
Phil> you think?

I think there's little risk of this breaking anything.  It seems like
just an ordinary bug fix to me.

Tom
  
Phil Muldoon Oct. 10, 2018, 2:38 p.m. UTC | #4
On 10/10/2018 15:07, Tom Tromey wrote:
>>>>>> "Phil" == Phil Muldoon <pmuldoon@redhat.com> writes:
> 
>>> I don't think this is necessary, mostly because I can't think of when it
>>> would be desirable not to release the GIL; but also because when writing
>>> Python one doesn't normally have to worry about the GIL -- it's not
>>> really a Python-visible feature, nor should it be, since implementations
>>> like PyPY don't have it.
> 
> Phil> It's not so much an implementation detail that should be exposed to
> Phil> the user but rather a change in behavior around gdb.execute. Given
> Phil> that now, with this patch, we always release the Python GIL during the
> Phil> execution of a GDB command via gdb.execute, any other Python thread
> Phil> that was previously blocked by the GIL is now unblocked, and it may
> Phil> appear to those threads that the Python thread that initiated the
> Phil> gdb.execute has returned from that command when it may not have (this
> Phil> is especially so in cases where a GDB command takes seconds to
> Phil> complete a command). Also, any other Python threads that wish to
> Phil> interact with GDB will have to wait until the GDB event loop returns
> Phil> to a state where it is accepting input (at least I think this is
> Phil> true).
> 
> Actually we forbid using gdb APIs from threads other than the gdb
> thread.  From python.texi:
> 
>     @value{GDBN} is not thread-safe.  If your Python program uses multiple
>     threads, you must be careful to only call @value{GDBN}-specific
>     functions in the @value{GDBN} thread.  @code{post_event} ensures
>     this.

Aha, well I learned something new today!

 
> Phil> This may break some scripts out there. Are these scripts
> Phil> relying on what we now classify as a bug or is there is a reasonable
> Phil> expectation, on the users' behalf, that a script could rely on GDB's
> Phil> previous GIL blocking behavior?  I'm not advocating we should have a
> Phil> release_gil parameter, I'm just unsure of the expectations of users
> Phil> and scripts out there, and that if we don't provide a mechanism to
> Phil> optionally block the GIL, it will cause disruption to any established
> Phil> scripts out there.
> 
> Phil> I suppose the solution is to either provide a GIL blocking parameter
> Phil> or to thoroughly document this new behavior in the manual. What do
> Phil> you think?
> 
> I think there's little risk of this breaking anything.  It seems like
> just an ordinary bug fix to me.
 

Ok. I looked at the patch and, from my point of view, it looks fine to
me.  I'll leave any testing decisions in your hands.

Cheers

Phil
  

Patch

--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -554,12 +554,16 @@  execute_gdb_command (PyObject *self, PyObject *args, PyObject *kw)
 {
   const char *arg;
   PyObject *from_tty_obj = NULL, *to_string_obj = NULL;
-  int from_tty, to_string;
-  static const char *keywords[] = { "command", "from_tty", "to_string", NULL };
+  int from_tty, to_string, release_gil;
+  static const char *keywords[] = {"command", "from_tty", "to_string", "release_gil", NULL };
+  PyObject *release_gil_obj = NULL;
 
-  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "s|O!O!", keywords, &arg,
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "s|O!O!O!", keywords, &arg,
 					&PyBool_Type, &from_tty_obj,
-					&PyBool_Type, &to_string_obj))
+					&PyBool_Type, &to_string_obj,
+					&PyBool_Type, &release_gil_obj))
     return NULL;
 
   from_tty = 0;
@@ -580,12 +584,28 @@  execute_gdb_command (PyObject *self, PyObject *args, PyObject *kw)
       to_string = cmp;
     }
 
+  release_gil = 0;
+  if (release_gil_obj)
+    {
+      int cmp = PyObject_IsTrue (release_gil_obj);
+      if (cmp < 0)
+	return NULL;
+      release_gil = cmp;
+    }
+
   std::string to_string_res;
 
   TRY
     {
       struct interp *interp;
 
+      /* In the case of long running GDB commands, allow the user to
+	 release the Python GIL.  Restore the GIL
+	 after the command has completed before handing back to
+	 Python.  */
+      if (release_gil)


As for a test, we also have a test included. It does not appear to be
racy for our purposes. I also include it for consideration. This
snippet is just for initial consideration and will make any changes
needed to include it in the patch.

diff --git a/gdb/testsuite/gdb.python/py-gil-mthread.c b/gdb/testsuite/gdb.python/py-gil-mthread.c
new file mode 100644
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-gil-mthread.c
@@ -0,0 +1,13 @@ 
+#include <stdio.h>
+#include <unistd.h>
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < 10; i++)
+    {
+      sleep (1); /* break-here */
+      printf ("Sleeping %d\n", i);
+    }
+}
diff --git a/gdb/testsuite/gdb.python/py-gil-mthread.exp b/gdb/testsuite/gdb.python/py-gil-mthread.exp
new file mode 100644
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-gil-mthread.exp
@@ -0,0 +1,69 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile .c .py
+set executable $testfile
+
+if { [prepare_for_testing $testfile.exp $executable $srcfile] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] {
+    return -1
+}
+
+gdb_breakpoint $srcfile:[gdb_get_line_number "break-here"] temporary
+gdb_continue_to_breakpoint "break-here" ".* break-here .*"
+
+set test "response"
+set timeout 60
+set sleeping_last -1
+set hello_last 0
+set minimal 5
+gdb_test_multiple "python exec (open ('$srcdir/$subdir/$srcfile2').read ())" $test {
+    -re "Error: unable to start thread\r\n" {
+	fail $test
+    }
+    -re "Sleeping (\[0-9\]+)\r\n" {
+	set n $expect_out(1, string)
+	if { $sleeping_last + 1 != $n } {
+	    fail $test
+	} else {
+	    set sleeping_last $n
+	    if { $sleeping_last >= $minimal && $hello_last >= $minimal } {
+		pass $test
+	    } else {
+		exp_continue
+	    }
+	}
+    }
+    -re "Hello \\( (\[0-9\]+) \\)\r\n" {
+	set n $expect_out(1,string)
+	if { $hello_last + 1 != $n } {
+	    fail $test
+	} else {
+	    set hello_last $n
+	    if { $sleeping_last >= $minimal && $hello_last >= $minimal } {
+		pass $test
+	    } else {
+		exp_continue
+	    }
+	}
+    }
+}
diff --git a/gdb/testsuite/gdb.python/py-gil-mthread.py b/gdb/testsuite/gdb.python/py-gil-mthread.py
new file mode 100644
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-gil-mthread.py
@@ -0,0 +1,28 @@ 
+try:
+   import thread
+except:
+   import _thread
+import time
+import gdb
+
+# Define a function for the thread
+def print_thread_hello():
+   count = 0
+   while count < 10:
+      time.sleep(1)
+      count += 1
+      print ("Hello ( %d )" % count)
+
+# Create a thread and continue
+try:
+   thread.start_new_thread (print_thread_hello, ())
+   gdb.execute ("continue", release_gil=True)
+except:
+   try:
+      _thread.start_new_thread (print_thread_hello, ())
+      gdb.execute ("continue", release_gil=True)
+   except:
+      print ("Error: unable to start thread")
+
+while 1:
+   pass