linux_nat_kill() compat. with linux-2.4.x

Message ID 20140316135916.GA31463@host2.jankratochvil.net
State Superseded
Headers

Commit Message

Jan Kratochvil March 16, 2014, 1:59 p.m. UTC
  Hi,

it had been already approved by Tom that time so I will check it in in some time.
	[patch] Fix SIGTERM signal safety (PR gdb/15358)
	https://sourceware.org/ml/gdb-patches/2013-07/msg00094.html
	Message-ID: <20130702200010.GA23478@host2.jankratochvil.net>

Doug has just asked to split it out of:
	Re: [patchv2] Fix SIGTERM signal safety (PR gdb/15358) [refresh]
	https://sourceware.org/ml/gdb-patches/2014-03/msg00342.html
	Message-ID: <21284.44419.745786.47756@ruffy.mtv.corp.google.com>

The testcase of the patch above still PASSes for me on Fedora 20 x86_64 even
without this patch, I guess this patch was needed only on that 2.4.x Linux
kernel it mentions.  I am no longer interested in 2.4.x so if anyone has any
concerns I am also fine with dropping this patch.


Thanks,
Jan


gdb/
2014-03-16  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR gdb/15358
        * linux-nat.c (linux_nat_kill): Use kill_callback first.
        Extend the comment for stop_callback.
  

Comments

Pedro Alves May 21, 2014, 12:49 p.m. UTC | #1
Hi Jan,

On 03/16/2014 01:59 PM, Jan Kratochvil wrote:

> gdb/
> 2014-03-16  Jan Kratochvil  <jan.kratochvil@redhat.com>
> 
> 	PR gdb/15358
>         * linux-nat.c (linux_nat_kill): Use kill_callback first.
>         Extend the comment for stop_callback.
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index b615423..ec84188 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -3777,8 +3777,15 @@ linux_nat_kill (struct target_ops *ops)
>      {
>        ptid_t ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
>  
> +      /* Kill all LWP's before trying to stop them.  In rare cases the
> +	 lwp_info state may not match the inferior and
> +	 stop_wait_callback could lock up.  */

Hmm, I find this comment confusing and not really enlightening.
What sort of rare cases?  It that PR15713?  Best just fix that.
I've sent a patch:
https://sourceware.org/ml/gdb-patches/2014-05/msg00473.html
  
Jan Kratochvil June 6, 2014, 8:15 p.m. UTC | #2
On Wed, 21 May 2014 14:49:43 +0200, Pedro Alves wrote:
> On 03/16/2014 01:59 PM, Jan Kratochvil wrote:
> > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> > index b615423..ec84188 100644
> > --- a/gdb/linux-nat.c
> > +++ b/gdb/linux-nat.c
> > @@ -3777,8 +3777,15 @@ linux_nat_kill (struct target_ops *ops)
> >      {
> >        ptid_t ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
> >  
> > +      /* Kill all LWP's before trying to stop them.  In rare cases the
> > +	 lwp_info state may not match the inferior and
> > +	 stop_wait_callback could lock up.  */
> 
> Hmm, I find this comment confusing and not really enlightening.
> What sort of rare cases?  It that PR15713?  Best just fix that.
> I've sent a patch:
> https://sourceware.org/ml/gdb-patches/2014-05/msg00473.html

The reproducible case is that PR15713 and it is sure great you have fixed it.

Fine with dropping the patch although I still do not find it obvious the patch
is no longer relevant.

FSF GDB now relies on fact that ptraced inferior state always matches
lp->stopped and there is a matching signal to wait for etc.  In some cases GDB
hangs during quit (and inferiors cleanup) and one has to kill GDB itself.
Reasons are not known to me as I do not know how to reproduce it.
(It may be also possible all such reasons have been fixed now.)
It also may hang somewhere else and not in linux_nat_kill().

This patch made GDB foolproof against any state of inferior when killing the
inferior so that GDB could no longer hang.  But it would hide some possible
remaining bugs in the code (which may be causing the GDB hangs).


Jan
  
Pedro Alves June 9, 2014, 10:29 a.m. UTC | #3
On 06/06/2014 09:15 PM, Jan Kratochvil wrote:
> On Wed, 21 May 2014 14:49:43 +0200, Pedro Alves wrote:
>> On 03/16/2014 01:59 PM, Jan Kratochvil wrote:
>>> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
>>> index b615423..ec84188 100644
>>> --- a/gdb/linux-nat.c
>>> +++ b/gdb/linux-nat.c
>>> @@ -3777,8 +3777,15 @@ linux_nat_kill (struct target_ops *ops)
>>>      {
>>>        ptid_t ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
>>>  
>>> +      /* Kill all LWP's before trying to stop them.  In rare cases the
>>> +	 lwp_info state may not match the inferior and
>>> +	 stop_wait_callback could lock up.  */
>>
>> Hmm, I find this comment confusing and not really enlightening.
>> What sort of rare cases?  It that PR15713?  Best just fix that.
>> I've sent a patch:
>> https://sourceware.org/ml/gdb-patches/2014-05/msg00473.html
> 
> The reproducible case is that PR15713 and it is sure great you have fixed it.

Great.

> Fine with dropping the patch although I still do not find it obvious the patch
> is no longer relevant.

Thanks, I'd rather drop it.

> FSF GDB now relies on fact that ptraced inferior state always matches
> lp->stopped and there is a matching signal to wait for etc.  In some cases GDB
> hangs during quit (and inferiors cleanup) and one has to kill GDB itself.
> Reasons are not known to me as I do not know how to reproduce it.
> (It may be also possible all such reasons have been fixed now.)

I think any case that this bullet proofing patch could work around
have now been fixed.

> It also may hang somewhere else and not in linux_nat_kill().

Right.  I'd say it's much more likely (even though not very likely, tbc)
that we see hangs elsewhere than in linux_nat_kill now.  E.g., GDB core's
is_executing state getting out of sync with lwp->stopped.

> This patch made GDB foolproof against any state of inferior when killing the
> inferior so that GDB could no longer hang.  But it would hide some possible
> remaining bugs in the code (which may be causing the GDB hangs).

Right, I'd rather just fix the GDB bugs, and only consider such a patch
if we need to work around a kernel bug (though I'd rather just
ignore very old kernels like 2.4 by now).
  
Jan Kratochvil June 9, 2014, 10:32 a.m. UTC | #4
On Mon, 09 Jun 2014 12:29:28 +0200, Pedro Alves wrote:
> On 06/06/2014 09:15 PM, Jan Kratochvil wrote:
> > It also may hang somewhere else and not in linux_nat_kill().
> 
> Right.  I'd say it's much more likely (even though not very likely, tbc)
> that we see hangs elsewhere than in linux_nat_kill now.  E.g., GDB core's
> is_executing state getting out of sync with lwp->stopped.
> 
> > This patch made GDB foolproof against any state of inferior when killing the
> > inferior so that GDB could no longer hang.  But it would hide some possible
> > remaining bugs in the code (which may be causing the GDB hangs).
> 
> Right, I'd rather just fix the GDB bugs, and only consider such a patch
> if we need to work around a kernel bug (though I'd rather just
> ignore very old kernels like 2.4 by now).

OK, so according to the approach chosen above considering this patch dropped.


Jan
  

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index b615423..ec84188 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -3777,8 +3777,15 @@  linux_nat_kill (struct target_ops *ops)
     {
       ptid_t ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
 
+      /* Kill all LWP's before trying to stop them.  In rare cases the
+	 lwp_info state may not match the inferior and
+	 stop_wait_callback could lock up.  */
+      iterate_over_lwps (ptid, kill_callback, NULL);
+
       /* Stop all threads before killing them, since ptrace requires
-	 that the thread is stopped to sucessfully PTRACE_KILL.  */
+	 that the thread is stopped to sucessfully PTRACE_KILL.
+	 kill_callback normally already turned the inferior into a zombie
+	 except for old Linux kernels 2.4.x.  */
       iterate_over_lwps (ptid, stop_callback, NULL);
       /* ... and wait until all of them have reported back that
 	 they're no longer running.  */