Readline: Cleanup some warnings

Message ID 3463805B-A8BF-4C20-ACE3-C21AE3F7DB62@arm.com
State New, archived
Headers

Commit Message

Alan Hayward Jan. 31, 2019, 5:24 p.m. UTC
  > On 31 Jan 2019, at 10:02, Alan Hayward <Alan.Hayward@arm.com> wrote:
> 
> 
> 
>> On 31 Jan 2019, at 07:59, Joel Brobecker <brobecker@adacore.com> wrote:
>> 
>>> (Posted this first to binutils, then was directed back here and
>>> pointed at the upstream readline.  Looking at the upstream
>>> readline it has already fixed the issues below in the same way.
>>> Tested by running the gdb testsuite - couldn't see a readline
>>> specific suite.) 
>>> 
>>> Cleanup the readline warnings that gdb buildbot complains about.
>>> 
>>> To prevent wcwidth missing declaration warnings, add the SOURCE /
>>> EXTENSION macros to config.in that have already checked for in
>>> configure.  Use the exact same list as GDB - it seemed sensible
>>> to add all of them.
>>> 
>>> Ensure pid is a long before printing as one.  Also fix GNU style.
>>> 
>>> Check the return value of write the same way as history_do_write ().
>>> 
>>> These changes are consistent with upstream readline.
>>> 
>>> readline/ChangeLog.gdb:
>>> 
>>> 2019-01-30  Alan Hayward  <alan.hayward@arm.com>
>>> 
>>> 	* config.h.in: Add SOURCE/EXTENSION macros.
>>> 	* histfile.c (history_truncate_file): Check return of write.
>>> 	* util.c: Ensure pid is long.
>> 
>> If it is a backport
> 
> Technically, not a back port because I wrote the changes then realised
> they are the same as upstream.  Just a quick thought - would it help
> with future rebasing if I ensured the changes were *exactly* the same?
> (For example, the config.h.in changes are in a different place in the 
> file with different comments).
> 
>> from mainline readline, and you've run
>> the testsuite (readline is being necessarily extensively covered,
>> but at least it is used implicitly, since it provides the framework
>> for the interactive prompt, which is being driven via expect/tcl),
>> it's OK to push.

Thinking about what you said, I’ve updated the config.h.in code so it
is a direct backport, and pushed.  Functionally it’s the same as the
original version. Patch pasted below.
  

Comments

Joel Brobecker Feb. 1, 2019, 8:05 a.m. UTC | #1
> >> If it is a backport
> > 
> > Technically, not a back port because I wrote the changes then realised
> > they are the same as upstream.  Just a quick thought - would it help
> > with future rebasing if I ensured the changes were *exactly* the same?
> > (For example, the config.h.in changes are in a different place in the 
> > file with different comments).
> > 
> >> from mainline readline, and you've run
> >> the testsuite (readline is being necessarily extensively covered,
> >> but at least it is used implicitly, since it provides the framework
> >> for the interactive prompt, which is being driven via expect/tcl),
> >> it's OK to push.
> 
> Thinking about what you said, I’ve updated the config.h.in code so it
> is a direct backport, and pushed.  Functionally it’s the same as the
> original version. Patch pasted below.

Thank you. It's always better if it is an exact backport, because
it facilitates future resyncs with the upstream versions.
  
Tom Tromey Feb. 1, 2019, 12:47 p.m. UTC | #2
>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel> Thank you. It's always better if it is an exact backport, because
Joel> it facilitates future resyncs with the upstream versions.

Speaking of, I still have a branch with a newer import on it.  It could
use some more testing.  It's been a while since I poked at this, so I
don't really remember, but I think it fails a couple of gdb tests.  So,
some readline debugging is also needed.

Tom
  
Philippe Waroquiers Feb. 1, 2019, 6:54 p.m. UTC | #3
On Fri, 2019-02-01 at 05:47 -0700, Tom Tromey wrote:
> > > > > > "Joel" == Joel Brobecker <brobecker@adacore.com> writes:
> 
> Joel> Thank you. It's always better if it is an exact backport, because
> Joel> it facilitates future resyncs with the upstream versions.
> 
> Speaking of, I still have a branch with a newer import on it.  It could
> use some more testing.  It's been a while since I poked at this, so I
> don't really remember, but I think it fails a couple of gdb tests.  So,
> some readline debugging is also needed.
To solve a leak, I now configure/build GDB to use
the debian stable readline, and I only have one more
test failing compared to the GDB 6.2:

"This 'by design' leak is fixed in readline 7, while
the GDB readline version is 6.2 according to the last
'import' message in readline ChangeLog.gdb.

On my system (debian 9.6), the leaks are fixed by using
   --with-system-readline=yes
with the system readline version being 7.0-3

Switching to this readline version causes the following checks to fail:
FAIL: gdb.gdb/selftest.exp: send SIGINT signal to child process (timeout)
FAIL: gdb.gdb/selftest.exp: thread 1 (timeout)
FAIL: gdb.gdb/selftest.exp: backtrace through signal handler (timeout)

Apart of that, no problem seen."

Philippe
  
Pedro Alves Feb. 6, 2019, 7:56 p.m. UTC | #4
On 02/01/2019 06:54 PM, Philippe Waroquiers wrote:
> 
> Switching to this readline version causes the following checks to fail:
> FAIL: gdb.gdb/selftest.exp: send SIGINT signal to child process (timeout)
> FAIL: gdb.gdb/selftest.exp: thread 1 (timeout)
> FAIL: gdb.gdb/selftest.exp: backtrace through signal handler (timeout)
> 
> Apart of that, no problem seen."

See the description in commit 4a11f2065906, which was later
reverted:

~~~
    Sync readline/ to version 7.0 alpha
...
    After the sync there is one testsuite regression, the test
    "signal SIGINT" in gdb.gdb/selftest.exp which now FAILs.  Previously,
    the readline 6.2 SIGINT handler would temporarily reinstall the
    underlying application's SIGINT handler and immediately re-raise SIGINT
    so that the orginal handler gets invoked.  But now (since readline 6.3)
    its SIGINT handler does not re-raise SIGINT or directly invoke the
    original handler; it now sets a flag marking that SIGINT was raised, and
    waits until readline explicitly has control to call the application's
    SIGINT handler.  Anyway, because SIGINT is no longer re-raised from
    within readline's SIGINT handler, doing "signal SIGINT" with a stopped
    inferior gdb process will no longer resume and then immediately stop the
    process (since there is no 2nd SIGINT to immediately catch).  Instead,
    the inferior gdb process will now just print "Quit" and continue to run.
    So with this commit, this particular test case is adjusted to reflect
    this change in behavior (we now have to send a 2nd SIGINT manually to
    stop it).
~~~

Thanks,
Pedro Alves
  
Tom Tromey March 17, 2019, 5:30 p.m. UTC | #5
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

Pedro> See the description in commit 4a11f2065906, which was later
Pedro> reverted:

Thanks.  I've pulled that into my readline upgrade series.

I have to disable guile to get this test to work.  When guile is
enabled, there's an early SEGV in the garbage collector -- it is
expected and is caught by the GC, but this test case doesn't know how to
cope.

I'm looking now at importing readline 8.0.

I still don't really know what to do about the readline-related hack in
the mingw gdb_select.  I'd like to remove it, but I can't test it and I
don't know whether it's still needed.

Tom
  
Eli Zaretskii March 17, 2019, 6:35 p.m. UTC | #6
> From: Tom Tromey <tom@tromey.com>
> Cc: "gdb-patches\@sourceware.org" <gdb-patches@sourceware.org>
> Date: Sun, 17 Mar 2019 11:30:38 -0600
> 
> I still don't really know what to do about the readline-related hack in
> the mingw gdb_select.  I'd like to remove it, but I can't test it and I
> don't know whether it's still needed.

Can someone point me to the PR that led to that hack?  I'd like to see
what happens in that use case and why is this code needed to solve it.
(I tried "git annotate", but that didn't tell me anything interesting
about the history of that snippet.  Apologies if I missed something.)
  
Tom Tromey March 19, 2019, 4:04 p.m. UTC | #7
>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

>> From: Tom Tromey <tom@tromey.com>
>> Cc: "gdb-patches\@sourceware.org" <gdb-patches@sourceware.org>
>> Date: Sun, 17 Mar 2019 11:30:38 -0600
>> 
>> I still don't really know what to do about the readline-related hack in
>> the mingw gdb_select.  I'd like to remove it, but I can't test it and I
>> don't know whether it's still needed.

Eli> Can someone point me to the PR that led to that hack?  I'd like to see
Eli> what happens in that use case and why is this code needed to solve it.
Eli> (I tried "git annotate", but that didn't tell me anything interesting
Eli> about the history of that snippet.  Apologies if I missed something.)

All I know is what annotate says.  The commit message is appended.

Here's the gdb-patches thread about it:

https://sourceware.org/ml/gdb-patches/2008-02/msg00423.html

Tom

commit b803fb0f0f7a90ca764d08f93104bc262d63ad40
Author: Daniel Jacobowitz <drow@false.org>
Date:   Wed Mar 5 17:21:10 2008 +0000

            * Makefile.in (mingw-hdep.o, posix-hdep.o, remote-fileio.o): Update.
            * event-loop.c (call_async_signal_handler): New.
            * event-loop.h (call_async_signal_handler)
            (gdb_call_async_signal_handler): Declare.
            (mark_async_signal_handler): Add comments.
            * event-top.c (handle_sigint): Use gdb_call_async_signal_handler.
            * mingw-hdep.c (sigint_event, sigint_handler): New.
            (gdb_select): Use them.  Wait for the readline signal handler
            to finish.
            (gdb_call_async_signal_handler, _initialize_mingw_hdep): New functions.
            * posix-hdep.c (gdb_call_async_signal_handler): New function.
            * remote-fileio.c (sigint_fileio_token, async_remote_fileio_interrupt):
            New.
            (remote_fileio_ctrl_c_signal_handler): Use
            gdb_call_async_signal_handler.
            (initialize_remote_fileio): Initialize sigint_fileio_token.
            * remote.c (initialize_sigint_signal_handler, handle_remote_sigint): Do
            not initialize tokens here.
            (handle_remote_sigint_twice): Likewise.  Reinstall
            handle_remote_sigint.
            (async_remote_interrupt_twice): Just call interrupt_query.
            (cleanup_sigint_signal_handler): Do not delete tokens.
            (remote_interrupt, remote_interrupt_twice): Use
            gdb_call_async_signal_handler.
            (interrupt_query): Reinstall the default signal handler.
            (_initialize_remote): Initialize tokens here.
  
Eli Zaretskii March 19, 2019, 6:36 p.m. UTC | #8
> From: Tom Tromey <tom@tromey.com>
> Cc: Tom Tromey <tom@tromey.com>,  palves@redhat.com,  gdb-patches@sourceware.org
> Date: Tue, 19 Mar 2019 10:04:48 -0600
> 
> >> I still don't really know what to do about the readline-related hack in
> >> the mingw gdb_select.  I'd like to remove it, but I can't test it and I
> >> don't know whether it's still needed.
> 
> Eli> Can someone point me to the PR that led to that hack?  I'd like to see
> Eli> what happens in that use case and why is this code needed to solve it.
> Eli> (I tried "git annotate", but that didn't tell me anything interesting
> Eli> about the history of that snippet.  Apologies if I missed something.)
> 
> All I know is what annotate says.  The commit message is appended.

Yes, thanks.  I've seen that, of course, but the log is uninformative.

> Here's the gdb-patches thread about it:
> 
> https://sourceware.org/ml/gdb-patches/2008-02/msg00423.html

Thanks, I will study that.  Hopefully, it will allow me to understand
more about the issue.
  
Pedro Alves March 19, 2019, 7:02 p.m. UTC | #9
On 03/19/2019 04:04 PM, Tom Tromey wrote:
>>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Tom Tromey <tom@tromey.com>
>>> Cc: "gdb-patches\@sourceware.org" <gdb-patches@sourceware.org>
>>> Date: Sun, 17 Mar 2019 11:30:38 -0600
>>>
>>> I still don't really know what to do about the readline-related hack in
>>> the mingw gdb_select.  I'd like to remove it, but I can't test it and I
>>> don't know whether it's still needed.
> 
> Eli> Can someone point me to the PR that led to that hack?  I'd like to see
> Eli> what happens in that use case and why is this code needed to solve it.
> Eli> (I tried "git annotate", but that didn't tell me anything interesting
> Eli> about the history of that snippet.  Apologies if I missed something.)
> 
> All I know is what annotate says.  The commit message is appended.
> 
> Here's the gdb-patches thread about it:
> 
> https://sourceware.org/ml/gdb-patches/2008-02/msg00423.html

Hmmm, 

Daniel wrote:

> GDB has several SIGINT handlers which call longjmp.  This is
> problematic for at least two reasons.  One is that we could be in the
> middle of something unwise to longjmp out of, for instance malloc.  In
> practice, this never happens because we're usually waiting for I/O
> when one of the relevant handlers is invoked, but there are a number
> of places where it could definitely happen.

That was indeed true back then, but since then, immediate_quit
was completely eliminated, and we no longer longjmp from signal
handlers anymore, since:
 https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html

Daniel wrote:

> My goals in fixing this were to hide the Windows ugliness, and to fit
> in nicely with GDB's asynchronous event loop.  Since we do not return
> to the primary event loop during target actions (for the current,
> non-async GDB), I couldn't rely on the event loop entirely.  But I
> could use the same token mechanism and thus share the bodies of
> handlers for async mode with the Windows case.
>
> The new interface is gdb_call_async_signal_handler.  SIGINT handlers,

This interface he mentioned, gdb_call_async_signal_handler, was
eliminated by that series too:

 https://sourceware.org/ml/gdb-patches/2016-03/msg00347.html

So all that's left is that little readline hack, it seems:

  /* With multi-threaded SIGINT handling, there is a race between the
     readline signal handler and GDB.  It may still be in
     rl_prep_terminal in another thread.  Do not return until it is
     done; we can check the state here because we never longjmp from
     signal handlers on Windows.  */
  while (RL_ISSTATE (RL_STATE_SIGHANDLER))
    Sleep (1);

(Curiously, that bit only appeared in a later version of Dan's patch,
here: https://sourceware.org/ml/gdb-patches/2008-03/msg00034.html)

I'm not seeing why we'd still need that bit, but then again,
I'm not seeing why it was needed in the first place.
The signal handler could run concurrently with gdb at any other
point in the gdb code, not just here, so at any point we
call into readline, we can be running readline code in parallel
with a signal handler touching readline's state.  It sounds like
that should be a readline problem to worry about.

That could be related to the fact that readline's signal handler
overrides gdb's, does its thing, and then calls gdb's signal
handler manually?  If the WaitForSingleObject call had already
woken up, then gdb's signal handler has already run and SetEvent
on sigint_event.  Then the code would go and run the deferred
signal handler.  In the remote case, that handler would
issue prompt "Give up (and stop debugging it)? (y or n)" prompt,
and if that is running in parallel with readline's signal
handler still calling rl_prep_terminal, bad things would happen.

But again, why isn't that a readline problem, instead of
a gdb problem?

In current GDB, we no longer issue that query from a signal
handler, we run it from a quit handler, always in mainline
code.  In essence, you could see it as if Dan's old Windows
signal-handler-deferring code was generalized.
But that would mean that we still have the same issue.
We end up in a quick handler issuing that same prompt shortly
after the SIGINT interrupting the event loop, via a serial_event,
which for Windows is a CreateEvent event, just like sigint_event
was on Dan's patch.

I'm still puzzled on why this isn't a readline issue.  Shouldn't
readline's Windows signal handler be synchronizing with mainline
code such that if a signal handler is running, mainline calls into
readline would block?

I think there must be something else to this.

Thanks,
Pedro Alves
  
Pedro Alves March 19, 2019, 7:04 p.m. UTC | #10
On 03/19/2019 07:02 PM, Pedro Alves wrote:
> We end up in a quick handler issuing that same prompt shortly

I meant a "in a quit handler", in this case,
remote_target::remote_serial_quit_handler, I think.

Thanks,
Pedro Alves
  
Eli Zaretskii March 19, 2019, 8:14 p.m. UTC | #11
> Cc: gdb-patches@sourceware.org
> From: Pedro Alves <palves@redhat.com>
> Date: Tue, 19 Mar 2019 19:02:43 +0000
> 
> > https://sourceware.org/ml/gdb-patches/2008-02/msg00423.html

Caveat: I didn't yet read that thread myself.

> Hmmm, 
> 
> Daniel wrote:
> 
> > GDB has several SIGINT handlers which call longjmp.  This is
> > problematic for at least two reasons.  One is that we could be in the
> > middle of something unwise to longjmp out of, for instance malloc.  In
> > practice, this never happens because we're usually waiting for I/O
> > when one of the relevant handlers is invoked, but there are a number
> > of places where it could definitely happen.
> 
> That was indeed true back then, but since then, immediate_quit
> was completely eliminated, and we no longer longjmp from signal
> handlers anymore, since:
>  https://sourceware.org/ml/gdb-patches/2016-03/msg00351.html
> 
> Daniel wrote:
> 
> > My goals in fixing this were to hide the Windows ugliness, and to fit
> > in nicely with GDB's asynchronous event loop.  Since we do not return
> > to the primary event loop during target actions (for the current,
> > non-async GDB), I couldn't rely on the event loop entirely.  But I
> > could use the same token mechanism and thus share the bodies of
> > handlers for async mode with the Windows case.
> >
> > The new interface is gdb_call_async_signal_handler.  SIGINT handlers,
> 
> This interface he mentioned, gdb_call_async_signal_handler, was
> eliminated by that series too:
> 
>  https://sourceware.org/ml/gdb-patches/2016-03/msg00347.html
> 
> So all that's left is that little readline hack, it seems:
> 
>   /* With multi-threaded SIGINT handling, there is a race between the
>      readline signal handler and GDB.  It may still be in
>      rl_prep_terminal in another thread.  Do not return until it is
>      done; we can check the state here because we never longjmp from
>      signal handlers on Windows.  */
>   while (RL_ISSTATE (RL_STATE_SIGHANDLER))
>     Sleep (1);
> 
> (Curiously, that bit only appeared in a later version of Dan's patch,
> here: https://sourceware.org/ml/gdb-patches/2008-03/msg00034.html)
> 
> I'm not seeing why we'd still need that bit, but then again,
> I'm not seeing why it was needed in the first place.
> The signal handler could run concurrently with gdb at any other
> point in the gdb code, not just here, so at any point we
> call into readline, we can be running readline code in parallel
> with a signal handler touching readline's state.  It sounds like
> that should be a readline problem to worry about.
> 
> That could be related to the fact that readline's signal handler
> overrides gdb's, does its thing, and then calls gdb's signal
> handler manually?  If the WaitForSingleObject call had already
> woken up, then gdb's signal handler has already run and SetEvent
> on sigint_event.  Then the code would go and run the deferred
> signal handler.  In the remote case, that handler would
> issue prompt "Give up (and stop debugging it)? (y or n)" prompt,
> and if that is running in parallel with readline's signal
> handler still calling rl_prep_terminal, bad things would happen.

Not sure if the above refers to what I wanted to say, but: as I'm sure
you know, SIGINT handlers on Windows run in a separate thread, created
by the OS, so a Readline SIGINT handler could ruin in parallel both
with Readline's other code and in parallel with GDB's code, depending
on when exactly did the user type Ctrl-C.  In a few cases where it was
important to emulate Posix behavior in order not to step on the troes
of the mainline code, I needed to stop the main thread while the
SIGINT handler was running.  Could it be that the code we are
discussing does something similar?

> But again, why isn't that a readline problem, instead of
> a gdb problem?

I agree: the right solution would be for the Readline's SIGINT handler
to stop the main thread (e.g., by using SuspendThread).

> I'm still puzzled on why this isn't a readline issue.  Shouldn't
> readline's Windows signal handler be synchronizing with mainline
> code such that if a signal handler is running, mainline calls into
> readline would block?

Yes, I think so.

> I think there must be something else to this.

Maybe.  I will try to read that discussion soon.

Thanks.
  
Eli Zaretskii March 20, 2019, 8:55 a.m. UTC | #12
> Date: Tue, 19 Mar 2019 22:14:36 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> CC: tom@tromey.com, gdb-patches@sourceware.org
> 
> > Cc: gdb-patches@sourceware.org
> > From: Pedro Alves <palves@redhat.com>
> > Date: Tue, 19 Mar 2019 19:02:43 +0000
> > 
> > > https://sourceware.org/ml/gdb-patches/2008-02/msg00423.html
> 
> Caveat: I didn't yet read that thread myself.

I have now.

So yes, this is about SIGINT handler being run on Windows in a
separate thread.  Since the Readline SIGINT handler executes
non-trivial Readline code, it should first stop the main thread in its
tracks.

But the question in the case of GDB is: could GDB run Readline code in
more than one thread simultaneously?  And also, when the Readline
handler invokes the GDB handler, can the code run by the GDB SIGINT
handler get in the way of some other GDB code which runs concurrently?
This latter consideration might become more relevant with Tom's work
on multi-threading the symtab reading.  I'm not familiar with the
current GDB architecture (specifically of the Windows port) well
enough to answer my own questions on this matter.

These questions are relevant because if threads other than the main
one could be involved in this, we will have to stop them as well for
as long as the SIGINT handler runs, and doing that from Readline's own
code might prove tricky.  By contrast, stopping just the main thread
could be done entirely in the Readline sources.

Comments?
  
Pedro Alves March 20, 2019, 3:46 p.m. UTC | #13
On 03/19/2019 08:14 PM, Eli Zaretskii wrote:
>> Cc: gdb-patches@sourceware.org
>> From: Pedro Alves <palves@redhat.com>

> Not sure if the above refers to what I wanted to say, but: as I'm sure
> you know, SIGINT handlers on Windows run in a separate thread, created
> by the OS, so a Readline SIGINT handler could ruin in parallel both
> with Readline's other code and in parallel with GDB's code, depending
> on when exactly did the user type Ctrl-C.  In a few cases where it was
> important to emulate Posix behavior in order not to step on the troes
> of the mainline code, I needed to stop the main thread while the
> SIGINT handler was running.  Could it be that the code we are
> discussing does something similar?
> 
>> But again, why isn't that a readline problem, instead of
>> a gdb problem?
> 
> I agree: the right solution would be for the Readline's SIGINT handler
> to stop the main thread (e.g., by using SuspendThread).

I don't sure how having the SIGINT handler stop the main thread is
a 100% correct solution.  By the time you stop it, the main thread can well
be already running readline code, halfway through updating some data
structure, even if the mainline code disabled SIGINT temporarily,
with _rl_block_sigint, because by the time the mainline code calls 
_rl_block_sigint, the SIGINT thread may have already have been spawned.

Also, you're not ever supposed to use SuspendThread for synchronization, I
believe.

MSDN at <https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-suspendthread> says:

 "This function is primarily designed for use by debuggers. It is not intended
 to be used for thread synchronization."

And here <https://devblogs.microsoft.com/oldnewthing/?p=44743> it says:

  "The Suspend­Thread function tells the scheduler to suspend the thread
  but does not wait for an acknowledgment from the scheduler that the suspension
  has actually occurred."

I think a mutex/lock for synchronization would be a better solution within
readline.  Make all mainline readline entry points grab a mutex on entry
and release it on exit.  Likewise, make the readline signal handler
hold a mutex around any code that is unsafe to run in parallel
with mainline code.

Note however, that the readline signal handling code was changed
in recent years.  IIRC, it used to do unsafe non-async-signal-safe things,
but it's gotten better.  It's because readline signal handling changed
that the gdb.gdb/selftest.exp fails under newer readline.  And one of
the changes, which I think may be relevant here, is that in callback mode,
readline no longer installs its own signal handler.

Using my "info signal-dispositions" script (*), when using the system
readline (v7):

 (top-gdb) info signal-dispositions 2
 Number  Name            Description                     Disposition
 2       SIGINT          Interrupt                       handle_sigint(int) in section .text of build-sys-readline/gdb/gdb

While with the bundled readline I get:

 (top-gdb) info signal-dispositions 2
 Number  Name            Description                     Disposition
 2       SIGINT          Interrupt                       rl_signal_handler in section .text of build/gdb/gdb


I'm not sure whether readline still installs its own signal handler
internally in other situations, but it'd be worth it to check that.
Maybe we never run any readline signal handler on Windows at all
nowadays with recent readlines, which would simplify things for us,
rendering the gdb hack in question obsolete.

(*) - http://palves.net/list-active-signal-handlers-with-gdb/

Thanks,
Pedro Alves

> 
>> I'm still puzzled on why this isn't a readline issue.  Shouldn't
>> readline's Windows signal handler be synchronizing with mainline
>> code such that if a signal handler is running, mainline calls into
>> readline would block?
> 
> Yes, I think so.
> 
>> I think there must be something else to this.
> 
> Maybe.  I will try to read that discussion soon.
  
Pedro Alves March 20, 2019, 3:50 p.m. UTC | #14
On 03/20/2019 03:46 PM, Pedro Alves wrote:
> 
>  (top-gdb) info signal-dispositions 2
>  Number  Name            Description                     Disposition
>  2       SIGINT          Interrupt                       handle_sigint(int) in section .text of build-sys-readline/gdb/gdb

To clarify here.  "handle_sigint(int)" is the GDB handler for SIGINT.

> 
> While with the bundled readline I get:
> 
>  (top-gdb) info signal-dispositions 2
>  Number  Name            Description                     Disposition
>  2       SIGINT          Interrupt                       rl_signal_handler in section .text of build/gdb/gdb
> 

While "rl_signal_handler" is the readline handler for SIGINT.

Thanks,
Pedro Alves
  
Eli Zaretskii March 20, 2019, 5:38 p.m. UTC | #15
> Cc: tom@tromey.com, gdb-patches@sourceware.org
> From: Pedro Alves <palves@redhat.com>
> Date: Wed, 20 Mar 2019 15:46:47 +0000
> 
> > I agree: the right solution would be for the Readline's SIGINT handler
> > to stop the main thread (e.g., by using SuspendThread).
> 
> I don't sure how having the SIGINT handler stop the main thread is
> a 100% correct solution.  By the time you stop it, the main thread can well
> be already running readline code, halfway through updating some data
> structure, even if the mainline code disabled SIGINT temporarily,
> with _rl_block_sigint, because by the time the mainline code calls 
> _rl_block_sigint, the SIGINT thread may have already have been spawned.

How is this different from what happens on Posix platforms, where the
SIGINT handler can be invoked at any moment, while Readline might be
doing anything at all?

> Also, you're not ever supposed to use SuspendThread for synchronization, I
> believe.

We are also not supposed to mix CRT and Win32 API functions, but we do
that all the time.

> MSDN at <https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-suspendthread> says:
> 
>  "This function is primarily designed for use by debuggers. It is not intended
>  to be used for thread synchronization."
> 
> And here <https://devblogs.microsoft.com/oldnewthing/?p=44743> it says:
> 
>   "The Suspend­Thread function tells the scheduler to suspend the thread
>   but does not wait for an acknowledgment from the scheduler that the suspension
>   has actually occurred."

GNU Make does it for many years, and I have yet to hear a single
complaint about that part.

> I think a mutex/lock for synchronization would be a better solution within
> readline.  Make all mainline readline entry points grab a mutex on entry
> and release it on exit.  Likewise, make the readline signal handler
> hold a mutex around any code that is unsafe to run in parallel
> with mainline code.

That's fine with me, but is much more complex, and will probably slow
down the code.  I won't object if you want to do it that way, though.

> I'm not sure whether readline still installs its own signal handler
> internally in other situations, but it'd be worth it to check that.
> Maybe we never run any readline signal handler on Windows at all
> nowadays with recent readlines, which would simplify things for us,
> rendering the gdb hack in question obsolete.

Maybe we should bring Chet into this discussion.
  
Pedro Alves March 20, 2019, 5:50 p.m. UTC | #16
On 03/20/2019 05:38 PM, Eli Zaretskii wrote:
>> Cc: tom@tromey.com, gdb-patches@sourceware.org
>> From: Pedro Alves <palves@redhat.com>
>> Date: Wed, 20 Mar 2019 15:46:47 +0000
>>
>>> I agree: the right solution would be for the Readline's SIGINT handler
>>> to stop the main thread (e.g., by using SuspendThread).
>>
>> I don't sure how having the SIGINT handler stop the main thread is
>> a 100% correct solution.  By the time you stop it, the main thread can well
>> be already running readline code, halfway through updating some data
>> structure, even if the mainline code disabled SIGINT temporarily,
>> with _rl_block_sigint, because by the time the mainline code calls 
>> _rl_block_sigint, the SIGINT thread may have already have been spawned.
> 
> How is this different from what happens on Posix platforms, where the
> SIGINT handler can be invoked at any moment, while Readline might be
> doing anything at all?

On Posix platforms the signal handler runs in the same thread as the
mainline code.  There's _never_ any parallel execution.  When the
mainline code calls _rl_block_sigint, you're absolutely sure that the
mainline code that follows will not be running in parallel with
a SIGINT handler.

> 
>> Also, you're not ever supposed to use SuspendThread for synchronization, I
>> believe.
> 
> We are also not supposed to mix CRT and Win32 API functions, but we do
> that all the time.

That's not at all the same thing, I'm afraid.  We're not talking about
"in theory you shouldn't, but in practice it's OK.".  We're talking about
the very real fact that you cannot use it to synchronize correctly,
as shown in the blog post I pasted below.  If you're using the function
to pause a thread, and then do things, but the thread doesn't actually
pause before you do things, then you have a problem.

> 
>> MSDN at <https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-suspendthread> says:
>>
>>  "This function is primarily designed for use by debuggers. It is not intended
>>  to be used for thread synchronization."
>>
>> And here <https://devblogs.microsoft.com/oldnewthing/?p=44743> it says:
>>
>>   "The Suspend­Thread function tells the scheduler to suspend the thread
>>   but does not wait for an acknowledgment from the scheduler that the suspension
>>   has actually occurred."
> 
> GNU Make does it for many years, and I have yet to hear a single
> complaint about that part.

Not hearing a complaint does not mean that the problem isn't real.
It probably simply means that the race window is narrow, so the number
of people that observe any issue, _and_ are willing to report an issue
is small, and also, is someone notices something odd, it may be simply
impossible to tell what happened and realize what the problem was,
after the fact.

> 
>> I think a mutex/lock for synchronization would be a better solution within
>> readline.  Make all mainline readline entry points grab a mutex on entry
>> and release it on exit.  Likewise, make the readline signal handler
>> hold a mutex around any code that is unsafe to run in parallel
>> with mainline code.
> 
> That's fine with me, but is much more complex, and will probably slow
> down the code.  I won't object if you want to do it that way, though.
> 
>> I'm not sure whether readline still installs its own signal handler
>> internally in other situations, but it'd be worth it to check that.
>> Maybe we never run any readline signal handler on Windows at all
>> nowadays with recent readlines, which would simplify things for us,
>> rendering the gdb hack in question obsolete.
> 
> Maybe we should bring Chet into this discussion.

Thanks,
Pedro Alves
  
Eli Zaretskii March 20, 2019, 6:01 p.m. UTC | #17
> Cc: tom@tromey.com, gdb-patches@sourceware.org
> From: Pedro Alves <palves@redhat.com>
> Date: Wed, 20 Mar 2019 17:50:36 +0000
> 
> On 03/20/2019 05:38 PM, Eli Zaretskii wrote:
> >> Cc: tom@tromey.com, gdb-patches@sourceware.org
> >> From: Pedro Alves <palves@redhat.com>
> >> Date: Wed, 20 Mar 2019 15:46:47 +0000
> >>
> >>> I agree: the right solution would be for the Readline's SIGINT handler
> >>> to stop the main thread (e.g., by using SuspendThread).
> >>
> >> I don't sure how having the SIGINT handler stop the main thread is
> >> a 100% correct solution.  By the time you stop it, the main thread can well
> >> be already running readline code, halfway through updating some data
> >> structure, even if the mainline code disabled SIGINT temporarily,
> >> with _rl_block_sigint, because by the time the mainline code calls 
> >> _rl_block_sigint, the SIGINT thread may have already have been spawned.
> > 
> > How is this different from what happens on Posix platforms, where the
> > SIGINT handler can be invoked at any moment, while Readline might be
> > doing anything at all?
> 
> On Posix platforms the signal handler runs in the same thread as the
> mainline code.  There's _never_ any parallel execution.  When the
> mainline code calls _rl_block_sigint, you're absolutely sure that the
> mainline code that follows will not be running in parallel with
> a SIGINT handler.
> 
> > 
> >> Also, you're not ever supposed to use SuspendThread for synchronization, I
> >> believe.
> > 
> > We are also not supposed to mix CRT and Win32 API functions, but we do
> > that all the time.
> 
> That's not at all the same thing, I'm afraid.  We're not talking about
> "in theory you shouldn't, but in practice it's OK.".  We're talking about
> the very real fact that you cannot use it to synchronize correctly,
> as shown in the blog post I pasted below.  If you're using the function
> to pause a thread, and then do things, but the thread doesn't actually
> pause before you do things, then you have a problem.
> 
> > 
> >> MSDN at <https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-suspendthread> says:
> >>
> >>  "This function is primarily designed for use by debuggers. It is not intended
> >>  to be used for thread synchronization."
> >>
> >> And here <https://devblogs.microsoft.com/oldnewthing/?p=44743> it says:
> >>
> >>   "The Suspend­Thread function tells the scheduler to suspend the thread
> >>   but does not wait for an acknowledgment from the scheduler that the suspension
> >>   has actually occurred."
> > 
> > GNU Make does it for many years, and I have yet to hear a single
> > complaint about that part.
> 
> Not hearing a complaint does not mean that the problem isn't real.
> It probably simply means that the race window is narrow, so the number
> of people that observe any issue, _and_ are willing to report an issue
> is small, and also, is someone notices something odd, it may be simply
> impossible to tell what happened and realize what the problem was,
> after the fact.

I'm sorry I brought this up.  I should have known better.
  
Pedro Alves March 20, 2019, 6:28 p.m. UTC | #18
On 03/20/2019 06:01 PM, Eli Zaretskii wrote:

> I'm sorry I brought this up.  I should have known better.

That's frustrating to hear.  I'm only trying to help -- I believe that trying
to prevent a design mistake is a form of helping.  If anything I said is
technically incorrect, I'd appreciate being corrected.

Thanks,
Pedro Alves
  
Pedro Alves March 21, 2019, 5:31 p.m. UTC | #19
On 03/20/2019 03:46 PM, Pedro Alves wrote:
> 
> Note however, that the readline signal handling code was changed
> in recent years.  IIRC, it used to do unsafe non-async-signal-safe things,
> but it's gotten better.  It's because readline signal handling changed
> that the gdb.gdb/selftest.exp fails under newer readline.  And one of
> the changes, which I think may be relevant here, is that in callback mode,
> readline no longer installs its own signal handler.
> 
> Using my "info signal-dispositions" script (*), when using the system
> readline (v7):
> 
>  (top-gdb) info signal-dispositions 2
>  Number  Name            Description                     Disposition
>  2       SIGINT          Interrupt                       handle_sigint(int) in section .text of build-sys-readline/gdb/gdb
> 
> While with the bundled readline I get:
> 
>  (top-gdb) info signal-dispositions 2
>  Number  Name            Description                     Disposition
>  2       SIGINT          Interrupt                       rl_signal_handler in section .text of build/gdb/gdb
> 
> 
> I'm not sure whether readline still installs its own signal handler
> internally in other situations, but it'd be worth it to check that.
> Maybe we never run any readline signal handler on Windows at all
> nowadays with recent readlines, which would simplify things for us,
> rendering the gdb hack in question obsolete.

I looked into this a bit more today.  Most of the time gdb's signal handlers
are active, but readline still installs its signal handlers for a bit.
E.g., rl_callback_read_char calls rl_set_signals.  However, the readline
signal handler nowadays is much simplified.

In gdb's readline bundled copy (readline 6.2), the signal handler
looks like this:

static RETSIGTYPE
rl_signal_handler (sig)
     int sig;
{
  if (_rl_interrupt_immediately || RL_ISSTATE(RL_STATE_CALLBACK))
    {
      _rl_interrupt_immediately = 0;
      _rl_handle_signal (sig);
    }
  else
    _rl_caught_signal = sig;

  SIGHANDLER_RETURN;
}


And since we use the callback interface, RL_ISSTATE(RL_STATE_CALLBACK)
will be true.  Thus we'll handle the signal immediately.
_rl_handle_signal will then change readline's state to 
RL_STATE_SIGHANDLER, and do other non-async-signal-safe things, like
calling _rl_reset_completion_state and rl_free_line_state.

But nowadays, it looks like this:

static RETSIGTYPE
rl_signal_handler (int sig)
{
  if (_rl_interrupt_immediately)
    {
      _rl_interrupt_immediately = 0;
      _rl_handle_signal (sig);
    }
  else
    _rl_caught_signal = sig;

  SIGHANDLER_RETURN;
}

the RL_ISSTATE(RL_STATE_CALLBACK) check is gone.
And also, AFAICT, _rl_interrupt_immediately is never 
set anywhere.  So in effect, AFAICS, that's the same as:

static void
rl_signal_handler (int sig)
{
  _rl_caught_signal = sig;
}

I've asked for confirmation on the readline list:
  http://lists.gnu.org/archive/html/bug-readline/2019-03/msg00005.html
(Chet already confirmed, but it may not be on the archives yet when
you read this, there's a delay).

So that means that the readline hack in mingw-hdep.c:

  /* With multi-threaded SIGINT handling, there is a race between the
     readline signal handler and GDB.  It may still be in
     rl_prep_terminal in another thread.  Do not return until it is
     done; we can check the state here because we never longjmp from
     signal handlers on Windows.  */
  while (RL_ISSTATE (RL_STATE_SIGHANDLER))
    Sleep (1);

with new-enough readline, is no longer doing anything, since the while
loop's condition is always false.

However, there's another question that needs answering: what are we going to
do if even after upgrading our readline version, someone builds gdb against
the system readline, and the system readline happens to be an older version
that still depends on this or some other readline hack?  We'll silently
regress.  I think that we need to document the minimum readline version
somewhere (gdb/README?), and have something (configure?) enforce it.
(There's a RL_READLINE_VERSION symbol.)

Thanks,
Pedro Alves
  
Eli Zaretskii March 21, 2019, 6:29 p.m. UTC | #20
> Cc: tom@tromey.com, gdb-patches@sourceware.org
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 21 Mar 2019 17:31:07 +0000
> 
> So that means that the readline hack in mingw-hdep.c:
> 
>   /* With multi-threaded SIGINT handling, there is a race between the
>      readline signal handler and GDB.  It may still be in
>      rl_prep_terminal in another thread.  Do not return until it is
>      done; we can check the state here because we never longjmp from
>      signal handlers on Windows.  */
>   while (RL_ISSTATE (RL_STATE_SIGHANDLER))
>     Sleep (1);
> 
> with new-enough readline, is no longer doing anything, since the while
> loop's condition is always false.
> 
> However, there's another question that needs answering: what are we going to
> do if even after upgrading our readline version, someone builds gdb against
> the system readline, and the system readline happens to be an older version
> that still depends on this or some other readline hack?  We'll silently
> regress.  I think that we need to document the minimum readline version
> somewhere (gdb/README?), and have something (configure?) enforce it.
> (There's a RL_READLINE_VERSION symbol.)

SGTM, thanks for digging into this.
  

Patch

diff --git a/readline/config.h.in b/readline/config.h.in
index 86d86cfa3d..c194e761a4 100644
--- a/readline/config.h.in
+++ b/readline/config.h.in
@@ -1,5 +1,15 @@ 
 /* config.h.in.  Maintained by hand. */

+/* Template definitions for autoconf */
+#undef __EXTENSIONS__
+#undef _ALL_SOURCE
+#undef _GNU_SOURCE
+#undef _POSIX_SOURCE
+#undef _POSIX_1_SOURCE
+#undef _POSIX_PTHREAD_SEMANTICS
+#undef _TANDEM_SOURCE
+#undef _MINIX
+
 /* Define NO_MULTIBYTE_SUPPORT to not compile in support for multibyte
    characters, even if the OS supports them. */
 #undef NO_MULTIBYTE_SUPPORT
diff --git a/readline/histfile.c b/readline/histfile.c
index fffeb3fd31..56cbbf0498 100644
--- a/readline/histfile.c
+++ b/readline/histfile.c
@@ -407,7 +407,8 @@  history_truncate_file (fname, lines)
      truncate to. */
   if (bp > buffer && ((file = open (filename, O_WRONLY|O_TRUNC|O_BINARY, 0600)) != -1))
     {
-      write (file, bp, chars_read - (bp - buffer));
+      if (write (file, bp, chars_read - (bp - buffer)) < 0)
+       rv = errno;

 #if defined (__BEOS__)
       /* BeOS ignores O_TRUNC. */
diff --git a/readline/util.c b/readline/util.c
index d402fce842..13bd00c09c 100644
--- a/readline/util.c
+++ b/readline/util.c
@@ -515,11 +515,11 @@  _rl_tropen ()
           (sh_get_env_value ("TEMP")
            ? sh_get_env_value ("TEMP")
            : "."),
-          getpid());
+          getpid ());
 #else
-  sprintf (fnbuf, "/var/tmp/rltrace.%ld", getpid());
+  sprintf (fnbuf, "/var/tmp/rltrace.%ld", (long) getpid ());
 #endif
-  unlink(fnbuf);
+  unlink (fnbuf);
   _rl_tracefp = fopen (fnbuf, "w+");
   return _rl_tracefp != 0;
 }