Don't truncate the history file when history size is unlimited

Message ID 1434466413-28892-1-git-send-email-patrick@parcs.ath.cx
State New, archived
Headers

Commit Message

Patrick Palka June 16, 2015, 2:53 p.m. UTC
  We still do not handle "set history size unlimited" correctly.  In
particular, after writing to the history file, we truncate the history
even if it is unlimited.

This patch makes sure that we do not call history_truncate_file() if the
history is not stifled (i.e. if it's unlimited).  This bug causes the
history file to be truncated to zero on exit when one has "set history
size unlimited" in their gdbinit file.  Although this code exists in GDB
7.8, the bug is masked by a pre-existing bug that's been only fixed in
GDB 7.9 (PR gdb/17820).

gdb/ChangeLog:

	* top.c (gdb_safe_append_history): Do not call
	history_truncate_file if the history is not stifled.

gdb/testsuite/ChangeLog:

	* gdb.base/gdbinit-history.exp: Add test case to check that
	an unlimited history file does not get truncated on exit.
---
 gdb/testsuite/gdb.base/gdbinit-history.exp | 54 ++++++++++++++++++++++++++++++
 gdb/top.c                                  |  3 +-
 2 files changed, 56 insertions(+), 1 deletion(-)
  

Comments

Patrick Palka June 16, 2015, 3 p.m. UTC | #1
On Tue, Jun 16, 2015 at 10:53 AM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> We still do not handle "set history size unlimited" correctly.  In
> particular, after writing to the history file, we truncate the history
> even if it is unlimited.
>
> This patch makes sure that we do not call history_truncate_file() if the
> history is not stifled (i.e. if it's unlimited).  This bug causes the
> history file to be truncated to zero on exit when one has "set history
> size unlimited" in their gdbinit file.  Although this code exists in GDB
> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
> GDB 7.9 (PR gdb/17820).
>
> gdb/ChangeLog:
>
>         * top.c (gdb_safe_append_history): Do not call
>         history_truncate_file if the history is not stifled.
>
> gdb/testsuite/ChangeLog:
>
>         * gdb.base/gdbinit-history.exp: Add test case to check that
>         an unlimited history file does not get truncated on exit.

I had written some extra stuff in the commit message within the patch
file  itself and then I overwritten it by running "git format-patch"
again...  This is basically what I wrote:

The reason I had so much trouble with writing the test case because I
failed to realize that there are two code paths for writing to the
history file: one for writing to a non-existing history file (i.e.
creating one) and one for appending to an existing history file.  The
bug only manifests itself when appending to the history file, not when
writing to it.  Thus the test has to stop/start GDB twice in order to
run both code paths and to trigger the bug.

By the way, what do you think about unsetting
HOME/HISTSIZE/GDBHISTFILE somewhere higher up in the testsuite
infrastructure?  It would be nice if individual tests did not have to
worry about such environment variables being leaked from userspace.
  
Pedro Alves June 17, 2015, 12:46 p.m. UTC | #2
On 06/16/2015 03:53 PM, Patrick Palka wrote:

> +    # We have to start then exit GDB twice: the first time to test the creation
> +    # of the initial history file, and the second time to test appending to it.
> +    # In either case the initial "print 1" command should persist through the
> +    # history file.
> +    with_test_prefix "truncation" {
> +        gdb_exit
> +        gdb_start
> +        gdb_test "print 1"
> +
> +        gdb_exit
> +        gdb_start
> +        gdb_test "server show commands" "    .  print 1.*"
> +
> +        gdb_exit
> +        gdb_start
> +        gdb_test "server show commands" "    .  print 1.*"

These two "show commands" tests have the same output message.  The
patch is OK with that fixed.

Thanks for following through with the test.

Thanks,
Pedro Alves
  
Pedro Alves June 17, 2015, 1:14 p.m. UTC | #3
On 06/16/2015 04:00 PM, Patrick Palka wrote:

> By the way, what do you think about unsetting
> HOME/HISTSIZE/GDBHISTFILE somewhere higher up in the testsuite
> infrastructure?  It would be nice if individual tests did not have to
> worry about such environment variables being leaked from userspace.

On first blush, that sounds like a good idea.  Though I wonder
whether that could have unintended consequences.  We won't know
until we try, I guess.

Does dejagnu need HOME set?  Does Python/Scheme?

Unsetting the GDB-specific GDBHISTSIZE/GDBHISTFILE vars sounds
like a no brainer.

For HISTSIZE, should we audit tests that might start an interactive
shell (e.g., "(gdb) shell") and result in truncating the history file?
Guess better would be to set HISTSIZE="" and HISTFILE=/dev/null
or something like that so shells we start don't have a chance of
mucking with the user's history files?

Urgh, yes, we already have that issue today:

$ make check RUNTESTFLAGS="gdb.base/batch-preserve-term-settings.exp"
$ tail -n 5 ~/.bash_history
stty || echo "not found"
/home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/pedro/gdb/mygit/build/gdb/testsuite/../data-directory  -batch -ex "set height unlimited" -ex "start" --args "/home/pedro/gdb/mygit/build/gdb/testsuite/gdb.base/batch-preserve-term-settings"
echo test_echo
stty || echo "not found"
exit

Whoops...

Thanks,
Pedro Alves
  
Sergio Durigan Junior July 23, 2015, 6:42 p.m. UTC | #4
On Tuesday, June 16 2015, Patrick Palka wrote:

> We still do not handle "set history size unlimited" correctly.  In
> particular, after writing to the history file, we truncate the history
> even if it is unlimited.
>
> This patch makes sure that we do not call history_truncate_file() if the
> history is not stifled (i.e. if it's unlimited).  This bug causes the
> history file to be truncated to zero on exit when one has "set history
> size unlimited" in their gdbinit file.  Although this code exists in GDB
> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
> GDB 7.9 (PR gdb/17820).

Hey Patrick,

Looking at the BuildBot logs today, I found that this new test is
failing occasionally on native-extended-gdbserver testing.  Take a look
at the following build:

  <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>

You can see that gdb.base/gdbinit-history.exp failed:

  PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
  PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands

The gdb.log is here:

  <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>

I haven't really investigated to determine what's going on here, but let
me know if you need any help with this.

Thanks,
  
Patrick Palka July 23, 2015, 7:33 p.m. UTC | #5
On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
<sergiodj@redhat.com> wrote:
> On Tuesday, June 16 2015, Patrick Palka wrote:
>
>> We still do not handle "set history size unlimited" correctly.  In
>> particular, after writing to the history file, we truncate the history
>> even if it is unlimited.
>>
>> This patch makes sure that we do not call history_truncate_file() if the
>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>> history file to be truncated to zero on exit when one has "set history
>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>> GDB 7.9 (PR gdb/17820).
>
> Hey Patrick,
>
> Looking at the BuildBot logs today, I found that this new test is
> failing occasionally on native-extended-gdbserver testing.  Take a look
> at the following build:
>
>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>
> You can see that gdb.base/gdbinit-history.exp failed:
>
>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>
> The gdb.log is here:
>
>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>
> I haven't really investigated to determine what's going on here, but let
> me know if you need any help with this.

Thanks for the heads up.

When doing gdb_exit followed by gdb_start, in the output log sometimes
we have (this is printed shortly before the first FAIL)

(gdb) ...
Remote debugging from host 127.0.0.1
monitor exit
spawn ...
GNU gdb (GDB) 7.10.50.20150723-cvs
...

Other times we have (this is printed shortly before the second FAIL)

(gdb) ...
Remote debugging from host 127.0.0.1
monitor exit
(gdb) spawn ...
GNU gdb (GDB) 7.10.50.20150723-cvs
...

The literal difference being the "(gdb) " prompt printed before the
"spawn" message.  In the first case (where the "(gdb) " prefix is not
there) the history file does not seem to be written/appended to.  In
the second case (when the "(gdb) " prefix is there) the history file
is properly written/appended to (but it still FAILs because we're
missing the command history from before the first case).  So the race,
if there is one, may have something to do with whether or not the
"(gdb) " prompt gets printed after doing "monitor exit".  Or maybe
not.  I'll do more analysis later.
  
Patrick Palka July 24, 2015, 2:03 p.m. UTC | #6
On Thu, Jul 23, 2015 at 3:33 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
> <sergiodj@redhat.com> wrote:
>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>
>>> We still do not handle "set history size unlimited" correctly.  In
>>> particular, after writing to the history file, we truncate the history
>>> even if it is unlimited.
>>>
>>> This patch makes sure that we do not call history_truncate_file() if the
>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>> history file to be truncated to zero on exit when one has "set history
>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>> GDB 7.9 (PR gdb/17820).
>>
>> Hey Patrick,
>>
>> Looking at the BuildBot logs today, I found that this new test is
>> failing occasionally on native-extended-gdbserver testing.  Take a look
>> at the following build:
>>
>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>
>> You can see that gdb.base/gdbinit-history.exp failed:
>>
>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>
>> The gdb.log is here:
>>
>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>
>> I haven't really investigated to determine what's going on here, but let
>> me know if you need any help with this.
>
> Thanks for the heads up.
>
> When doing gdb_exit followed by gdb_start, in the output log sometimes
> we have (this is printed shortly before the first FAIL)
>
> (gdb) ...
> Remote debugging from host 127.0.0.1
> monitor exit
> spawn ...
> GNU gdb (GDB) 7.10.50.20150723-cvs
> ...
>
> Other times we have (this is printed shortly before the second FAIL)
>
> (gdb) ...
> Remote debugging from host 127.0.0.1
> monitor exit
> (gdb) spawn ...
> GNU gdb (GDB) 7.10.50.20150723-cvs
> ...
>
> The literal difference being the "(gdb) " prompt printed before the
> "spawn" message.  In the first case (where the "(gdb) " prefix is not
> there) the history file does not seem to be written/appended to.  In
> the second case (when the "(gdb) " prefix is there) the history file
> is properly written/appended to (but it still FAILs because we're
> missing the command history from before the first case).  So the race,
> if there is one, may have something to do with whether or not the
> "(gdb) " prompt gets printed after doing "monitor exit".  Or maybe
> not.  I'll do more analysis later.

After further analysis I don't think there is any correlation between
whether "(gdb) spawn ..." or  else "spawn ..." gets printed, and if a
race between "gdb_exit" and "gdb_start" occurs.  In fact, I don't
think there even is a race between gdb_exit and gdb_start.  If I add
"after 100" (i.e. TCL's way of sleeping 100ms) in the middle of such
sequences (i.e. gdb_exit; after 100; gdb_start) in
gdbinit-history.exp, I can still reproduce the intermittent FAILs.

So it may be the case that sometimes gdb_exit does not kill the GDB
process properly, and by doing so the process doesn't get a chance to
save to the history file.  And it only happens with
extended-gdbserver, not with gdbserver or non-gdbserver.  A unique
code path taken only by the extended-gdbserver target is the code in
gdbserver-support.exp:gdb_exit guarded by "if {[info exists
gdb_spawn_id] && [info exists server_spawn_id]}", and then the
close_gdbserver proc that follows.  However if I just outright delete
that code (which should make the exit logic nearly identical with the
gdbserver target) I can still trigger the intermittent FAILs...  Maybe
it's an issue in dejagnu?  Or could it be an obscure bug in GDB??
  
Patrick Palka July 24, 2015, 2:15 p.m. UTC | #7
On Fri, Jul 24, 2015 at 10:03 AM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Jul 23, 2015 at 3:33 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>> <sergiodj@redhat.com> wrote:
>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>
>>>> We still do not handle "set history size unlimited" correctly.  In
>>>> particular, after writing to the history file, we truncate the history
>>>> even if it is unlimited.
>>>>
>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>> history file to be truncated to zero on exit when one has "set history
>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>> GDB 7.9 (PR gdb/17820).
>>>
>>> Hey Patrick,
>>>
>>> Looking at the BuildBot logs today, I found that this new test is
>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>> at the following build:
>>>
>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>
>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>
>>> The gdb.log is here:
>>>
>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>
>>> I haven't really investigated to determine what's going on here, but let
>>> me know if you need any help with this.
>>
>> Thanks for the heads up.
>>
>> When doing gdb_exit followed by gdb_start, in the output log sometimes
>> we have (this is printed shortly before the first FAIL)
>>
>> (gdb) ...
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> spawn ...
>> GNU gdb (GDB) 7.10.50.20150723-cvs
>> ...
>>
>> Other times we have (this is printed shortly before the second FAIL)
>>
>> (gdb) ...
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> (gdb) spawn ...
>> GNU gdb (GDB) 7.10.50.20150723-cvs
>> ...
>>
>> The literal difference being the "(gdb) " prompt printed before the
>> "spawn" message.  In the first case (where the "(gdb) " prefix is not
>> there) the history file does not seem to be written/appended to.  In
>> the second case (when the "(gdb) " prefix is there) the history file
>> is properly written/appended to (but it still FAILs because we're
>> missing the command history from before the first case).  So the race,
>> if there is one, may have something to do with whether or not the
>> "(gdb) " prompt gets printed after doing "monitor exit".  Or maybe
>> not.  I'll do more analysis later.
>
> After further analysis I don't think there is any correlation between
> whether "(gdb) spawn ..." or  else "spawn ..." gets printed, and if a
> race between "gdb_exit" and "gdb_start" occurs.  In fact, I don't
> think there even is a race between gdb_exit and gdb_start.  If I add
> "after 100" (i.e. TCL's way of sleeping 100ms) in the middle of such
> sequences (i.e. gdb_exit; after 100; gdb_start) in
> gdbinit-history.exp, I can still reproduce the intermittent FAILs.
>
> So it may be the case that sometimes gdb_exit does not kill the GDB
> process properly, and by doing so the process doesn't get a chance to
> save to the history file.  And it only happens with
> extended-gdbserver, not with gdbserver or non-gdbserver.  A unique
> code path taken only by the extended-gdbserver target is the code in
> gdbserver-support.exp:gdb_exit guarded by "if {[info exists
> gdb_spawn_id] && [info exists server_spawn_id]}", and then the
> close_gdbserver proc that follows.  However if I just outright delete
> that code (which should make the exit logic nearly identical with the
> gdbserver target) I can still trigger the intermittent FAILs...  Maybe
> it's an issue in dejagnu?  Or could it be an obscure bug in GDB??

One thing I noticed while repeatedly running the test with
--target_board=native-extended-gdbserver --verbose --verbose is that
it sometimes takes a few seconds to kill the GDB process.  The output
log pauses at

Quitting /scratchpad/binutils-gdb-build/gdb/testsuite/../../gdb/gdb
-nw -nx -data-directory
/scratchpad/binutils-gdb-build/gdb/testsuite/../data-directory  -ex
"set auto-connect-native-target off"
Closing the remote shell exp8
doing kill, pid is 16925
pid is 16925
<PAUSE>

for 5 or more seconds.  I only see this happen with
native-extended-gdbserver.  I suppose it's waiting for the command
(which gets spawned by dejagnu in its standard_close proc)

    sh -c exec > /dev/null 2>&1 && (kill -2 -16925 || kill -2 16925)
&& sleep 5 && (kill -16925 || kill 16925) && sleep 5 && (kill -9
-16925 || kill -9 16925) &

to finish sleeping.
  
Patrick Palka Aug. 13, 2015, 3:18 p.m. UTC | #8
On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
<sergiodj@redhat.com> wrote:
> On Tuesday, June 16 2015, Patrick Palka wrote:
>
>> We still do not handle "set history size unlimited" correctly.  In
>> particular, after writing to the history file, we truncate the history
>> even if it is unlimited.
>>
>> This patch makes sure that we do not call history_truncate_file() if the
>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>> history file to be truncated to zero on exit when one has "set history
>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>> GDB 7.9 (PR gdb/17820).
>
> Hey Patrick,
>
> Looking at the BuildBot logs today, I found that this new test is
> failing occasionally on native-extended-gdbserver testing.  Take a look
> at the following build:
>
>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>
> You can see that gdb.base/gdbinit-history.exp failed:
>
>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>
> The gdb.log is here:
>
>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>
> I haven't really investigated to determine what's going on here, but let
> me know if you need any help with this.

Hi Sergio,

Have you seen these spurious FAILs pop up recently?  I wonder if the
fixes to SIGTERM handling I made a few weeks ago may have fixed this
as well.
  
Sergio Durigan Junior Aug. 13, 2015, 10:27 p.m. UTC | #9
On Thursday, August 13 2015, Patrick Palka wrote:

> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
> <sergiodj@redhat.com> wrote:
>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>
>>> We still do not handle "set history size unlimited" correctly.  In
>>> particular, after writing to the history file, we truncate the history
>>> even if it is unlimited.
>>>
>>> This patch makes sure that we do not call history_truncate_file() if the
>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>> history file to be truncated to zero on exit when one has "set history
>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>> GDB 7.9 (PR gdb/17820).
>>
>> Hey Patrick,
>>
>> Looking at the BuildBot logs today, I found that this new test is
>> failing occasionally on native-extended-gdbserver testing.  Take a look
>> at the following build:
>>
>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>
>> You can see that gdb.base/gdbinit-history.exp failed:
>>
>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>
>> The gdb.log is here:
>>
>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>
>> I haven't really investigated to determine what's going on here, but let
>> me know if you need any help with this.
>
> Hi Sergio,
>
> Have you seen these spurious FAILs pop up recently?  I wonder if the
> fixes to SIGTERM handling I made a few weeks ago may have fixed this
> as well.

Hey Patrick,

Yeah, I still see those FAIL's from time to time:

 <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
 <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>

I may be wrong, but I noticed that the FAIL's happen only on the PPC
buildslaves.

Thanks,
  
Pedro Alves Aug. 13, 2015, 11:26 p.m. UTC | #10
On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
> On Thursday, August 13 2015, Patrick Palka wrote:
> 
>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>> <sergiodj@redhat.com> wrote:
>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>
>>>> We still do not handle "set history size unlimited" correctly.  In
>>>> particular, after writing to the history file, we truncate the history
>>>> even if it is unlimited.
>>>>
>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>> history file to be truncated to zero on exit when one has "set history
>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>> GDB 7.9 (PR gdb/17820).
>>>
>>> Hey Patrick,
>>>
>>> Looking at the BuildBot logs today, I found that this new test is
>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>> at the following build:
>>>
>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>
>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>
>>> The gdb.log is here:
>>>
>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>
>>> I haven't really investigated to determine what's going on here, but let
>>> me know if you need any help with this.
>>
>> Hi Sergio,
>>
>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>> as well.
> 
> Hey Patrick,
> 
> Yeah, I still see those FAIL's from time to time:
> 
>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
> 
> I may be wrong, but I noticed that the FAIL's happen only on the PPC
> buildslaves.
> 

Here's a theory:

(gdb) server show commands
    1  set height 0
    2  set width 0
    3  target extended-remote localhost:2348
    4  print 1
    5  monitor exit
    6  set height 0
    7  set width 0
    8  target extended-remote localhost:2349
(gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
Remote debugging from host 127.0.0.1
monitor exit
(gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
GNU gdb (GDB) 7.10.50.20150812-cvs
...
(gdb) set height 0
(gdb) set width 0
(gdb) spawn ../gdbserver/gdbserver --once --multi :2350
Listening on port 2350
target extended-remote localhost:2350
Remote debugging using localhost:2350
(gdb) server show commands
    1  set height 0
    2  set width 0
    3  target extended-remote localhost:2350
(gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands

Observations:

#1 - In the FAILing case, the history only shows commands invoked in that
  gdb session.  That means that either that GDB invocation failed to
  load the history file, or the previous invocation failed to write it.

#2 - dejagnu's standard_close, what is ultimately used to exit gdb,
  does this:

 2.1 - send gdb a SIGINT.  This might help by interrupting any command
   that gdb might be stuck in, getting it back to
   the event loop processing input.

 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.

 2.3 - if gdb doesn't exit after 5 seconds, kills it
     with SIGTERM.  If it still doesn't exit after another 5
     seconds, dejagnu kill it with SIGKILL.

 Unless you're running dejagnu git master, 2.1 and 2.2 happen
 in parallel.

 I'm assuming that 2.3 is not happening.  It's 2.2 that normally
 causes gdb to exit.  See event-top.c:stdin_event_handler.

- Because, due to 2.2, stdin/stdout are already closed when gdb is
  running the teardown code (quit_force, what saves history), any
  warning/output that gdb might output goes nowhere, it won't
  appear in gdb.log.

#3 - If you'll recall, not long ago we made the code that appends
     the commands to the history file roughly:

   - mv old-history-file old-history-file.tmp
   - append to old-history-file.tmp
   - mv old-history-file.tmp old-history-file

That is top.c:gdb_safe_append_history.

AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
if the SIGINT lands exactly just when gdb is about to call
rename here, in gdb_safe_append_history:

      ret = rename (local_history_filename, history_filename);
      saved_errno = errno;
      if (ret < 0 && saved_errno != EEXIST)
        warning (_("Could not rename %s to %s: %s"),
		 local_history_filename, history_filename,
		 safe_strerror (saved_errno));
    }

and rename fails with EINTR.  Then the following gdb invocation
would not find any previous history file to load, which would
perfectly explain that FAIL.

If we still have the build/test directory of that test run,
if the theory is right, I think we'd find a
leftover gdbinit-history.gdb_history-gdb-$PID~ file.

What doesn't look right in the theory is that the rename man
does not document rename as failing with EINTR...  But
maybe it does?

Thanks,
Pedro Alves
  
Sergio Durigan Junior Aug. 14, 2015, 12:46 a.m. UTC | #11
On Thursday, August 13 2015, Pedro Alves wrote:

> Here's a theory:
>
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2348
>     4  print 1
>     5  monitor exit
>     6  set height 0
>     7  set width 0
>     8  target extended-remote localhost:2349
> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
> Remote debugging from host 127.0.0.1
> monitor exit
> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
> GNU gdb (GDB) 7.10.50.20150812-cvs
> ...
> (gdb) set height 0
> (gdb) set width 0
> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
> Listening on port 2350
> target extended-remote localhost:2350
> Remote debugging using localhost:2350
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2350
> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>
> Observations:
>
> #1 - In the FAILing case, the history only shows commands invoked in that
>   gdb session.  That means that either that GDB invocation failed to
>   load the history file, or the previous invocation failed to write it.

Yeah, it seems to me it's the second option: GDB failed to write the
history file.

> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
>   does this:
>
>  2.1 - send gdb a SIGINT.  This might help by interrupting any command
>    that gdb might be stuck in, getting it back to
>    the event loop processing input.
>
>  2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>
>  2.3 - if gdb doesn't exit after 5 seconds, kills it
>      with SIGTERM.  If it still doesn't exit after another 5
>      seconds, dejagnu kill it with SIGKILL.
>
>  Unless you're running dejagnu git master, 2.1 and 2.2 happen
>  in parallel.
>
>  I'm assuming that 2.3 is not happening.  It's 2.2 that normally
>  causes gdb to exit.  See event-top.c:stdin_event_handler.
>
> - Because, due to 2.2, stdin/stdout are already closed when gdb is
>   running the teardown code (quit_force, what saves history), any
>   warning/output that gdb might output goes nowhere, it won't
>   appear in gdb.log.
>
> #3 - If you'll recall, not long ago we made the code that appends
>      the commands to the history file roughly:
>
>    - mv old-history-file old-history-file.tmp
>    - append to old-history-file.tmp
>    - mv old-history-file.tmp old-history-file
>
> That is top.c:gdb_safe_append_history.
>
> AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
> if the SIGINT lands exactly just when gdb is about to call
> rename here, in gdb_safe_append_history:
>
>       ret = rename (local_history_filename, history_filename);
>       saved_errno = errno;
>       if (ret < 0 && saved_errno != EEXIST)
>         warning (_("Could not rename %s to %s: %s"),
> 		 local_history_filename, history_filename,
> 		 safe_strerror (saved_errno));
>     }
>
> and rename fails with EINTR.  Then the following gdb invocation
> would not find any previous history file to load, which would
> perfectly explain that FAIL.
>
> If we still have the build/test directory of that test run,
> if the theory is right, I think we'd find a
> leftover gdbinit-history.gdb_history-gdb-$PID~ file.

Unfortunately we don't have the build directory available anymore,
because BuildBot deletes it in the next build.  Even if we did, it would
require permission/help from the buildslave admin to access it...

> What doesn't look right in the theory is that the rename man
> does not document rename as failing with EINTR...  But
> maybe it does?

Yeah, well...  I've made a few tests here, and I could not trigger EINTR
with rename.  I have also looked at the Linux kernel source code and I
don't see (at least directly) a way for rename to error with EINTR:

  <http://lxr.free-electrons.com/source/fs/namei.c#L4398>

It really is an atomic operation.  Of course, there might be some quirks
on PPC GNU/Linux that I'm not aware of...
  
Patrick Palka Aug. 17, 2015, 1:15 p.m. UTC | #12
On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote:
> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
>> On Thursday, August 13 2015, Patrick Palka wrote:
>>
>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>>> <sergiodj@redhat.com> wrote:
>>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>>
>>>>> We still do not handle "set history size unlimited" correctly.  In
>>>>> particular, after writing to the history file, we truncate the history
>>>>> even if it is unlimited.
>>>>>
>>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>>> history file to be truncated to zero on exit when one has "set history
>>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>>> GDB 7.9 (PR gdb/17820).
>>>>
>>>> Hey Patrick,
>>>>
>>>> Looking at the BuildBot logs today, I found that this new test is
>>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>>> at the following build:
>>>>
>>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>>
>>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>>
>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>>
>>>> The gdb.log is here:
>>>>
>>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>>
>>>> I haven't really investigated to determine what's going on here, but let
>>>> me know if you need any help with this.
>>>
>>> Hi Sergio,
>>>
>>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>>> as well.
>>
>> Hey Patrick,
>>
>> Yeah, I still see those FAIL's from time to time:
>>
>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>>
>> I may be wrong, but I noticed that the FAIL's happen only on the PPC
>> buildslaves.
>>
>
> Here's a theory:
>
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2348
>     4  print 1
>     5  monitor exit
>     6  set height 0
>     7  set width 0
>     8  target extended-remote localhost:2349
> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
> Remote debugging from host 127.0.0.1
> monitor exit
> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
> GNU gdb (GDB) 7.10.50.20150812-cvs
> ...
> (gdb) set height 0
> (gdb) set width 0
> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
> Listening on port 2350
> target extended-remote localhost:2350
> Remote debugging using localhost:2350
> (gdb) server show commands
>     1  set height 0
>     2  set width 0
>     3  target extended-remote localhost:2350
> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>
> Observations:
>
> #1 - In the FAILing case, the history only shows commands invoked in that
>   gdb session.  That means that either that GDB invocation failed to
>   load the history file, or the previous invocation failed to write it.
>
> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
>   does this:
>
>  2.1 - send gdb a SIGINT.  This might help by interrupting any command
>    that gdb might be stuck in, getting it back to
>    the event loop processing input.
>
>  2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>
>  2.3 - if gdb doesn't exit after 5 seconds, kills it
>      with SIGTERM.  If it still doesn't exit after another 5
>      seconds, dejagnu kill it with SIGKILL.
>
>  Unless you're running dejagnu git master, 2.1 and 2.2 happen
>  in parallel.
>
>  I'm assuming that 2.3 is not happening.  It's 2.2 that normally
>  causes gdb to exit.  See event-top.c:stdin_event_handler.
>
> - Because, due to 2.2, stdin/stdout are already closed when gdb is
>   running the teardown code (quit_force, what saves history), any
>   warning/output that gdb might output goes nowhere, it won't
>   appear in gdb.log.
>
> #3 - If you'll recall, not long ago we made the code that appends
>      the commands to the history file roughly:
>
>    - mv old-history-file old-history-file.tmp
>    - append to old-history-file.tmp
>    - mv old-history-file.tmp old-history-file
>
> That is top.c:gdb_safe_append_history.
>
> AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
> if the SIGINT lands exactly just when gdb is about to call
> rename here, in gdb_safe_append_history:
>
>       ret = rename (local_history_filename, history_filename);
>       saved_errno = errno;
>       if (ret < 0 && saved_errno != EEXIST)
>         warning (_("Could not rename %s to %s: %s"),
>                  local_history_filename, history_filename,
>                  safe_strerror (saved_errno));
>     }
>
> and rename fails with EINTR.  Then the following gdb invocation
> would not find any previous history file to load, which would
> perfectly explain that FAIL.
>
> If we still have the build/test directory of that test run,
> if the theory is right, I think we'd find a
> leftover gdbinit-history.gdb_history-gdb-$PID~ file.
>
> What doesn't look right in the theory is that the rename man
> does not document rename as failing with EINTR...  But
> maybe it does?

Interesting theory.  I had assumed that the problem lies in the
testsuite driver, not in the GDB source code, because the problem only
manifests itself under native-extended-gdbserver.  If I understand
correctly, your explanation is also plausible under native or
native-gdbserver, yet the problem was not observed to ever manifest
itself under these targets.

Also, if rename fails with EINTR, wouldn't a "Could not rename %s to
%s: %s" warning be emitted in the gdb output log?
  
Patrick Palka Aug. 17, 2015, 1:28 p.m. UTC | #13
On Mon, Aug 17, 2015 at 9:15 AM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote:
>> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote:
>>> On Thursday, August 13 2015, Patrick Palka wrote:
>>>
>>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>>>> <sergiodj@redhat.com> wrote:
>>>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>>>
>>>>>> We still do not handle "set history size unlimited" correctly.  In
>>>>>> particular, after writing to the history file, we truncate the history
>>>>>> even if it is unlimited.
>>>>>>
>>>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>>>> history file to be truncated to zero on exit when one has "set history
>>>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>>>> GDB 7.9 (PR gdb/17820).
>>>>>
>>>>> Hey Patrick,
>>>>>
>>>>> Looking at the BuildBot logs today, I found that this new test is
>>>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>>>> at the following build:
>>>>>
>>>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>>>
>>>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>>>
>>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>>>
>>>>> The gdb.log is here:
>>>>>
>>>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>>>
>>>>> I haven't really investigated to determine what's going on here, but let
>>>>> me know if you need any help with this.
>>>>
>>>> Hi Sergio,
>>>>
>>>> Have you seen these spurious FAILs pop up recently?  I wonder if the
>>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this
>>>> as well.
>>>
>>> Hey Patrick,
>>>
>>> Yeah, I still see those FAIL's from time to time:
>>>
>>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593>
>>>  <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674>
>>>
>>> I may be wrong, but I noticed that the FAIL's happen only on the PPC
>>> buildslaves.
>>>
>>
>> Here's a theory:
>>
>> (gdb) server show commands
>>     1  set height 0
>>     2  set width 0
>>     3  target extended-remote localhost:2348
>>     4  print 1
>>     5  monitor exit
>>     6  set height 0
>>     7  set width 0
>>     8  target extended-remote localhost:2349
>> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora
>> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu
>> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off
>> GNU gdb (GDB) 7.10.50.20150812-cvs
>> ...
>> (gdb) set height 0
>> (gdb) set width 0
>> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350
>> Listening on port 2350
>> target extended-remote localhost:2350
>> Remote debugging using localhost:2350
>> (gdb) server show commands
>>     1  set height 0
>>     2  set width 0
>>     3  target extended-remote localhost:2350
>> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>
>> Observations:
>>
>> #1 - In the FAILing case, the history only shows commands invoked in that
>>   gdb session.  That means that either that GDB invocation failed to
>>   load the history file, or the previous invocation failed to write it.
>>
>> #2 - dejagnu's standard_close, what is ultimately used to exit gdb,
>>   does this:
>>
>>  2.1 - send gdb a SIGINT.  This might help by interrupting any command
>>    that gdb might be stuck in, getting it back to
>>    the event loop processing input.
>>
>>  2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout.
>>
>>  2.3 - if gdb doesn't exit after 5 seconds, kills it
>>      with SIGTERM.  If it still doesn't exit after another 5
>>      seconds, dejagnu kill it with SIGKILL.
>>
>>  Unless you're running dejagnu git master, 2.1 and 2.2 happen
>>  in parallel.
>>
>>  I'm assuming that 2.3 is not happening.  It's 2.2 that normally
>>  causes gdb to exit.  See event-top.c:stdin_event_handler.
>>
>> - Because, due to 2.2, stdin/stdout are already closed when gdb is
>>   running the teardown code (quit_force, what saves history), any
>>   warning/output that gdb might output goes nowhere, it won't
>>   appear in gdb.log.
>>
>> #3 - If you'll recall, not long ago we made the code that appends
>>      the commands to the history file roughly:
>>
>>    - mv old-history-file old-history-file.tmp
>>    - append to old-history-file.tmp
>>    - mv old-history-file.tmp old-history-file
>>
>> That is top.c:gdb_safe_append_history.
>>
>> AFAICS, SIGINT is not set to SA_RESTART.  Now imagine what happens
>> if the SIGINT lands exactly just when gdb is about to call
>> rename here, in gdb_safe_append_history:
>>
>>       ret = rename (local_history_filename, history_filename);
>>       saved_errno = errno;
>>       if (ret < 0 && saved_errno != EEXIST)
>>         warning (_("Could not rename %s to %s: %s"),
>>                  local_history_filename, history_filename,
>>                  safe_strerror (saved_errno));
>>     }
>>
>> and rename fails with EINTR.  Then the following gdb invocation
>> would not find any previous history file to load, which would
>> perfectly explain that FAIL.
>>
>> If we still have the build/test directory of that test run,
>> if the theory is right, I think we'd find a
>> leftover gdbinit-history.gdb_history-gdb-$PID~ file.
>>
>> What doesn't look right in the theory is that the rename man
>> does not document rename as failing with EINTR...  But
>> maybe it does?
>
> Interesting theory.  I had assumed that the problem lies in the
> testsuite driver, not in the GDB source code, because the problem only
> manifests itself under native-extended-gdbserver.  If I understand
> correctly, your explanation is also plausible under native or
> native-gdbserver, yet the problem was not observed to ever manifest
> itself under these targets.
>
> Also, if rename fails with EINTR, wouldn't a "Could not rename %s to
> %s: %s" warning be emitted in the gdb output log?

Ah, you already addressed this: a warning is not emitted because
stdout is closed..

But because the problem only occurs under extended-gdbserver, I'm
inclined to think the issue is with the testsuite driver, in
particular with the gdb_exit implementation in
lib/gdbserver-support.exp.  One potential issue I notice in this proc
is that when we send "monitor exit" to GDB, we don't necessarily wait
for the command to finish (i.e. for the gdb prompt to get printed).
As soon as the server is observed to get killed, we continue with
exiting.  Dunno if that's substantial..
  
Pedro Alves Aug. 17, 2015, 2:02 p.m. UTC | #14
On 08/17/2015 02:28 PM, Patrick Palka wrote:

> Ah, you already addressed this: a warning is not emitted because
> stdout is closed..

Yeah.

> But because the problem only occurs under extended-gdbserver, I'm
> inclined to think the issue is with the testsuite driver, in
> particular with the gdb_exit implementation in
> lib/gdbserver-support.exp.  One potential issue I notice in this proc
> is that when we send "monitor exit" to GDB, we don't necessarily wait
> for the command to finish (i.e. for the gdb prompt to get printed).
> As soon as the server is observed to get killed, we continue with
> exiting.  Dunno if that's substantial..

That's very plausible, at least.

Maybe that prompt got stuck in the expect buffer, and it confused
something else later on?

Another theory related to that could be that the new GDB started just
while the previous gdb is saving history and has just momentarily
renamed the history file to gdbinit-history.gdb_history-gdb-$PID~.
But AFAICS, that shouldn't happen because that gdb_exit calls
gdbserver_orig_gdb_exit at the end, which only returns after
the previous gdb exits...

Did anyone ever manage to reproduce this?

One thing I'd try is making dejagnu's local_exec (close_wait_program in master)
print the result of the "wait -i".  That will show whether gdb exited
due to a normal exit, or whether it was killed by SIGTERM or SIGKILL.
And then I'd try hacking gdb_safe_append_history to output debug logs
to a file instead of stdout (e.g., /tmp/gdb-log).

Another would be to add a "show history filename" to the test, to make sure
that the gdb that fails to load the previous history actually tried to
read the file we expect it to be reading.

Also, I think it's time to try to get all the buildslaves to use
dejagnu master, to pick up http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00005.html.
Who knows, maybe that race/rogue kill could also explain this problem.
The x86_64 Fedora slaves have been running with that for a while, and
we no longer see attach-many-short-lived-threads.exp failures there, and
we keep seeing them on the other slaves (which don't have that fix).

Thanks,
Pedro Alves
  
Patrick Palka Aug. 17, 2015, 8:10 p.m. UTC | #15
On Mon, Aug 17, 2015 at 10:02 AM, Pedro Alves <palves@redhat.com> wrote:
> On 08/17/2015 02:28 PM, Patrick Palka wrote:
>
>> Ah, you already addressed this: a warning is not emitted because
>> stdout is closed..
>
> Yeah.
>
>> But because the problem only occurs under extended-gdbserver, I'm
>> inclined to think the issue is with the testsuite driver, in
>> particular with the gdb_exit implementation in
>> lib/gdbserver-support.exp.  One potential issue I notice in this proc
>> is that when we send "monitor exit" to GDB, we don't necessarily wait
>> for the command to finish (i.e. for the gdb prompt to get printed).
>> As soon as the server is observed to get killed, we continue with
>> exiting.  Dunno if that's substantial..
>
> That's very plausible, at least.
>
> Maybe that prompt got stuck in the expect buffer, and it confused
> something else later on?
>
> Another theory related to that could be that the new GDB started just
> while the previous gdb is saving history and has just momentarily
> renamed the history file to gdbinit-history.gdb_history-gdb-$PID~.
> But AFAICS, that shouldn't happen because that gdb_exit calls
> gdbserver_orig_gdb_exit at the end, which only returns after
> the previous gdb exits...
>
> Did anyone ever manage to reproduce this?

Yes, I can reproduce it on x86_64 Debian with dejagnu version 1.5.3.
But it's pretty rare.  I just do

  while make check RUNTESTFLAGS="gdbinit-history.exp
--target_board=native-extended-gdbserver --verbose --verbose"; do :;
done

and wait for a while (20 minutes or so).

But I just locally installed dejagnu from git, and have had the above
command running for about 30 minutes (after confirming that my local
copy, not system copy, of dejagnu is being used by the testsuite), and
fortunately, I could no longer reproduce these FAILs.  So it seems
that your changes to dejagnu fixed this spurious FAIL as well.  I'll
let it keep spinning though, just in case.


>
> One thing I'd try is making dejagnu's local_exec (close_wait_program in master)
> print the result of the "wait -i".  That will show whether gdb exited
> due to a normal exit, or whether it was killed by SIGTERM or SIGKILL.
> And then I'd try hacking gdb_safe_append_history to output debug logs
> to a file instead of stdout (e.g., /tmp/gdb-log).
>
> Another would be to add a "show history filename" to the test, to make sure
> that the gdb that fails to load the previous history actually tried to
> read the file we expect it to be reading.

Ok, I'll keep these approaches in mind but if I manage to trip over it
even with dejagnu from git.
  
Patrick Palka Aug. 17, 2015, 11:29 p.m. UTC | #16
On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Mon, Aug 17, 2015 at 10:02 AM, Pedro Alves <palves@redhat.com> wrote:
>> On 08/17/2015 02:28 PM, Patrick Palka wrote:
>>
>>> Ah, you already addressed this: a warning is not emitted because
>>> stdout is closed..
>>
>> Yeah.
>>
>>> But because the problem only occurs under extended-gdbserver, I'm
>>> inclined to think the issue is with the testsuite driver, in
>>> particular with the gdb_exit implementation in
>>> lib/gdbserver-support.exp.  One potential issue I notice in this proc
>>> is that when we send "monitor exit" to GDB, we don't necessarily wait
>>> for the command to finish (i.e. for the gdb prompt to get printed).
>>> As soon as the server is observed to get killed, we continue with
>>> exiting.  Dunno if that's substantial..
>>
>> That's very plausible, at least.
>>
>> Maybe that prompt got stuck in the expect buffer, and it confused
>> something else later on?
>>
>> Another theory related to that could be that the new GDB started just
>> while the previous gdb is saving history and has just momentarily
>> renamed the history file to gdbinit-history.gdb_history-gdb-$PID~.
>> But AFAICS, that shouldn't happen because that gdb_exit calls
>> gdbserver_orig_gdb_exit at the end, which only returns after
>> the previous gdb exits...
>>
>> Did anyone ever manage to reproduce this?
>
> Yes, I can reproduce it on x86_64 Debian with dejagnu version 1.5.3.
> But it's pretty rare.  I just do
>
>   while make check RUNTESTFLAGS="gdbinit-history.exp
> --target_board=native-extended-gdbserver --verbose --verbose"; do :;
> done
>
> and wait for a while (20 minutes or so).
>
> But I just locally installed dejagnu from git, and have had the above
> command running for about 30 minutes (after confirming that my local
> copy, not system copy, of dejagnu is being used by the testsuite), and
> fortunately, I could no longer reproduce these FAILs.  So it seems
> that your changes to dejagnu fixed this spurious FAIL as well.  I'll
> let it keep spinning though, just in case.
>
>
>>
>> One thing I'd try is making dejagnu's local_exec (close_wait_program in master)
>> print the result of the "wait -i".  That will show whether gdb exited
>> due to a normal exit, or whether it was killed by SIGTERM or SIGKILL.
>> And then I'd try hacking gdb_safe_append_history to output debug logs
>> to a file instead of stdout (e.g., /tmp/gdb-log).
>>
>> Another would be to add a "show history filename" to the test, to make sure
>> that the gdb that fails to load the previous history actually tried to
>> read the file we expect it to be reading.
>
> Ok, I'll keep these approaches in mind but if I manage to trip over it
> even with dejagnu from git.

Sorry, botched that last sentence there...

Anyway, with dejagnu from git I could no longer reproduce these FAILs
after 3 straight hours of repeatedly running the test.
  
Pedro Alves Sept. 10, 2015, 1:44 p.m. UTC | #17
On 08/18/2015 12:29 AM, Patrick Palka wrote:
> On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wro

>> But I just locally installed dejagnu from git, and have had the above
>> command running for about 30 minutes (after confirming that my local
>> copy, not system copy, of dejagnu is being used by the testsuite), and
>> fortunately, I could no longer reproduce these FAILs.  So it seems
>> that your changes to dejagnu fixed this spurious FAIL as well.  I'll
>> let it keep spinning though, just in case.

> Anyway, with dejagnu from git I could no longer reproduce these FAILs
> after 3 straight hours of repeatedly running the test.

(almost a month passed somehow, but ...)   Hurray!

The DejaGnu patch has been backported to Fedora 21-23 meanwhile, so hopefully
these issues will start to fade away.  Someone might want to contact the
debian / ubuntu dejagnu maintainers to do the same.

Thanks,
Pedro Alves
  
Sergio Durigan Junior Sept. 10, 2015, 3:30 p.m. UTC | #18
On Thursday, September 10 2015, Pedro Alves wrote:

> On 08/18/2015 12:29 AM, Patrick Palka wrote:
>> On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wro
>
>>> But I just locally installed dejagnu from git, and have had the above
>>> command running for about 30 minutes (after confirming that my local
>>> copy, not system copy, of dejagnu is being used by the testsuite), and
>>> fortunately, I could no longer reproduce these FAILs.  So it seems
>>> that your changes to dejagnu fixed this spurious FAIL as well.  I'll
>>> let it keep spinning though, just in case.
>
>> Anyway, with dejagnu from git I could no longer reproduce these FAILs
>> after 3 straight hours of repeatedly running the test.
>
> (almost a month passed somehow, but ...)   Hurray!

Cool.  The Fedora buildslaves will update automatically as soon as this
version leaves updates-testing.

> The DejaGnu patch has been backported to Fedora 21-23 meanwhile, so hopefully
> these issues will start to fade away.  Someone might want to contact the
> debian / ubuntu dejagnu maintainers to do the same.

I can do that.

Thanks,
  

Patch

diff --git a/gdb/testsuite/gdb.base/gdbinit-history.exp b/gdb/testsuite/gdb.base/gdbinit-history.exp
index 8adfd68..83abd35 100644
--- a/gdb/testsuite/gdb.base/gdbinit-history.exp
+++ b/gdb/testsuite/gdb.base/gdbinit-history.exp
@@ -57,5 +57,59 @@  proc test_gdbinit_history_setting { home size } {
     array set env [array get old_env]
 }
 
+# Check that the history file does not get truncated to zero when a gdbinit
+# file sets the history size to unlimited.
+
+proc test_no_truncation_of_unlimited_history_file { } {
+    global env
+    global INTERNAL_GDBFLAGS
+
+    array set old_env [array get env]
+
+    # The HISTSIZE environment variable takes precedence over whatever
+    # history size is set in .gdbinit.  Make sure the former is not
+    # set.
+    unset -nocomplain env(HISTSIZE)
+
+    set saved_internal_gdbflags $INTERNAL_GDBFLAGS
+
+    set temp_gdbinit [standard_output_file "gdbinit-history.gdbinit"]
+    set temp_histfile [standard_output_file "gdbinit-history.gdb_history"]
+    file delete $temp_gdbinit
+    file delete $temp_histfile
+
+    set fd [open $temp_gdbinit "w"]
+    puts $fd "set history size unlimited\n"
+    puts $fd "set history filename $temp_histfile\n"
+    puts $fd "set history save\n"
+    close $fd
+
+    append INTERNAL_GDBFLAGS " -x $temp_gdbinit"
+
+    # We have to start then exit GDB twice: the first time to test the creation
+    # of the initial history file, and the second time to test appending to it.
+    # In either case the initial "print 1" command should persist through the
+    # history file.
+    with_test_prefix "truncation" {
+        gdb_exit
+        gdb_start
+        gdb_test "print 1"
+
+        gdb_exit
+        gdb_start
+        gdb_test "server show commands" "    .  print 1.*"
+
+        gdb_exit
+        gdb_start
+        gdb_test "server show commands" "    .  print 1.*"
+    }
+
+    set INTERNAL_GDBFLAGS $saved_internal_gdbflags
+
+    array set env [array get old_env]
+}
+
 test_gdbinit_history_setting "gdbinit-history/unlimited" "unlimited"
 test_gdbinit_history_setting "gdbinit-history/zero" "0"
+
+test_no_truncation_of_unlimited_history_file
diff --git a/gdb/top.c b/gdb/top.c
index 837bf16..f5a0819 100644
--- a/gdb/top.c
+++ b/gdb/top.c
@@ -981,7 +981,8 @@  gdb_safe_append_history (void)
       else
 	{
 	  append_history (command_count, local_history_filename);
-	  history_truncate_file (local_history_filename, history_max_entries);
+	  if (history_is_stifled ())
+	    history_truncate_file (local_history_filename, history_max_entries);
 	}
 
       ret = rename (local_history_filename, history_filename);