Message ID | 1434466413-28892-1-git-send-email-patrick@parcs.ath.cx |
---|---|
State | New, archived |
Headers |
Received: (qmail 85099 invoked by alias); 16 Jun 2015 14:53:45 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 85089 invoked by uid 89); 16 Jun 2015 14:53:44 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=AWL, BAYES_20, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 X-HELO: mail-qc0-f176.google.com Received: from mail-qc0-f176.google.com (HELO mail-qc0-f176.google.com) (209.85.216.176) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 16 Jun 2015 14:53:43 +0000 Received: by qcej3 with SMTP id j3so4907711qce.3 for <gdb-patches@sourceware.org>; Tue, 16 Jun 2015 07:53:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=0GM4IgKDtm3fTOaAG5KUm+P2SXKSHFoRZ8s2cE6deNY=; b=bfNqzCar205VBA34RMbhOn33H1Y+11PYDDq2IL7JFRVBVlvYQvlsWtdIAjdhZsHIrA xALBI+7GVn2l9Wy45sFr3l+KiTYPNpeJ+79KG7sC4UZP2wIeDz3CNqjRsYsOColEE8NE vF/nxUaKQNp8AxkQG6Tm3jfUY00qJVOT3Ofe2kTTT8U5xvukCqWQ3hKb0wr//zyXFesg LDZr9lvSJb0XH5U25ZpYvjqG9ex2ubYAysAcs9Yk+WD5Qz+MjG3Y8Y4HbuONryEM2JNl RXjx3quJSK91arRGKBOWCpsxd5ZBY3Nr6HU3uT5XhTTYG/o1Bj1U6zclnMZkwdV5FDGK ez2w== X-Gm-Message-State: ALoCoQm7j/2r0BUpO4c0U8ILqyksFoPZACTE99byNEiedAxf29yKBhOyYJVLpMn6IRdeiOrgLSRk X-Received: by 10.55.27.162 with SMTP id m34mr1940214qkh.94.1434466421259; Tue, 16 Jun 2015 07:53:41 -0700 (PDT) Received: from localhost.localdomain (ool-4353acd8.dyn.optonline.net. [67.83.172.216]) by mx.google.com with ESMTPSA id r75sm438609qkh.6.2015.06.16.07.53.40 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 16 Jun 2015 07:53:40 -0700 (PDT) From: Patrick Palka <patrick@parcs.ath.cx> To: gdb-patches@sourceware.org Cc: Patrick Palka <patrick@parcs.ath.cx> Subject: [PATCH] Don't truncate the history file when history size is unlimited Date: Tue, 16 Jun 2015 10:53:33 -0400 Message-Id: <1434466413-28892-1-git-send-email-patrick@parcs.ath.cx> In-Reply-To: <1433878062-23560-1-git-send-email-patrick@parcs.ath.cx> References: <1433878062-23560-1-git-send-email-patrick@parcs.ath.cx> |
Commit Message
Patrick Palka
June 16, 2015, 2:53 p.m. UTC
We still do not handle "set history size unlimited" correctly. In particular, after writing to the history file, we truncate the history even if it is unlimited. This patch makes sure that we do not call history_truncate_file() if the history is not stifled (i.e. if it's unlimited). This bug causes the history file to be truncated to zero on exit when one has "set history size unlimited" in their gdbinit file. Although this code exists in GDB 7.8, the bug is masked by a pre-existing bug that's been only fixed in GDB 7.9 (PR gdb/17820). gdb/ChangeLog: * top.c (gdb_safe_append_history): Do not call history_truncate_file if the history is not stifled. gdb/testsuite/ChangeLog: * gdb.base/gdbinit-history.exp: Add test case to check that an unlimited history file does not get truncated on exit. --- gdb/testsuite/gdb.base/gdbinit-history.exp | 54 ++++++++++++++++++++++++++++++ gdb/top.c | 3 +- 2 files changed, 56 insertions(+), 1 deletion(-)
Comments
On Tue, Jun 16, 2015 at 10:53 AM, Patrick Palka <patrick@parcs.ath.cx> wrote: > We still do not handle "set history size unlimited" correctly. In > particular, after writing to the history file, we truncate the history > even if it is unlimited. > > This patch makes sure that we do not call history_truncate_file() if the > history is not stifled (i.e. if it's unlimited). This bug causes the > history file to be truncated to zero on exit when one has "set history > size unlimited" in their gdbinit file. Although this code exists in GDB > 7.8, the bug is masked by a pre-existing bug that's been only fixed in > GDB 7.9 (PR gdb/17820). > > gdb/ChangeLog: > > * top.c (gdb_safe_append_history): Do not call > history_truncate_file if the history is not stifled. > > gdb/testsuite/ChangeLog: > > * gdb.base/gdbinit-history.exp: Add test case to check that > an unlimited history file does not get truncated on exit. I had written some extra stuff in the commit message within the patch file itself and then I overwritten it by running "git format-patch" again... This is basically what I wrote: The reason I had so much trouble with writing the test case because I failed to realize that there are two code paths for writing to the history file: one for writing to a non-existing history file (i.e. creating one) and one for appending to an existing history file. The bug only manifests itself when appending to the history file, not when writing to it. Thus the test has to stop/start GDB twice in order to run both code paths and to trigger the bug. By the way, what do you think about unsetting HOME/HISTSIZE/GDBHISTFILE somewhere higher up in the testsuite infrastructure? It would be nice if individual tests did not have to worry about such environment variables being leaked from userspace.
On 06/16/2015 03:53 PM, Patrick Palka wrote: > + # We have to start then exit GDB twice: the first time to test the creation > + # of the initial history file, and the second time to test appending to it. > + # In either case the initial "print 1" command should persist through the > + # history file. > + with_test_prefix "truncation" { > + gdb_exit > + gdb_start > + gdb_test "print 1" > + > + gdb_exit > + gdb_start > + gdb_test "server show commands" " . print 1.*" > + > + gdb_exit > + gdb_start > + gdb_test "server show commands" " . print 1.*" These two "show commands" tests have the same output message. The patch is OK with that fixed. Thanks for following through with the test. Thanks, Pedro Alves
On 06/16/2015 04:00 PM, Patrick Palka wrote: > By the way, what do you think about unsetting > HOME/HISTSIZE/GDBHISTFILE somewhere higher up in the testsuite > infrastructure? It would be nice if individual tests did not have to > worry about such environment variables being leaked from userspace. On first blush, that sounds like a good idea. Though I wonder whether that could have unintended consequences. We won't know until we try, I guess. Does dejagnu need HOME set? Does Python/Scheme? Unsetting the GDB-specific GDBHISTSIZE/GDBHISTFILE vars sounds like a no brainer. For HISTSIZE, should we audit tests that might start an interactive shell (e.g., "(gdb) shell") and result in truncating the history file? Guess better would be to set HISTSIZE="" and HISTFILE=/dev/null or something like that so shells we start don't have a chance of mucking with the user's history files? Urgh, yes, we already have that issue today: $ make check RUNTESTFLAGS="gdb.base/batch-preserve-term-settings.exp" $ tail -n 5 ~/.bash_history stty || echo "not found" /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/pedro/gdb/mygit/build/gdb/testsuite/../data-directory -batch -ex "set height unlimited" -ex "start" --args "/home/pedro/gdb/mygit/build/gdb/testsuite/gdb.base/batch-preserve-term-settings" echo test_echo stty || echo "not found" exit Whoops... Thanks, Pedro Alves
On Tuesday, June 16 2015, Patrick Palka wrote: > We still do not handle "set history size unlimited" correctly. In > particular, after writing to the history file, we truncate the history > even if it is unlimited. > > This patch makes sure that we do not call history_truncate_file() if the > history is not stifled (i.e. if it's unlimited). This bug causes the > history file to be truncated to zero on exit when one has "set history > size unlimited" in their gdbinit file. Although this code exists in GDB > 7.8, the bug is masked by a pre-existing bug that's been only fixed in > GDB 7.9 (PR gdb/17820). Hey Patrick, Looking at the BuildBot logs today, I found that this new test is failing occasionally on native-extended-gdbserver testing. Take a look at the following build: <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> You can see that gdb.base/gdbinit-history.exp failed: PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands The gdb.log is here: <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> I haven't really investigated to determine what's going on here, but let me know if you need any help with this. Thanks,
On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior <sergiodj@redhat.com> wrote: > On Tuesday, June 16 2015, Patrick Palka wrote: > >> We still do not handle "set history size unlimited" correctly. In >> particular, after writing to the history file, we truncate the history >> even if it is unlimited. >> >> This patch makes sure that we do not call history_truncate_file() if the >> history is not stifled (i.e. if it's unlimited). This bug causes the >> history file to be truncated to zero on exit when one has "set history >> size unlimited" in their gdbinit file. Although this code exists in GDB >> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >> GDB 7.9 (PR gdb/17820). > > Hey Patrick, > > Looking at the BuildBot logs today, I found that this new test is > failing occasionally on native-extended-gdbserver testing. Take a look > at the following build: > > <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> > > You can see that gdb.base/gdbinit-history.exp failed: > > PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands > PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands > > The gdb.log is here: > > <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> > > I haven't really investigated to determine what's going on here, but let > me know if you need any help with this. Thanks for the heads up. When doing gdb_exit followed by gdb_start, in the output log sometimes we have (this is printed shortly before the first FAIL) (gdb) ... Remote debugging from host 127.0.0.1 monitor exit spawn ... GNU gdb (GDB) 7.10.50.20150723-cvs ... Other times we have (this is printed shortly before the second FAIL) (gdb) ... Remote debugging from host 127.0.0.1 monitor exit (gdb) spawn ... GNU gdb (GDB) 7.10.50.20150723-cvs ... The literal difference being the "(gdb) " prompt printed before the "spawn" message. In the first case (where the "(gdb) " prefix is not there) the history file does not seem to be written/appended to. In the second case (when the "(gdb) " prefix is there) the history file is properly written/appended to (but it still FAILs because we're missing the command history from before the first case). So the race, if there is one, may have something to do with whether or not the "(gdb) " prompt gets printed after doing "monitor exit". Or maybe not. I'll do more analysis later.
On Thu, Jul 23, 2015 at 3:33 PM, Patrick Palka <patrick@parcs.ath.cx> wrote: > On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior > <sergiodj@redhat.com> wrote: >> On Tuesday, June 16 2015, Patrick Palka wrote: >> >>> We still do not handle "set history size unlimited" correctly. In >>> particular, after writing to the history file, we truncate the history >>> even if it is unlimited. >>> >>> This patch makes sure that we do not call history_truncate_file() if the >>> history is not stifled (i.e. if it's unlimited). This bug causes the >>> history file to be truncated to zero on exit when one has "set history >>> size unlimited" in their gdbinit file. Although this code exists in GDB >>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>> GDB 7.9 (PR gdb/17820). >> >> Hey Patrick, >> >> Looking at the BuildBot logs today, I found that this new test is >> failing occasionally on native-extended-gdbserver testing. Take a look >> at the following build: >> >> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >> >> You can see that gdb.base/gdbinit-history.exp failed: >> >> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >> >> The gdb.log is here: >> >> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >> >> I haven't really investigated to determine what's going on here, but let >> me know if you need any help with this. > > Thanks for the heads up. > > When doing gdb_exit followed by gdb_start, in the output log sometimes > we have (this is printed shortly before the first FAIL) > > (gdb) ... > Remote debugging from host 127.0.0.1 > monitor exit > spawn ... > GNU gdb (GDB) 7.10.50.20150723-cvs > ... > > Other times we have (this is printed shortly before the second FAIL) > > (gdb) ... > Remote debugging from host 127.0.0.1 > monitor exit > (gdb) spawn ... > GNU gdb (GDB) 7.10.50.20150723-cvs > ... > > The literal difference being the "(gdb) " prompt printed before the > "spawn" message. In the first case (where the "(gdb) " prefix is not > there) the history file does not seem to be written/appended to. In > the second case (when the "(gdb) " prefix is there) the history file > is properly written/appended to (but it still FAILs because we're > missing the command history from before the first case). So the race, > if there is one, may have something to do with whether or not the > "(gdb) " prompt gets printed after doing "monitor exit". Or maybe > not. I'll do more analysis later. After further analysis I don't think there is any correlation between whether "(gdb) spawn ..." or else "spawn ..." gets printed, and if a race between "gdb_exit" and "gdb_start" occurs. In fact, I don't think there even is a race between gdb_exit and gdb_start. If I add "after 100" (i.e. TCL's way of sleeping 100ms) in the middle of such sequences (i.e. gdb_exit; after 100; gdb_start) in gdbinit-history.exp, I can still reproduce the intermittent FAILs. So it may be the case that sometimes gdb_exit does not kill the GDB process properly, and by doing so the process doesn't get a chance to save to the history file. And it only happens with extended-gdbserver, not with gdbserver or non-gdbserver. A unique code path taken only by the extended-gdbserver target is the code in gdbserver-support.exp:gdb_exit guarded by "if {[info exists gdb_spawn_id] && [info exists server_spawn_id]}", and then the close_gdbserver proc that follows. However if I just outright delete that code (which should make the exit logic nearly identical with the gdbserver target) I can still trigger the intermittent FAILs... Maybe it's an issue in dejagnu? Or could it be an obscure bug in GDB??
On Fri, Jul 24, 2015 at 10:03 AM, Patrick Palka <patrick@parcs.ath.cx> wrote: > On Thu, Jul 23, 2015 at 3:33 PM, Patrick Palka <patrick@parcs.ath.cx> wrote: >> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior >> <sergiodj@redhat.com> wrote: >>> On Tuesday, June 16 2015, Patrick Palka wrote: >>> >>>> We still do not handle "set history size unlimited" correctly. In >>>> particular, after writing to the history file, we truncate the history >>>> even if it is unlimited. >>>> >>>> This patch makes sure that we do not call history_truncate_file() if the >>>> history is not stifled (i.e. if it's unlimited). This bug causes the >>>> history file to be truncated to zero on exit when one has "set history >>>> size unlimited" in their gdbinit file. Although this code exists in GDB >>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>>> GDB 7.9 (PR gdb/17820). >>> >>> Hey Patrick, >>> >>> Looking at the BuildBot logs today, I found that this new test is >>> failing occasionally on native-extended-gdbserver testing. Take a look >>> at the following build: >>> >>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >>> >>> You can see that gdb.base/gdbinit-history.exp failed: >>> >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >>> >>> The gdb.log is here: >>> >>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >>> >>> I haven't really investigated to determine what's going on here, but let >>> me know if you need any help with this. >> >> Thanks for the heads up. >> >> When doing gdb_exit followed by gdb_start, in the output log sometimes >> we have (this is printed shortly before the first FAIL) >> >> (gdb) ... >> Remote debugging from host 127.0.0.1 >> monitor exit >> spawn ... >> GNU gdb (GDB) 7.10.50.20150723-cvs >> ... >> >> Other times we have (this is printed shortly before the second FAIL) >> >> (gdb) ... >> Remote debugging from host 127.0.0.1 >> monitor exit >> (gdb) spawn ... >> GNU gdb (GDB) 7.10.50.20150723-cvs >> ... >> >> The literal difference being the "(gdb) " prompt printed before the >> "spawn" message. In the first case (where the "(gdb) " prefix is not >> there) the history file does not seem to be written/appended to. In >> the second case (when the "(gdb) " prefix is there) the history file >> is properly written/appended to (but it still FAILs because we're >> missing the command history from before the first case). So the race, >> if there is one, may have something to do with whether or not the >> "(gdb) " prompt gets printed after doing "monitor exit". Or maybe >> not. I'll do more analysis later. > > After further analysis I don't think there is any correlation between > whether "(gdb) spawn ..." or else "spawn ..." gets printed, and if a > race between "gdb_exit" and "gdb_start" occurs. In fact, I don't > think there even is a race between gdb_exit and gdb_start. If I add > "after 100" (i.e. TCL's way of sleeping 100ms) in the middle of such > sequences (i.e. gdb_exit; after 100; gdb_start) in > gdbinit-history.exp, I can still reproduce the intermittent FAILs. > > So it may be the case that sometimes gdb_exit does not kill the GDB > process properly, and by doing so the process doesn't get a chance to > save to the history file. And it only happens with > extended-gdbserver, not with gdbserver or non-gdbserver. A unique > code path taken only by the extended-gdbserver target is the code in > gdbserver-support.exp:gdb_exit guarded by "if {[info exists > gdb_spawn_id] && [info exists server_spawn_id]}", and then the > close_gdbserver proc that follows. However if I just outright delete > that code (which should make the exit logic nearly identical with the > gdbserver target) I can still trigger the intermittent FAILs... Maybe > it's an issue in dejagnu? Or could it be an obscure bug in GDB?? One thing I noticed while repeatedly running the test with --target_board=native-extended-gdbserver --verbose --verbose is that it sometimes takes a few seconds to kill the GDB process. The output log pauses at Quitting /scratchpad/binutils-gdb-build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /scratchpad/binutils-gdb-build/gdb/testsuite/../data-directory -ex "set auto-connect-native-target off" Closing the remote shell exp8 doing kill, pid is 16925 pid is 16925 <PAUSE> for 5 or more seconds. I only see this happen with native-extended-gdbserver. I suppose it's waiting for the command (which gets spawned by dejagnu in its standard_close proc) sh -c exec > /dev/null 2>&1 && (kill -2 -16925 || kill -2 16925) && sleep 5 && (kill -16925 || kill 16925) && sleep 5 && (kill -9 -16925 || kill -9 16925) & to finish sleeping.
On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior <sergiodj@redhat.com> wrote: > On Tuesday, June 16 2015, Patrick Palka wrote: > >> We still do not handle "set history size unlimited" correctly. In >> particular, after writing to the history file, we truncate the history >> even if it is unlimited. >> >> This patch makes sure that we do not call history_truncate_file() if the >> history is not stifled (i.e. if it's unlimited). This bug causes the >> history file to be truncated to zero on exit when one has "set history >> size unlimited" in their gdbinit file. Although this code exists in GDB >> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >> GDB 7.9 (PR gdb/17820). > > Hey Patrick, > > Looking at the BuildBot logs today, I found that this new test is > failing occasionally on native-extended-gdbserver testing. Take a look > at the following build: > > <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> > > You can see that gdb.base/gdbinit-history.exp failed: > > PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands > PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands > > The gdb.log is here: > > <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> > > I haven't really investigated to determine what's going on here, but let > me know if you need any help with this. Hi Sergio, Have you seen these spurious FAILs pop up recently? I wonder if the fixes to SIGTERM handling I made a few weeks ago may have fixed this as well.
On Thursday, August 13 2015, Patrick Palka wrote: > On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior > <sergiodj@redhat.com> wrote: >> On Tuesday, June 16 2015, Patrick Palka wrote: >> >>> We still do not handle "set history size unlimited" correctly. In >>> particular, after writing to the history file, we truncate the history >>> even if it is unlimited. >>> >>> This patch makes sure that we do not call history_truncate_file() if the >>> history is not stifled (i.e. if it's unlimited). This bug causes the >>> history file to be truncated to zero on exit when one has "set history >>> size unlimited" in their gdbinit file. Although this code exists in GDB >>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>> GDB 7.9 (PR gdb/17820). >> >> Hey Patrick, >> >> Looking at the BuildBot logs today, I found that this new test is >> failing occasionally on native-extended-gdbserver testing. Take a look >> at the following build: >> >> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >> >> You can see that gdb.base/gdbinit-history.exp failed: >> >> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >> >> The gdb.log is here: >> >> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >> >> I haven't really investigated to determine what's going on here, but let >> me know if you need any help with this. > > Hi Sergio, > > Have you seen these spurious FAILs pop up recently? I wonder if the > fixes to SIGTERM handling I made a few weeks ago may have fixed this > as well. Hey Patrick, Yeah, I still see those FAIL's from time to time: <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674> I may be wrong, but I noticed that the FAIL's happen only on the PPC buildslaves. Thanks,
On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote: > On Thursday, August 13 2015, Patrick Palka wrote: > >> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior >> <sergiodj@redhat.com> wrote: >>> On Tuesday, June 16 2015, Patrick Palka wrote: >>> >>>> We still do not handle "set history size unlimited" correctly. In >>>> particular, after writing to the history file, we truncate the history >>>> even if it is unlimited. >>>> >>>> This patch makes sure that we do not call history_truncate_file() if the >>>> history is not stifled (i.e. if it's unlimited). This bug causes the >>>> history file to be truncated to zero on exit when one has "set history >>>> size unlimited" in their gdbinit file. Although this code exists in GDB >>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>>> GDB 7.9 (PR gdb/17820). >>> >>> Hey Patrick, >>> >>> Looking at the BuildBot logs today, I found that this new test is >>> failing occasionally on native-extended-gdbserver testing. Take a look >>> at the following build: >>> >>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >>> >>> You can see that gdb.base/gdbinit-history.exp failed: >>> >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >>> >>> The gdb.log is here: >>> >>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >>> >>> I haven't really investigated to determine what's going on here, but let >>> me know if you need any help with this. >> >> Hi Sergio, >> >> Have you seen these spurious FAILs pop up recently? I wonder if the >> fixes to SIGTERM handling I made a few weeks ago may have fixed this >> as well. > > Hey Patrick, > > Yeah, I still see those FAIL's from time to time: > > <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593> > <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674> > > I may be wrong, but I noticed that the FAIL's happen only on the PPC > buildslaves. > Here's a theory: (gdb) server show commands 1 set height 0 2 set width 0 3 target extended-remote localhost:2348 4 print 1 5 monitor exit 6 set height 0 7 set width 0 8 target extended-remote localhost:2349 (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands Remote debugging from host 127.0.0.1 monitor exit (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off GNU gdb (GDB) 7.10.50.20150812-cvs ... (gdb) set height 0 (gdb) set width 0 (gdb) spawn ../gdbserver/gdbserver --once --multi :2350 Listening on port 2350 target extended-remote localhost:2350 Remote debugging using localhost:2350 (gdb) server show commands 1 set height 0 2 set width 0 3 target extended-remote localhost:2350 (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands Observations: #1 - In the FAILing case, the history only shows commands invoked in that gdb session. That means that either that GDB invocation failed to load the history file, or the previous invocation failed to write it. #2 - dejagnu's standard_close, what is ultimately used to exit gdb, does this: 2.1 - send gdb a SIGINT. This might help by interrupting any command that gdb might be stuck in, getting it back to the event loop processing input. 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout. 2.3 - if gdb doesn't exit after 5 seconds, kills it with SIGTERM. If it still doesn't exit after another 5 seconds, dejagnu kill it with SIGKILL. Unless you're running dejagnu git master, 2.1 and 2.2 happen in parallel. I'm assuming that 2.3 is not happening. It's 2.2 that normally causes gdb to exit. See event-top.c:stdin_event_handler. - Because, due to 2.2, stdin/stdout are already closed when gdb is running the teardown code (quit_force, what saves history), any warning/output that gdb might output goes nowhere, it won't appear in gdb.log. #3 - If you'll recall, not long ago we made the code that appends the commands to the history file roughly: - mv old-history-file old-history-file.tmp - append to old-history-file.tmp - mv old-history-file.tmp old-history-file That is top.c:gdb_safe_append_history. AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens if the SIGINT lands exactly just when gdb is about to call rename here, in gdb_safe_append_history: ret = rename (local_history_filename, history_filename); saved_errno = errno; if (ret < 0 && saved_errno != EEXIST) warning (_("Could not rename %s to %s: %s"), local_history_filename, history_filename, safe_strerror (saved_errno)); } and rename fails with EINTR. Then the following gdb invocation would not find any previous history file to load, which would perfectly explain that FAIL. If we still have the build/test directory of that test run, if the theory is right, I think we'd find a leftover gdbinit-history.gdb_history-gdb-$PID~ file. What doesn't look right in the theory is that the rename man does not document rename as failing with EINTR... But maybe it does? Thanks, Pedro Alves
On Thursday, August 13 2015, Pedro Alves wrote: > Here's a theory: > > (gdb) server show commands > 1 set height 0 > 2 set width 0 > 3 target extended-remote localhost:2348 > 4 print 1 > 5 monitor exit > 6 set height 0 > 7 set width 0 > 8 target extended-remote localhost:2349 > (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands > Remote debugging from host 127.0.0.1 > monitor exit > (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora > -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu > ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off > GNU gdb (GDB) 7.10.50.20150812-cvs > ... > (gdb) set height 0 > (gdb) set width 0 > (gdb) spawn ../gdbserver/gdbserver --once --multi :2350 > Listening on port 2350 > target extended-remote localhost:2350 > Remote debugging using localhost:2350 > (gdb) server show commands > 1 set height 0 > 2 set width 0 > 3 target extended-remote localhost:2350 > (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands > > Observations: > > #1 - In the FAILing case, the history only shows commands invoked in that > gdb session. That means that either that GDB invocation failed to > load the history file, or the previous invocation failed to write it. Yeah, it seems to me it's the second option: GDB failed to write the history file. > #2 - dejagnu's standard_close, what is ultimately used to exit gdb, > does this: > > 2.1 - send gdb a SIGINT. This might help by interrupting any command > that gdb might be stuck in, getting it back to > the event loop processing input. > > 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout. > > 2.3 - if gdb doesn't exit after 5 seconds, kills it > with SIGTERM. If it still doesn't exit after another 5 > seconds, dejagnu kill it with SIGKILL. > > Unless you're running dejagnu git master, 2.1 and 2.2 happen > in parallel. > > I'm assuming that 2.3 is not happening. It's 2.2 that normally > causes gdb to exit. See event-top.c:stdin_event_handler. > > - Because, due to 2.2, stdin/stdout are already closed when gdb is > running the teardown code (quit_force, what saves history), any > warning/output that gdb might output goes nowhere, it won't > appear in gdb.log. > > #3 - If you'll recall, not long ago we made the code that appends > the commands to the history file roughly: > > - mv old-history-file old-history-file.tmp > - append to old-history-file.tmp > - mv old-history-file.tmp old-history-file > > That is top.c:gdb_safe_append_history. > > AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens > if the SIGINT lands exactly just when gdb is about to call > rename here, in gdb_safe_append_history: > > ret = rename (local_history_filename, history_filename); > saved_errno = errno; > if (ret < 0 && saved_errno != EEXIST) > warning (_("Could not rename %s to %s: %s"), > local_history_filename, history_filename, > safe_strerror (saved_errno)); > } > > and rename fails with EINTR. Then the following gdb invocation > would not find any previous history file to load, which would > perfectly explain that FAIL. > > If we still have the build/test directory of that test run, > if the theory is right, I think we'd find a > leftover gdbinit-history.gdb_history-gdb-$PID~ file. Unfortunately we don't have the build directory available anymore, because BuildBot deletes it in the next build. Even if we did, it would require permission/help from the buildslave admin to access it... > What doesn't look right in the theory is that the rename man > does not document rename as failing with EINTR... But > maybe it does? Yeah, well... I've made a few tests here, and I could not trigger EINTR with rename. I have also looked at the Linux kernel source code and I don't see (at least directly) a way for rename to error with EINTR: <http://lxr.free-electrons.com/source/fs/namei.c#L4398> It really is an atomic operation. Of course, there might be some quirks on PPC GNU/Linux that I'm not aware of...
On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote: > On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote: >> On Thursday, August 13 2015, Patrick Palka wrote: >> >>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior >>> <sergiodj@redhat.com> wrote: >>>> On Tuesday, June 16 2015, Patrick Palka wrote: >>>> >>>>> We still do not handle "set history size unlimited" correctly. In >>>>> particular, after writing to the history file, we truncate the history >>>>> even if it is unlimited. >>>>> >>>>> This patch makes sure that we do not call history_truncate_file() if the >>>>> history is not stifled (i.e. if it's unlimited). This bug causes the >>>>> history file to be truncated to zero on exit when one has "set history >>>>> size unlimited" in their gdbinit file. Although this code exists in GDB >>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>>>> GDB 7.9 (PR gdb/17820). >>>> >>>> Hey Patrick, >>>> >>>> Looking at the BuildBot logs today, I found that this new test is >>>> failing occasionally on native-extended-gdbserver testing. Take a look >>>> at the following build: >>>> >>>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >>>> >>>> You can see that gdb.base/gdbinit-history.exp failed: >>>> >>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >>>> >>>> The gdb.log is here: >>>> >>>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >>>> >>>> I haven't really investigated to determine what's going on here, but let >>>> me know if you need any help with this. >>> >>> Hi Sergio, >>> >>> Have you seen these spurious FAILs pop up recently? I wonder if the >>> fixes to SIGTERM handling I made a few weeks ago may have fixed this >>> as well. >> >> Hey Patrick, >> >> Yeah, I still see those FAIL's from time to time: >> >> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593> >> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674> >> >> I may be wrong, but I noticed that the FAIL's happen only on the PPC >> buildslaves. >> > > Here's a theory: > > (gdb) server show commands > 1 set height 0 > 2 set width 0 > 3 target extended-remote localhost:2348 > 4 print 1 > 5 monitor exit > 6 set height 0 > 7 set width 0 > 8 target extended-remote localhost:2349 > (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands > Remote debugging from host 127.0.0.1 > monitor exit > (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora > -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu > ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off > GNU gdb (GDB) 7.10.50.20150812-cvs > ... > (gdb) set height 0 > (gdb) set width 0 > (gdb) spawn ../gdbserver/gdbserver --once --multi :2350 > Listening on port 2350 > target extended-remote localhost:2350 > Remote debugging using localhost:2350 > (gdb) server show commands > 1 set height 0 > 2 set width 0 > 3 target extended-remote localhost:2350 > (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands > > Observations: > > #1 - In the FAILing case, the history only shows commands invoked in that > gdb session. That means that either that GDB invocation failed to > load the history file, or the previous invocation failed to write it. > > #2 - dejagnu's standard_close, what is ultimately used to exit gdb, > does this: > > 2.1 - send gdb a SIGINT. This might help by interrupting any command > that gdb might be stuck in, getting it back to > the event loop processing input. > > 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout. > > 2.3 - if gdb doesn't exit after 5 seconds, kills it > with SIGTERM. If it still doesn't exit after another 5 > seconds, dejagnu kill it with SIGKILL. > > Unless you're running dejagnu git master, 2.1 and 2.2 happen > in parallel. > > I'm assuming that 2.3 is not happening. It's 2.2 that normally > causes gdb to exit. See event-top.c:stdin_event_handler. > > - Because, due to 2.2, stdin/stdout are already closed when gdb is > running the teardown code (quit_force, what saves history), any > warning/output that gdb might output goes nowhere, it won't > appear in gdb.log. > > #3 - If you'll recall, not long ago we made the code that appends > the commands to the history file roughly: > > - mv old-history-file old-history-file.tmp > - append to old-history-file.tmp > - mv old-history-file.tmp old-history-file > > That is top.c:gdb_safe_append_history. > > AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens > if the SIGINT lands exactly just when gdb is about to call > rename here, in gdb_safe_append_history: > > ret = rename (local_history_filename, history_filename); > saved_errno = errno; > if (ret < 0 && saved_errno != EEXIST) > warning (_("Could not rename %s to %s: %s"), > local_history_filename, history_filename, > safe_strerror (saved_errno)); > } > > and rename fails with EINTR. Then the following gdb invocation > would not find any previous history file to load, which would > perfectly explain that FAIL. > > If we still have the build/test directory of that test run, > if the theory is right, I think we'd find a > leftover gdbinit-history.gdb_history-gdb-$PID~ file. > > What doesn't look right in the theory is that the rename man > does not document rename as failing with EINTR... But > maybe it does? Interesting theory. I had assumed that the problem lies in the testsuite driver, not in the GDB source code, because the problem only manifests itself under native-extended-gdbserver. If I understand correctly, your explanation is also plausible under native or native-gdbserver, yet the problem was not observed to ever manifest itself under these targets. Also, if rename fails with EINTR, wouldn't a "Could not rename %s to %s: %s" warning be emitted in the gdb output log?
On Mon, Aug 17, 2015 at 9:15 AM, Patrick Palka <patrick@parcs.ath.cx> wrote: > On Thu, Aug 13, 2015 at 7:26 PM, Pedro Alves <palves@redhat.com> wrote: >> On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote: >>> On Thursday, August 13 2015, Patrick Palka wrote: >>> >>>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior >>>> <sergiodj@redhat.com> wrote: >>>>> On Tuesday, June 16 2015, Patrick Palka wrote: >>>>> >>>>>> We still do not handle "set history size unlimited" correctly. In >>>>>> particular, after writing to the history file, we truncate the history >>>>>> even if it is unlimited. >>>>>> >>>>>> This patch makes sure that we do not call history_truncate_file() if the >>>>>> history is not stifled (i.e. if it's unlimited). This bug causes the >>>>>> history file to be truncated to zero on exit when one has "set history >>>>>> size unlimited" in their gdbinit file. Although this code exists in GDB >>>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>>>>> GDB 7.9 (PR gdb/17820). >>>>> >>>>> Hey Patrick, >>>>> >>>>> Looking at the BuildBot logs today, I found that this new test is >>>>> failing occasionally on native-extended-gdbserver testing. Take a look >>>>> at the following build: >>>>> >>>>> <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429> >>>>> >>>>> You can see that gdb.base/gdbinit-history.exp failed: >>>>> >>>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >>>>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >>>>> >>>>> The gdb.log is here: >>>>> >>>>> <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3> >>>>> >>>>> I haven't really investigated to determine what's going on here, but let >>>>> me know if you need any help with this. >>>> >>>> Hi Sergio, >>>> >>>> Have you seen these spurious FAILs pop up recently? I wonder if the >>>> fixes to SIGTERM handling I made a few weeks ago may have fixed this >>>> as well. >>> >>> Hey Patrick, >>> >>> Yeah, I still see those FAIL's from time to time: >>> >>> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64le-native-extended-gdbserver-m64/builds/1593> >>> <http://gdb-build.sergiodj.net/builders/Fedora-ppc64be-native-extended-gdbserver-m64/builds/1674> >>> >>> I may be wrong, but I noticed that the FAIL's happen only on the PPC >>> buildslaves. >>> >> >> Here's a theory: >> >> (gdb) server show commands >> 1 set height 0 >> 2 set width 0 >> 3 target extended-remote localhost:2348 >> 4 print 1 >> 5 monitor exit >> 6 set height 0 >> 7 set width 0 >> 8 target extended-remote localhost:2349 >> (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >> Remote debugging from host 127.0.0.1 >> monitor exit >> (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora >> -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu >> ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off >> GNU gdb (GDB) 7.10.50.20150812-cvs >> ... >> (gdb) set height 0 >> (gdb) set width 0 >> (gdb) spawn ../gdbserver/gdbserver --once --multi :2350 >> Listening on port 2350 >> target extended-remote localhost:2350 >> Remote debugging using localhost:2350 >> (gdb) server show commands >> 1 set height 0 >> 2 set width 0 >> 3 target extended-remote localhost:2350 >> (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >> >> Observations: >> >> #1 - In the FAILing case, the history only shows commands invoked in that >> gdb session. That means that either that GDB invocation failed to >> load the history file, or the previous invocation failed to write it. >> >> #2 - dejagnu's standard_close, what is ultimately used to exit gdb, >> does this: >> >> 2.1 - send gdb a SIGINT. This might help by interrupting any command >> that gdb might be stuck in, getting it back to >> the event loop processing input. >> >> 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout. >> >> 2.3 - if gdb doesn't exit after 5 seconds, kills it >> with SIGTERM. If it still doesn't exit after another 5 >> seconds, dejagnu kill it with SIGKILL. >> >> Unless you're running dejagnu git master, 2.1 and 2.2 happen >> in parallel. >> >> I'm assuming that 2.3 is not happening. It's 2.2 that normally >> causes gdb to exit. See event-top.c:stdin_event_handler. >> >> - Because, due to 2.2, stdin/stdout are already closed when gdb is >> running the teardown code (quit_force, what saves history), any >> warning/output that gdb might output goes nowhere, it won't >> appear in gdb.log. >> >> #3 - If you'll recall, not long ago we made the code that appends >> the commands to the history file roughly: >> >> - mv old-history-file old-history-file.tmp >> - append to old-history-file.tmp >> - mv old-history-file.tmp old-history-file >> >> That is top.c:gdb_safe_append_history. >> >> AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens >> if the SIGINT lands exactly just when gdb is about to call >> rename here, in gdb_safe_append_history: >> >> ret = rename (local_history_filename, history_filename); >> saved_errno = errno; >> if (ret < 0 && saved_errno != EEXIST) >> warning (_("Could not rename %s to %s: %s"), >> local_history_filename, history_filename, >> safe_strerror (saved_errno)); >> } >> >> and rename fails with EINTR. Then the following gdb invocation >> would not find any previous history file to load, which would >> perfectly explain that FAIL. >> >> If we still have the build/test directory of that test run, >> if the theory is right, I think we'd find a >> leftover gdbinit-history.gdb_history-gdb-$PID~ file. >> >> What doesn't look right in the theory is that the rename man >> does not document rename as failing with EINTR... But >> maybe it does? > > Interesting theory. I had assumed that the problem lies in the > testsuite driver, not in the GDB source code, because the problem only > manifests itself under native-extended-gdbserver. If I understand > correctly, your explanation is also plausible under native or > native-gdbserver, yet the problem was not observed to ever manifest > itself under these targets. > > Also, if rename fails with EINTR, wouldn't a "Could not rename %s to > %s: %s" warning be emitted in the gdb output log? Ah, you already addressed this: a warning is not emitted because stdout is closed.. But because the problem only occurs under extended-gdbserver, I'm inclined to think the issue is with the testsuite driver, in particular with the gdb_exit implementation in lib/gdbserver-support.exp. One potential issue I notice in this proc is that when we send "monitor exit" to GDB, we don't necessarily wait for the command to finish (i.e. for the gdb prompt to get printed). As soon as the server is observed to get killed, we continue with exiting. Dunno if that's substantial..
On 08/17/2015 02:28 PM, Patrick Palka wrote: > Ah, you already addressed this: a warning is not emitted because > stdout is closed.. Yeah. > But because the problem only occurs under extended-gdbserver, I'm > inclined to think the issue is with the testsuite driver, in > particular with the gdb_exit implementation in > lib/gdbserver-support.exp. One potential issue I notice in this proc > is that when we send "monitor exit" to GDB, we don't necessarily wait > for the command to finish (i.e. for the gdb prompt to get printed). > As soon as the server is observed to get killed, we continue with > exiting. Dunno if that's substantial.. That's very plausible, at least. Maybe that prompt got stuck in the expect buffer, and it confused something else later on? Another theory related to that could be that the new GDB started just while the previous gdb is saving history and has just momentarily renamed the history file to gdbinit-history.gdb_history-gdb-$PID~. But AFAICS, that shouldn't happen because that gdb_exit calls gdbserver_orig_gdb_exit at the end, which only returns after the previous gdb exits... Did anyone ever manage to reproduce this? One thing I'd try is making dejagnu's local_exec (close_wait_program in master) print the result of the "wait -i". That will show whether gdb exited due to a normal exit, or whether it was killed by SIGTERM or SIGKILL. And then I'd try hacking gdb_safe_append_history to output debug logs to a file instead of stdout (e.g., /tmp/gdb-log). Another would be to add a "show history filename" to the test, to make sure that the gdb that fails to load the previous history actually tried to read the file we expect it to be reading. Also, I think it's time to try to get all the buildslaves to use dejagnu master, to pick up http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00005.html. Who knows, maybe that race/rogue kill could also explain this problem. The x86_64 Fedora slaves have been running with that for a while, and we no longer see attach-many-short-lived-threads.exp failures there, and we keep seeing them on the other slaves (which don't have that fix). Thanks, Pedro Alves
On Mon, Aug 17, 2015 at 10:02 AM, Pedro Alves <palves@redhat.com> wrote: > On 08/17/2015 02:28 PM, Patrick Palka wrote: > >> Ah, you already addressed this: a warning is not emitted because >> stdout is closed.. > > Yeah. > >> But because the problem only occurs under extended-gdbserver, I'm >> inclined to think the issue is with the testsuite driver, in >> particular with the gdb_exit implementation in >> lib/gdbserver-support.exp. One potential issue I notice in this proc >> is that when we send "monitor exit" to GDB, we don't necessarily wait >> for the command to finish (i.e. for the gdb prompt to get printed). >> As soon as the server is observed to get killed, we continue with >> exiting. Dunno if that's substantial.. > > That's very plausible, at least. > > Maybe that prompt got stuck in the expect buffer, and it confused > something else later on? > > Another theory related to that could be that the new GDB started just > while the previous gdb is saving history and has just momentarily > renamed the history file to gdbinit-history.gdb_history-gdb-$PID~. > But AFAICS, that shouldn't happen because that gdb_exit calls > gdbserver_orig_gdb_exit at the end, which only returns after > the previous gdb exits... > > Did anyone ever manage to reproduce this? Yes, I can reproduce it on x86_64 Debian with dejagnu version 1.5.3. But it's pretty rare. I just do while make check RUNTESTFLAGS="gdbinit-history.exp --target_board=native-extended-gdbserver --verbose --verbose"; do :; done and wait for a while (20 minutes or so). But I just locally installed dejagnu from git, and have had the above command running for about 30 minutes (after confirming that my local copy, not system copy, of dejagnu is being used by the testsuite), and fortunately, I could no longer reproduce these FAILs. So it seems that your changes to dejagnu fixed this spurious FAIL as well. I'll let it keep spinning though, just in case. > > One thing I'd try is making dejagnu's local_exec (close_wait_program in master) > print the result of the "wait -i". That will show whether gdb exited > due to a normal exit, or whether it was killed by SIGTERM or SIGKILL. > And then I'd try hacking gdb_safe_append_history to output debug logs > to a file instead of stdout (e.g., /tmp/gdb-log). > > Another would be to add a "show history filename" to the test, to make sure > that the gdb that fails to load the previous history actually tried to > read the file we expect it to be reading. Ok, I'll keep these approaches in mind but if I manage to trip over it even with dejagnu from git.
On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wrote: > On Mon, Aug 17, 2015 at 10:02 AM, Pedro Alves <palves@redhat.com> wrote: >> On 08/17/2015 02:28 PM, Patrick Palka wrote: >> >>> Ah, you already addressed this: a warning is not emitted because >>> stdout is closed.. >> >> Yeah. >> >>> But because the problem only occurs under extended-gdbserver, I'm >>> inclined to think the issue is with the testsuite driver, in >>> particular with the gdb_exit implementation in >>> lib/gdbserver-support.exp. One potential issue I notice in this proc >>> is that when we send "monitor exit" to GDB, we don't necessarily wait >>> for the command to finish (i.e. for the gdb prompt to get printed). >>> As soon as the server is observed to get killed, we continue with >>> exiting. Dunno if that's substantial.. >> >> That's very plausible, at least. >> >> Maybe that prompt got stuck in the expect buffer, and it confused >> something else later on? >> >> Another theory related to that could be that the new GDB started just >> while the previous gdb is saving history and has just momentarily >> renamed the history file to gdbinit-history.gdb_history-gdb-$PID~. >> But AFAICS, that shouldn't happen because that gdb_exit calls >> gdbserver_orig_gdb_exit at the end, which only returns after >> the previous gdb exits... >> >> Did anyone ever manage to reproduce this? > > Yes, I can reproduce it on x86_64 Debian with dejagnu version 1.5.3. > But it's pretty rare. I just do > > while make check RUNTESTFLAGS="gdbinit-history.exp > --target_board=native-extended-gdbserver --verbose --verbose"; do :; > done > > and wait for a while (20 minutes or so). > > But I just locally installed dejagnu from git, and have had the above > command running for about 30 minutes (after confirming that my local > copy, not system copy, of dejagnu is being used by the testsuite), and > fortunately, I could no longer reproduce these FAILs. So it seems > that your changes to dejagnu fixed this spurious FAIL as well. I'll > let it keep spinning though, just in case. > > >> >> One thing I'd try is making dejagnu's local_exec (close_wait_program in master) >> print the result of the "wait -i". That will show whether gdb exited >> due to a normal exit, or whether it was killed by SIGTERM or SIGKILL. >> And then I'd try hacking gdb_safe_append_history to output debug logs >> to a file instead of stdout (e.g., /tmp/gdb-log). >> >> Another would be to add a "show history filename" to the test, to make sure >> that the gdb that fails to load the previous history actually tried to >> read the file we expect it to be reading. > > Ok, I'll keep these approaches in mind but if I manage to trip over it > even with dejagnu from git. Sorry, botched that last sentence there... Anyway, with dejagnu from git I could no longer reproduce these FAILs after 3 straight hours of repeatedly running the test.
On 08/18/2015 12:29 AM, Patrick Palka wrote: > On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wro >> But I just locally installed dejagnu from git, and have had the above >> command running for about 30 minutes (after confirming that my local >> copy, not system copy, of dejagnu is being used by the testsuite), and >> fortunately, I could no longer reproduce these FAILs. So it seems >> that your changes to dejagnu fixed this spurious FAIL as well. I'll >> let it keep spinning though, just in case. > Anyway, with dejagnu from git I could no longer reproduce these FAILs > after 3 straight hours of repeatedly running the test. (almost a month passed somehow, but ...) Hurray! The DejaGnu patch has been backported to Fedora 21-23 meanwhile, so hopefully these issues will start to fade away. Someone might want to contact the debian / ubuntu dejagnu maintainers to do the same. Thanks, Pedro Alves
On Thursday, September 10 2015, Pedro Alves wrote: > On 08/18/2015 12:29 AM, Patrick Palka wrote: >> On Mon, Aug 17, 2015 at 4:10 PM, Patrick Palka <patrick@parcs.ath.cx> wro > >>> But I just locally installed dejagnu from git, and have had the above >>> command running for about 30 minutes (after confirming that my local >>> copy, not system copy, of dejagnu is being used by the testsuite), and >>> fortunately, I could no longer reproduce these FAILs. So it seems >>> that your changes to dejagnu fixed this spurious FAIL as well. I'll >>> let it keep spinning though, just in case. > >> Anyway, with dejagnu from git I could no longer reproduce these FAILs >> after 3 straight hours of repeatedly running the test. > > (almost a month passed somehow, but ...) Hurray! Cool. The Fedora buildslaves will update automatically as soon as this version leaves updates-testing. > The DejaGnu patch has been backported to Fedora 21-23 meanwhile, so hopefully > these issues will start to fade away. Someone might want to contact the > debian / ubuntu dejagnu maintainers to do the same. I can do that. Thanks,
diff --git a/gdb/testsuite/gdb.base/gdbinit-history.exp b/gdb/testsuite/gdb.base/gdbinit-history.exp index 8adfd68..83abd35 100644 --- a/gdb/testsuite/gdb.base/gdbinit-history.exp +++ b/gdb/testsuite/gdb.base/gdbinit-history.exp @@ -57,5 +57,59 @@ proc test_gdbinit_history_setting { home size } { array set env [array get old_env] } +# Check that the history file does not get truncated to zero when a gdbinit +# file sets the history size to unlimited. + +proc test_no_truncation_of_unlimited_history_file { } { + global env + global INTERNAL_GDBFLAGS + + array set old_env [array get env] + + # The HISTSIZE environment variable takes precedence over whatever + # history size is set in .gdbinit. Make sure the former is not + # set. + unset -nocomplain env(HISTSIZE) + + set saved_internal_gdbflags $INTERNAL_GDBFLAGS + + set temp_gdbinit [standard_output_file "gdbinit-history.gdbinit"] + set temp_histfile [standard_output_file "gdbinit-history.gdb_history"] + file delete $temp_gdbinit + file delete $temp_histfile + + set fd [open $temp_gdbinit "w"] + puts $fd "set history size unlimited\n" + puts $fd "set history filename $temp_histfile\n" + puts $fd "set history save\n" + close $fd + + append INTERNAL_GDBFLAGS " -x $temp_gdbinit" + + # We have to start then exit GDB twice: the first time to test the creation + # of the initial history file, and the second time to test appending to it. + # In either case the initial "print 1" command should persist through the + # history file. + with_test_prefix "truncation" { + gdb_exit + gdb_start + gdb_test "print 1" + + gdb_exit + gdb_start + gdb_test "server show commands" " . print 1.*" + + gdb_exit + gdb_start + gdb_test "server show commands" " . print 1.*" + } + + set INTERNAL_GDBFLAGS $saved_internal_gdbflags + + array set env [array get old_env] +} + test_gdbinit_history_setting "gdbinit-history/unlimited" "unlimited" test_gdbinit_history_setting "gdbinit-history/zero" "0" + +test_no_truncation_of_unlimited_history_file diff --git a/gdb/top.c b/gdb/top.c index 837bf16..f5a0819 100644 --- a/gdb/top.c +++ b/gdb/top.c @@ -981,7 +981,8 @@ gdb_safe_append_history (void) else { append_history (command_count, local_history_filename); - history_truncate_file (local_history_filename, history_max_entries); + if (history_is_stifled ()) + history_truncate_file (local_history_filename, history_max_entries); } ret = rename (local_history_filename, history_filename);