From patchwork Thu Feb 11 00:28:04 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Don Breazeal X-Patchwork-Id: 10815 Received: (qmail 67230 invoked by alias); 11 Feb 2016 00:28:10 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 67221 invoked by uid 89); 11 Feb 2016 00:28:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=restarting, spawns, one's, stopping X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 11 Feb 2016 00:28:08 +0000 Received: from svr-orw-fem-04.mgc.mentorg.com ([147.34.97.41]) by relay1.mentorg.com with esmtp id 1aTf77-0001E5-Hf from Don_Breazeal@mentor.com ; Wed, 10 Feb 2016 16:28:05 -0800 Received: from build2-lucid-cs (147.34.91.1) by svr-orw-fem-04.mgc.mentorg.com (147.34.97.41) with Microsoft SMTP Server id 14.3.224.2; Wed, 10 Feb 2016 16:28:05 -0800 Received: by build2-lucid-cs (Postfix, from userid 1905) id DABAD3A01CA; Wed, 10 Feb 2016 16:28:04 -0800 (PST) From: Don Breazeal To: , Subject: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt Date: Wed, 10 Feb 2016 16:28:04 -0800 Message-ID: <1455150484-12600-1-git-send-email-donb@codesourcery.com> In-Reply-To: <56AFBB80.1010209@redhat.com> MIME-Version: 1.0 X-IsSubscribed: yes Hi Pedro, On 2/1/2016 11:38 AM, Pedro Alves wrote: > On 01/28/2016 12:48 AM, Don Breazeal wrote: >> This patch addresses "fork:Interrupted system call" (or wait:) failures >> in gdb.threads/forking-threads-plus-breakpoint.exp. >> >> The test program spawns ten threads, each of which do ten fork/waitpid >> sequences. The cause of the problem was that when one of the fork >> children exited before the corresponding fork parent could initiate its >> waitpid for that child, a SIGCHLD was delivered and interrupted a fork >> or waitpid in another thread. In fact, I think my diagnosis here was incorrect, or at least incorrect in some cases. I believe at least some of the interruptions are caused by SIGSTOP, sent by GDB when stopping all the threads. More below. >> >> The fix was to wrap the system calls in a loop to retry the call if >> it was interrupted, like: >> >> do >> { >> pid = fork (); >> } >> while (pid == -1 && errno == EINTR); >> >> Since this is a Linux-only test I figure it is OK to use errno and EINTR. >> >> Tested on Nios II Linux target with x86 Linux host. > > I'd prefer to avoid this if possible. These loops potentially hide > bugs like ERESTARTSYS escaping out of a syscall and mishandling of > signals. See bc9540e842eb5639ca59cb133adef211d252843c for example: > https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html > > How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART? I spent a couple of days trying to find an alternate solution, but couldn't find one that was reliable. Here is a snapshot of what I tried: 1) SIG_IGN: results in an ECHILD from waitpid. The man page for waitpid says "This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN." 2) SA_RESTART: While waitpid is listed as a system call that can be restarted by SA_RESTART, fork is not. Even if I leave the "EINTR loop" in place for fork, using SA_RESTART I still see an interrupted system call for waitpid. Possibly because the problem is SIGSTOP and not SIGCHLD. 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I still saw an interrupted system call. You can't block SIGSTOP. 4) pthread_sigblock with sigwait: using pthread_sigblock on all the blockable signals with a signal thread that called sigwait for all the signals in a loop, the signal thread would see a bunch of SIGCHLDs, but there would eventually be an interrupted system call. 5) bsd_signal: this function is supposed to automatically restart blocking system calls. fork is not a blocking system call, but it doesn't help for waitpid either. I found this in the ptrace(2) man page: "Note that a suppressed signal still causes system calls to return prematurely. In this case, system calls will be restarted: the tracer will observe the tracee to reexecute the interrupted system call (or restart_syscall(2) system call for a few system calls which use a different mechanism for restarting) if the tracer uses PTRACE_SYSCALL. Even system calls (such as poll(2)) which are not restartable after signal are restarted after signal is suppressed; however, kernel bugs exist which cause some system calls to fail with EINTR even though no observable signal is injected to the tracee." The GDB manual mentions something similar about interrupted system calls. So, the bottom line is that I haven't changed the fix for the interrupted system calls, because I can't find anything that works as well as the original fix. Perhaps this test puts enough stress on the kernel that the kernel bugs mentioned above are exposed. One change I did make from the previous version was to increase the timeout to 90 seconds, which was necessary to get more reliable results on the Nios II target. Let me know what you think. Thanks! --Don --- This patch addresses "fork:Interrupted system call" (or wait:) failures in gdb.threads/forking-threads-plus-breakpoint.exp. The test program spawns ten threads, each of which do ten fork/waitpid sequences. The cause of the problem was that when one of the fork children exited before the corresponding fork parent could initiate its waitpid for that child, a SIGCHLD was delivered and interrupted a fork or waitpid in another thread. The fix was to wrap the system calls in a loop to retry the call if it was interrupted, like: do { pid = fork (); } while (pid == -1 && errno == EINTR); Since this is a Linux-only test I figure it is OK to use errno and EINTR. I tried a number of alternative fixes using SIG_IGN, SA_RESTART, pthread_sigblock, and bsd_signal, but none of these worked as well. Tested on Nios II Linux target with x86 Linux host. gdb/testsuite/ChangeLog: 2016-02-10 Don Breazeal * gdb.threads/forking-threads-plus-breakpoint.c (thread_forks): Retry fork and waitpid on interrupted system call errors. * gdb.threads/forking-threads-plus-breakpoint.exp: (do_test): Increase timeout to 90. --- .../gdb.threads/forking-threads-plus-breakpoint.c | 14 ++++++++++++-- .../gdb.threads/forking-threads-plus-breakpoint.exp | 3 +++ 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c index fc64d93..c169e18 100644 --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c @@ -22,6 +22,7 @@ #include #include #include +#include /* Number of threads. Each thread continuously spawns a fork and wait for it. If we have another thread continuously start a step over, @@ -49,14 +50,23 @@ thread_forks (void *arg) { pid_t pid; - pid = fork (); + do + { + pid = fork (); + } + while (pid == -1 && errno == EINTR); if (pid > 0) { int status; /* Parent. */ - pid = waitpid (pid, &status, 0); + do + { + pid = waitpid (pid, &status, 0); + } + while (pid == -1 && errno == EINTR); + if (pid == -1) { perror ("wait"); diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp index ff3ca9a..6889c2b 100644 --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } { global linenum global is_remote_target + global timeout + set timeout 90 + set saved_gdbflags $GDBFLAGS set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""] clean_restart $binfile