Message ID | 6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com |
---|---|
State | New |
Headers |
To: gdb-patches@sourceware.org Cc: Andrew Burgess <aburgess@redhat.com> Subject: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Date: Mon, 5 Jun 2023 10:11:08 +0100 Message-Id: <6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com> In-Reply-To: <cover.1685956034.git.aburgess@redhat.com> References: <cover.1685956034.git.aburgess@redhat.com> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> Reply-To: Andrew Burgess <aburgess@redhat.com> |
Series |
Improve vmcore loading
|
|
Commit Message
Andrew Burgess
June 5, 2023, 9:11 a.m. UTC
This patch contains a test for this commit:

  commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
  Date:   Fri Aug 6 19:45:58 2010 +0000

      * thread.c (add_thread_silent): Use null_ptid instead of
      minus_one_ptid while getting rid of stale inferior_ptid.

This is another test that has been carried in the Fedora GDB tree for
some time, and I thought that it would be worth merging to master.  I
don't believe there is any test like this currently in the testsuite.

The original issue was reported in this thread:

  https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/

The problem was that when GDB was used to open a vmcore (core file)
image generated by the Linux kernel, GDB would (sometimes) crash with
an assertion failure:

  thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.

To understand what's going on we need some background; a vmcore file
represents each processor core in the same way that a standard
application core file represents threads.  Thus, we might say, a
vmcore file represents cores as threads.

When writing a vmcore file, the kernel will store the pid of the
process currently running on that core as the thread's lwpid.

However, if a core is idle, with no process currently running on it,
then the lwpid for that thread is stored as 0 in the vmcore file.  If
multiple cores are idle then multiple threads will have an lwpid of 0.

Back in 2010, the original issue reporter tried to change the kernel's
behaviour in this thread:

  https://lkml.org/lkml/2010/8/3/75

This change was rejected by the kernel team; the current
behaviour (lwpid of 0) was considered correct.  I've checked the
source of a recent kernel.  The code mentioned in the lkml.org posting
has moved, it's now in the function crash_save_cpu in the file
kernel/kexec_core.c, but the general behaviour is unchanged: an idle
core will have an lwpid of 0, so I think GDB still needs to be able to
handle this case.

When GDB loads a vmcore file (which is handled just like any other
core file) the sections are processed in core_open to generate the
threads for the core file.  The processing is done by calling
add_to_thread_list, a function which looks for sections named .reg/NN
where NN is the lwpid of the thread.  GDB then builds a ptid_t for the
new thread and calls add_thread.

Remember, in our case the lwpid is 0.  Now for the first thread this
is fine, if a little weird; 0 isn't usually a valid lwpid, but that's
OK, GDB creates a thread with lwpid of 0 and carries on.

When we find the next thread (core) with lwpid of 0, we attempt to
create another thread with an lwpid of 0.  This of course clashes with
the previously created thread, they have the same ptid_t, so GDB tries
to delete the first thread.

And it was within this thread delete code that we triggered a bug
which would then cause GDB to assert -- when deleting we tried to
switch to a thread with minus_one_ptid.  This resulted in a call to
find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the
find_inferior_pid call fails and returns NULL, which then triggered an
assert in switch_to_thread.

The actual details of why the assert triggered are really not
important.  What's important (I think) is that a vmcore file might
have this interesting lwpid of 0 characteristic, which isn't something
we see in "normal" application core files, and it is this that I think
we should be testing.

Now, you might be thinking: isn't deleting the first thread the wrong
thing to do?  If the vmcore file has two threads that represent two
cores, and both have an lwpid of 0 (indicating both cores are idle),
then surely GDB should still represent this as two threads?  You're
not wrong.  This was mentioned by Pedro in the original GDB mailing
list thread here:

  https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/

This is indeed a problem, and this problem is still present in GDB
today.
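
To make the aliasing concrete, here is a small Python model of the behaviour described so far.  It is a sketch, not GDB source: `Ptid`, `lwpid_from_section` and `load_threads` are hypothetical names, the dictionary stands in for GDB's thread list, and the pid value of 1 is only a placeholder.

```python
# Simplified model of the lwpid 0 clash.  NOT GDB code: every name here
# is hypothetical and exists only for illustration.
from typing import NamedTuple


class Ptid(NamedTuple):
    pid: int
    lwp: int


def lwpid_from_section(name):
    """Return the lwpid encoded in a '.reg/NN' section name, else None."""
    prefix = ".reg/"
    if not name.startswith(prefix):
        return None
    suffix = name[len(prefix):]
    return int(suffix) if suffix.isdigit() else None


def load_threads(section_names, pid=1):
    """Build a thread table keyed by ptid from core file section names.

    Returns (threads, replaced), where 'replaced' lists each ptid whose
    earlier thread was dropped because a later section reused it.
    """
    threads = {}    # ptid -> index of the section that created the thread
    replaced = []   # ptids that clashed with an existing thread
    for index, name in enumerate(section_names):
        lwp = lwpid_from_section(name)
        if lwp is None:
            continue            # not a per-thread register section
        ptid = Ptid(pid, lwp)
        if ptid in threads:
            replaced.append(ptid)   # the old thread is deleted
        threads[ptid] = index
    return threads, replaced


if __name__ == "__main__":
    # Two idle cores: both register sections carry an lwpid of 0.
    threads, replaced = load_threads([".reg/0", ".reg/0"])
    print(len(threads), replaced)
```

With two `.reg/0` sections the model ends up holding a single thread and records the first one as replaced, which mirrors the thread deletion described in the commit message.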
I plan to try and address this in a later commit, however,
this first commit is about getting a test in place to confirm that GDB
at a minimum doesn't crash when loading such a vmcore file.

And so, finally, what's in this commit?

This commit contains a new test.  The test doesn't actually contain a
vmcore file.  Instead I've created a standard application core file
that contains two threads, and then manually edited the core file to
set the lwpid of each thread to 0.

To further reduce the size of the core file (as it will be stored in
git), I've zeroed all of the LOAD-able segments in the core file.
This test really doesn't care about that part of the core file, we
only really care about loading the registers, and this is enough to
confirm that GDB doesn't crash.

Obviously as the core file is pre-generated, this test is architecture
specific.  There are already a few tests in gdb.arch/ that include
pre-generated core files.  Just as those existing tests do, I've
compressed the core file with bzip2, which reduces it to just 750
bytes.  I have structured the test so that if/when this patch is
merged I can add some additional core files for other architectures,
however, these are not included in this commit.

The test simply expands the core file, and then loads it into GDB.
One interesting thing to note is that GDB reports the core file
loading like this:

  (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
  [New process 1]
  [New process 1]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  The current thread has terminated
  (gdb)

There are two interesting things here: first, the repeated "New process
1" message.  This happens because linux_core_pid_to_str reports
anything with an lwpid of 0 as a process, rather than an LWP.  And
second, the "The current thread has terminated" message.  This is
because the first thread in the core file is the current thread, but
when GDB loads the second thread (which also has lwpid 0) this causes
the first thread to be deleted, and as a result GDB thinks that the
current (first) thread has terminated.

As I said previously, both of these problems are a result of the lwpid
0 aliasing, which is not being fixed in this commit -- this commit is
just confirming that GDB doesn't crash when loading this core file.

---
 gdb/testsuite/gdb.arch/core-file-pid0.exp     |  63 ++++++++++++++++++
 .../gdb.arch/core-file-pid0.x86-64.core.bz2   | Bin 0 -> 750 bytes
 2 files changed, 63 insertions(+)
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.exp
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2
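
For anyone who wants to inspect the core file outside of DejaGnu, the expand-and-check step the test performs can be approximated with Python's standard `bz2` module.  This is only a sketch: `decompress_core` is a hypothetical helper, not the testsuite's `decompress_bz2`, and the file names in the usage note are examples.

```python
# Sketch of "expand the core file, then sanity check its size".
# decompress_core is a hypothetical helper, not part of the GDB testsuite.
import bz2
import os
import shutil
import tempfile


def decompress_core(bz2_path, out_path, expected_size):
    """Expand bz2_path into out_path and check the resulting size.

    Returns out_path on success, or None if decompression fails or the
    size differs (the cases the test reports as 'untested').
    """
    try:
        with bz2.open(bz2_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    except (OSError, EOFError):
        # Missing file, invalid bzip2 data, or a truncated stream.
        return None
    if os.path.getsize(out_path) != expected_size:
        return None
    return out_path


if __name__ == "__main__":
    # Self-contained demo: compress a small block of zero bytes, then
    # expand it again and confirm the size round-trips.
    workdir = tempfile.mkdtemp()
    packed = os.path.join(workdir, "demo.core.bz2")
    with open(packed, "wb") as f:
        f.write(bz2.compress(bytes(4096)))
    print(decompress_core(packed, os.path.join(workdir, "demo.core"), 4096))
```

For the core file in this patch the call would look like `decompress_core("core-file-pid0.x86-64.core.bz2", "core-file-pid0.x86-64.core", 8757248)`, where 8757248 is the uncompressed size the test checks for.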
Comments
On Mon, 5 Jun 2023 10:11:08 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> This patch contains a test for this commit:
> [...]

Great explanation!  :)

Approved-by: Kevin Buettner <kevinb@redhat.com>
Hi Andrew,

I'm skimming the thread to catch up, and noticed this:

On 2023-06-05 10:11, Andrew Burgess via Gdb-patches wrote:
> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> @@ -0,0 +1,63 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2023 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> +

Wrong license (GPLv2), and wrong header -- we haven't been using the snail
mail FSF header in years.
Pedro Alves <pedro@palves.net> writes:

> Hi Andrew,
>
> I'm skimming the thread to catch up, and noticed this:
>
> On 2023-06-05 10:11, Andrew Burgess via Gdb-patches wrote:
>> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
>> @@ -0,0 +1,63 @@
>> +# This testcase is part of GDB, the GNU debugger.
>> +#
>> +# Copyright 2023 Free Software Foundation, Inc.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write to the Free Software
>> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>> +
>
> Wrong license (GPLv2), and wrong header -- we haven't been using the snail
> mail FSF header in years.

Ooops.  Thanks for spotting this.

I pushed the patch below to correct this mistake.

Thanks,
Andrew

---

commit 7c632c2a696fb68e5575db1e2c934788a831e578
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Jul 7 10:51:53 2023 +0100

    gdb/testsuite: fix license on recently added file

    The license header on a file I recently contributed was incorrect.
    The file was added in commit:

      commit 087969169836f802a09b1cd0502d2f22d7a8f7dc
      Date:   Tue May 23 11:25:21 2023 +0100

          gdb: handle core files with .reg/0 section names

    The problems were:

      - GPLv2 instead of GPLv3,
      - Use the FSF postal address rather than their URL.

    Nobody else has touched the file since I merged it, so I don't
    believe there are any problems with me changing the license, this
    commit does just that.

diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index 6e91111b44b..56746cca567 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -4,7 +4,7 @@
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
@@ -13,8 +13,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
 
 # Some kernel core files have PID 0 (for the idle task), check that
 # GDB can handle such a core file.
Andrew Burgess <aburgess@redhat.com> writes:

> This patch contains a test for this commit:
> [...]
> ---
>  gdb/testsuite/gdb.arch/core-file-pid0.exp     |  63 ++++++++++++++++++
>  .../gdb.arch/core-file-pid0.x86-64.core.bz2   | Bin 0 -> 750 bytes
>  2 files changed, 63 insertions(+)
>  create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.exp
>  create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2
>
> diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> new file mode 100644
> index 00000000000..b960dfe095b
> --- /dev/null
> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> @@ -0,0 +1,63 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2023 Free Software Foundation, Inc.
> +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write to the Free Software > +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. > + > +# Some kernel core files have PID 0 (for the idle task), check that > +# GDB can handle such a core file. > + > +standard_testfile > + > +# Set CF_NAME, the name of the compressed core file within the source > +# tree, and CF_SIZE, the size (in bytes) of the uncompressed core > +# file. > +if {[istarget "x86_64-*-linux*"]} { > + set cf_name ${testfile}.x86-64.core.bz2 > + set cf_size 8757248 > +} else { > + unsupported "no pre-generated core file for this target" > +} It was pointed out to me that after reporting 'unsupported', there should be a return. Without the return we end up seeing TCL errors because the cf_name variable is not defined. Fixed with the patch below, which I have gone ahead and pushed. Thanks, Andrew --- commit 44c8334f4af5b9895d196077f23e20e15eff4c03 Author: Andrew Burgess <aburgess@redhat.com> Date: Mon Jul 10 12:05:21 2023 +0100 gdb/testsuite: return after reporting a test unsupported In this commit: commit 8bcead69665af3a9f9867cd34c3a1daf22120027 Date: Tue May 23 11:25:01 2023 +0100 gdb/testsuite: add test for core file with a 0 pid a new test gdb.arch/core-file-pid0.exp was added. This test includes a pre-generated core file for x86-64 and for other architectures the test reports 'unsupported'. 

    However, after reporting 'unsupported' the test failed to perform an
    early return, so the test would then carry on and try to actually
    perform the test, which resulted in some TCL errors.

    Fix this by returning after reporting the test unsupported.

diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index 56746cca567..46b8c6db5ed 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -28,6 +28,7 @@ if {[istarget "x86_64-*-linux*"]} {
     set cf_size 8757248
 } else {
     unsupported "no pre-generated core file for this target"
+    return -1
 }

 # Decompress the core file.
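
The early-return pattern that the follow-up commit introduces can be sketched in isolation. The script below is a minimal, self-contained illustration in plain Tcl, not the actual test: `istarget` and `unsupported` are hypothetical stand-ins for the DejaGnu procs of the same name, the target triplet is made up, and the test body is wrapped in a hypothetical `run_test` proc so that the effect of `return -1` is easy to observe.

```tcl
# Hypothetical stand-in for DejaGnu's istarget: pretend the testsuite
# is running for a non-x86-64 target (the triplet is an assumption).
proc istarget {pattern} {
    return [string match $pattern "aarch64-unknown-linux-gnu"]
}

# Hypothetical stand-in for DejaGnu's unsupported: just print the message.
proc unsupported {msg} {
    puts "UNSUPPORTED: $msg"
}

# Simplified version of the start of the test.  The real test runs at
# file scope; a proc is used here only to make the return value visible.
proc run_test {} {
    if {[istarget "x86_64-*-linux*"]} {
        set cf_name "core-file-pid0.x86-64.core.bz2"
    } else {
        unsupported "no pre-generated core file for this target"
        # Without this return, execution would fall through to the
        # code below, which reads cf_name even though it was never set.
        return -1
    }

    # Only reached when cf_name was set above.
    puts "would decompress $cf_name"
    return 0
}

puts "run_test returned: [run_test]"
```

Deleting the `return -1` line from this sketch makes `run_test` fail with a `can't read "cf_name": no such variable` error, which is the kind of Tcl error the follow-up commit avoids.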
diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
new file mode 100644
index 00000000000..b960dfe095b
--- /dev/null
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -0,0 +1,63 @@
+# This testcase is part of GDB, the GNU debugger.
+#
+# Copyright 2023 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+# Some kernel core files have PID 0 (for the idle task), check that
+# GDB can handle such a core file.
+
+standard_testfile
+
+# Set CF_NAME, the name of the compressed core file within the source
+# tree, and CF_SIZE, the size (in bytes) of the uncompressed core
+# file.
+if {[istarget "x86_64-*-linux*"]} {
+    set cf_name ${testfile}.x86-64.core.bz2
+    set cf_size 8757248
+} else {
+    unsupported "no pre-generated core file for this target"
+}
+
+# Decompress the core file.
+set corebz2file ${srcdir}/${subdir}/${cf_name}
+set corefile [decompress_bz2 $corebz2file]
+if { $corefile eq "" } {
+    untested "failed to bunzip2 the core file"
+    return -1
+}
+
+# Check the size of the decompressed core file. Just for sanity.
+file stat ${corefile} corestat
+if { $corestat(size) != ${cf_size} } {
+    untested "uncompressed core file is the wrong size"
+    return -1
+}
+
+# Copy over the corefile if we are remote testing.
+set corefile [gdb_remote_download host $corefile]
+
+clean_restart
+
+# Load the core file. At one point GDB would assert, complaining that
+# the inferior was nullptr. For now we see a message about the
+# current thread having terminated, this is because GDB gets confused
+# and incorrectly deletes what should be the current thread.
+gdb_test "core-file ${corefile}" \
+    [multi_line \
+         "Core was generated by \[^\r\n\]+\\." \
+         "Program terminated with signal (?:11|SIGSEGV), Segmentation fault\\." \
+         "The current thread has terminated"] \
+    "check core file termination reason"
diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2 b/gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2
new file mode 100644
index 0000000000000000000000000000000000000000..081a35250f1fbb9743aecc70723abc90ad2704e8
GIT binary patch
literal 750
zcmV<K0ulW}T4*^jL0KkKS=gyx?Ew#FfB*mg`1I*p$U}Rr+C%1oU4hy`OaKNj#IVo+
z1U5!VNl(B6v_J@?B@ju7Q_1B=sro3|H9bu|L()AofM|MyAR0WOpn8o1Pf?&AX+V)A
z3TZ!6)YC?$ntFg}8fl=@)B`{nGynmi>HzgLXaH~k4FC-QXaE4whyWS@0000Q0B`^e
z01W_W007a502%-Q000^RIm5u!H4!vOi4u;~XXvv+_(4wTM27K9uvA#A$z8Re3k?Vx
zg=D9N<JN>R?O2NX^{dvY9IV+DHS4>5kX7fxS%zD4RsCzh73;S*BC&GECiK?zCXv=Y
z65o1Zd|ax6qtvPl(c_9KX5k85w|e}6l%`SES|O^nN<jri%s3*DRy|MfGVsPsWH1B(
zkOPuQFDpPQ6A;q|h9C^<#7K$C^|I+HTB9IhY(hW@08}tVBSPDZLKK*gTVrAy2GbZK
zVUW{c+i4h$Bp?KUF~g$(fG;!x;xQ!&h1Ob#1Z1?55D5TU=5uS4F3-*}vnUZUsA9n;
zAkW);9BX<W{Gcf`;UTn=#(M#9c45p8ofEMjrHN6K7R%6rcRNqdUZ-MiEedv*7;`qn
z_%Jq-&2_Ut^}toC5Qt-7)j-C#Zu{%xQ9TI6o+z~#X%QG_&9Z)!YnH<brn+QjJ0yF8
zz<{fd2=)r*P%f}w4*d$ovZsi}CXi%YU4O+bi8Z3F1Dw^e_|l&DF1G0B^Kr0F=1d?&
z>NxR=)iL;!(m48t3>)nWv>L}odw_(LwYmfFnX^tXRS%I%0_gs2X0CMw)LAuUYge!*
z*ho4d^b_t7!mr3Ox;W?}{mO+P*A-{+6zTs%^$`P2v5rXCxq<qEZyFlSwrFD9mhBBS
zLbTCha|<kE)-W>ZV4i%UXhQ~m4Hr~y%;=R2;=vk&8NAlooiprS>1{0SArS-sL;&Pz
gX|25!i~Ev+81)1P_+g=)e((HU$rRy2Lt>?Ww9hY4asU7T

literal 0
HcmV?d00001

--
2.25.4