Message ID | 20200215165444.32653-2-tom@tromey.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 85265 invoked by alias); 15 Feb 2020 16:54:51 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 85178 invoked by uid 89); 15 Feb 2020 16:54:50 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-20.5 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=ages, kicking X-HELO: gateway22.websitewelcome.com Received: from gateway22.websitewelcome.com (HELO gateway22.websitewelcome.com) (192.185.47.168) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 15 Feb 2020 16:54:48 +0000 Received: from cm13.websitewelcome.com (cm13.websitewelcome.com [100.42.49.6]) by gateway22.websitewelcome.com (Postfix) with ESMTP id 1A4A347C9 for <gdb-patches@sourceware.org>; Sat, 15 Feb 2020 10:54:47 -0600 (CST) Received: from box5379.bluehost.com ([162.241.216.53]) by cmsmtp with SMTP id 30iNj7UQRRP4z30iNjYXzq; Sat, 15 Feb 2020 10:54:47 -0600 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tromey.com; s=default; h=References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Sender:Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=zje8y+f4KhesQK2DjvQ3RHx/yyxXTIkHgYv8w62bGpY=; b=PGA2PEnFoXrptTo9E0v1KxBlNq oxoydjhTczdyztMIfw9AZf9kab1JE1eS0eGlVsZA04xXvcBTnbA4VuE7ac61N60KcT9gJQB5Bv/I3 cO6noOQv8YnN5+b0O+Hk5Yp0r; Received: from 75-166-123-50.hlrn.qwest.net ([75.166.123.50]:45240 helo=bapiya.Home) by box5379.bluehost.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.92) (envelope-from <tom@tromey.com>) id 1j30iM-000xcC-Qn; Sat, 15 Feb 2020 09:54:46 -0700 From: Tom Tromey <tom@tromey.com> To: gdb-patches@sourceware.org Cc: Tom Tromey <tom@tromey.com> Subject: [PATCH 01/14] Fix latent bug in dwarf2_find_containing_comp_unit Date: Sat, 15 Feb 2020 09:54:31 -0700 Message-Id: <20200215165444.32653-2-tom@tromey.com> In-Reply-To: <20200215165444.32653-1-tom@tromey.com> References: <20200215165444.32653-1-tom@tromey.com> |
Commit Message
Tom Tromey
Feb. 15, 2020, 4:54 p.m. UTC
dwarf2_find_containing_comp_unit has this in its binary search: if (mid_cu->is_dwz > offset_in_dwz || (mid_cu->is_dwz == offset_in_dwz && mid_cu->sect_off + mid_cu->length >= sect_off)) high = mid; The intent here is to determine whether SECT_OFF appears in or before MID_CU. I believe this has an off-by-one error, and that the check should use ">" rather than ">=". If the two side are equal, then SECT_OFF actually appears at the start of the next CU. I've had this patch kicking around for ages but I forget how I found the problem. gdb/ChangeLog 2020-02-15 Tom Tromey <tom@tromey.com> * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not ">=", in binary search. --- gdb/ChangeLog | 5 +++++ gdb/dwarf2/read.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-)
Comments
On 2020-02-15 11:54 a.m., Tom Tromey wrote: > dwarf2_find_containing_comp_unit has this in its binary search: > > if (mid_cu->is_dwz > offset_in_dwz > || (mid_cu->is_dwz == offset_in_dwz > && mid_cu->sect_off + mid_cu->length >= sect_off)) > high = mid; > > The intent here is to determine whether SECT_OFF appears in or before > MID_CU. > > I believe this has an off-by-one error, and that the check should use > ">" rather than ">=". If the two side are equal, then SECT_OFF > actually appears at the start of the next CU. > > I've had this patch kicking around for ages but I forget how I found > the problem. > > gdb/ChangeLog > 2020-02-15 Tom Tromey <tom@tromey.com> > > * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not > ">=", in binary search. > --- > gdb/ChangeLog | 5 +++++ > gdb/dwarf2/read.c | 2 +- > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c > index e74383e01dc..e373cc0af2c 100644 > --- a/gdb/dwarf2/read.c > +++ b/gdb/dwarf2/read.c > @@ -24171,7 +24171,7 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off, > mid_cu = dwarf2_per_objfile->all_comp_units[mid]; > if (mid_cu->is_dwz > offset_in_dwz > || (mid_cu->is_dwz == offset_in_dwz > - && mid_cu->sect_off + mid_cu->length >= sect_off)) > + && mid_cu->sect_off + mid_cu->length > sect_off)) > high = mid; > else > low = mid + 1; I can only convince myself of these things by plugging in real numbers. So let's say mid_cu's range is [100,150[, so 150 bytes long, and we are looking for sect_off == 150. 150 is outside (just after) mid_cu's range, so the right answer will be the cu just after this one. Without your change, we would have gone in the "if", and therefore excluded the right answer. With your change, we would have gone in the "else", and therefore chosen the range with the right answer. So that looks correct to me. I'm going to ask if you thought about a way to test this. I don't think writing a typical test case for this is feasible. By refactoring the code a bit, we could maybe factor out the meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge cases like this. Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs coming from a dwz file are at the end of the list, or something like that? Simon
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:
Simon> I'm going to ask if you thought about a way to test this. I
Simon> don't think writing a typical test case for this is feasible.
Yeah. And like I said, I don't remember how I encountered this.
Maybe only with some other dubious patch in place.
Simon> By refactoring the code a bit, we could maybe factor out the
Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead
Simon> of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with
Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge
Simon> cases like this.
Another idea that occurred to me is that we could just use
std::binary_search here. Then instead of implementing a binary search,
we'd only need to implement a comparison function -- which seems
simpler.
Anyway I will try to write a unit test, that's a good idea.
Simon> Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs
Simon> coming from a dwz file are at the end of the list, or something like that?
Yes, exactly. Maybe this can be cleaned up a bit after this series,
since now we have the index directly in the dwarf2_per_cu_data. I am
not sure offhand.
Tom
Simon> By refactoring the code a bit, we could maybe factor out the meat Simon> of this function into one that operates on an Simon> std::vector<dwarf2_per_cu_data> instead of on a Simon> dwarf2_per_objfile. It should then be feasible to create an Simon> std::vector with dwarf2_per_cu_data elements out of thin air to Simon> unit-test the function, including edge cases like this. Here is a new version of this patch. I propose landing it separately, because it is just a straightforward latent bug fix. Simon> Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs Simon> coming from a dwz file are at the end of the list, or something like that? Tom> Yes, exactly. To expand on that, in multi-file mode, dwz may create a combined, shared file of partial symtabs. When reading the DWARF for the main file, gdb may find it needs the dwz file; and in this case the additional CUs are put into the same vector, at the end. It's done this way so that we can easily find the correct CU (since the section offsets between the main and shared dwz file will overlap). It would probably be better to try to share these CUs across objfiles. That may be somewhat involved given the current code, though. Tom
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index e74383e01dc..e373cc0af2c 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -24171,7 +24171,7 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off, mid_cu = dwarf2_per_objfile->all_comp_units[mid]; if (mid_cu->is_dwz > offset_in_dwz || (mid_cu->is_dwz == offset_in_dwz - && mid_cu->sect_off + mid_cu->length >= sect_off)) + && mid_cu->sect_off + mid_cu->length > sect_off)) high = mid; else low = mid + 1;