gdb/build-id: protect against weirdly short build-ids

Message ID ba559099b3deffb790d1c4c500d17c543eb0b930.1732010413.git.aburgess@redhat.com
State New
Headers
Series gdb/build-id: protect against weirdly short build-ids |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed

Commit Message

Andrew Burgess Nov. 19, 2024, 10 a.m. UTC
  While looking at build_id_to_bfd_suffix (in gdb/build-id.c) I realised
that GDB would likely not do what we wanted if a build-id was ever a
single byte.

Right now, build-ids generated by the GNU linker are 32 bytes, but
there's nothing that forces this to be the case, it's pretty easy to
create a fake, single byte, build-id.  Given that the build-id is an
external input (read from the objfile), GDB should protect itself
against these edge cases.

The problem with build_id_to_bfd_suffix is that this function creates
the path used to lookup either the debug information, or an
executable, based on its build-id.  So a 3-byte build-id 0xaabbcc will
look in the path: `$DEBUG_FILE_DIRECTORY/.build-id/aa/bbcc.debug`.
However, a single byte build-id 0xaa, will look in the file:
`$DEBUG_FILE_DIRECTORY/.build-id/aa/.debug` which doesn't seem right.

Worse, when looking for an objfile given a build-id GDB will look for
a file called `$DEBUG_FILE_DIRECTORY/.build-id/aa/` with a trailing
'/' character.

I propose that, in build_id_to_bfd_suffix we just return early if the
build-id is 1 byte (or less) with a return value that indicates no
separate file was found.

For testing I made use of the DWARF assembler.  I needed to update the
build-id creation proc, the existing code assumes that the build-id is
a multiple of 4 bytes, so I added some additional padding to ensure
that the generated note was a multiple of 4 bytes, even if the
build-id was not.

I added a test with a 1 byte build-id, and also for the case where the
build-id has zero length.  The zero length case already does what
you'd expect (no debug is loaded) as the bfd library rejects the
build-id when loading it from the objfile, but adding this additional
test is pretty cheap.
---
 gdb/build-id.c                              |  19 +++-
 gdb/testsuite/gdb.dwarf2/short-build-id.exp | 119 ++++++++++++++++++++
 gdb/testsuite/lib/dwarf.exp                 |  13 ++-
 3 files changed, 142 insertions(+), 9 deletions(-)
 create mode 100644 gdb/testsuite/gdb.dwarf2/short-build-id.exp


base-commit: 82eff6743b77908a502b4cf9030acc93caf69e74
  

Comments

Kevin Buettner Nov. 20, 2024, 9:22 p.m. UTC | #1
On Tue, 19 Nov 2024 10:00:28 +0000
Andrew Burgess <aburgess@redhat.com> wrote:

> I propose that, in build_id_to_bfd_suffix we just return early if the
> build-id is 1 byte (or less) with a return value that indicates no
> separate file was found.

I wonder if also printing some sort of warning might be appropriate in
such cases?  If a general warning isn't suitable, I think it makes
sense to at least use separate_debug_file_debug_printf() to output
a suitable message when separate-debug-file debugging is enabled.

Kevin
  
Andrew Burgess Nov. 21, 2024, 2:30 p.m. UTC | #2
Kevin Buettner <kevinb@redhat.com> writes:

> On Tue, 19 Nov 2024 10:00:28 +0000
> Andrew Burgess <aburgess@redhat.com> wrote:
>
>> I propose that, in build_id_to_bfd_suffix we just return early if the
>> build-id is 1 byte (or less) with a return value that indicates no
>> separate file was found.
>
> I wonder if also printing some sort of warning might be appropriate in
> such cases?  If a general warning isn't suitable, I think it makes
> sense to at least use separate_debug_file_debug_printf() to output
> a suitable message when separate-debug-file debugging is enabled.

I posted a v2 which includes a debug log message that the lookup was
abandoned.  I was tempted with the idea of a warning, but wasn't sure if
that was justified or not... I'm still not really sure.  If you have a
strong preference for a warning then I can switch to that approach.

Anyway, let me know what you think of v2.

Thanks,
Andrew
  

Patch

diff --git a/gdb/build-id.c b/gdb/build-id.c
index 9d4b005489d..8eaf3472e80 100644
--- a/gdb/build-id.c
+++ b/gdb/build-id.c
@@ -223,7 +223,13 @@  build_id_to_debug_bfd_1 (const std::string &original_link,
 
 /* Common code for finding BFDs of a given build-id.  This function
    works with both debuginfo files (SUFFIX == ".debug") and executable
-   files (SUFFIX == "").  */
+   files (SUFFIX == "").
+
+   The build-id will be split into a single byte sub-directory, followed by
+   the remaining build-id bytes as the filename, i.e. we use the lookup
+   format: `.build-id/xx/yy....zz`.  As a consequence, if BUILD_ID_LEN is
+   less than 2 (bytes), no results will be found as there are not enough
+   bytes to form the `yy....zz` part of the lookup filename.  */
 
 static gdb_bfd_ref_ptr
 build_id_to_bfd_suffix (size_t build_id_len, const bfd_byte *build_id,
@@ -231,6 +237,9 @@  build_id_to_bfd_suffix (size_t build_id_len, const bfd_byte *build_id,
 {
   SEPARATE_DEBUG_FILE_SCOPED_DEBUG_ENTER_EXIT;
 
+  if (build_id_len < 2)
+    return {};
+
   /* Keep backward compatibility so that DEBUG_FILE_DIRECTORY being "" will
      cause "/.build-id/..." lookups.  */
 
@@ -249,11 +258,9 @@  build_id_to_bfd_suffix (size_t build_id_len, const bfd_byte *build_id,
       std::string link = debugdir.get ();
       link += "/.build-id/";
 
-      if (size > 0)
-	{
-	  size--;
-	  string_appendf (link, "%02x/", (unsigned) *data++);
-	}
+      gdb_assert (size > 1);
+      size--;
+      string_appendf (link, "%02x/", (unsigned) *data++);
 
       while (size-- > 0)
 	string_appendf (link, "%02x", (unsigned) *data++);
diff --git a/gdb/testsuite/gdb.dwarf2/short-build-id.exp b/gdb/testsuite/gdb.dwarf2/short-build-id.exp
new file mode 100644
index 00000000000..f40d5e4b005
--- /dev/null
+++ b/gdb/testsuite/gdb.dwarf2/short-build-id.exp
@@ -0,0 +1,119 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Create a file with an artificially short (1-byte) build-id, and
+# check that GDB doesn't try to load debug information.  If we do try
+# then we end up loading from: `debug-directory/.build-id/xx/.debug`
+# which isn't right.
+
+load_lib dwarf.exp
+
+# This test can only be run on targets which support DWARF-2 and use gas.
+require dwarf2_support
+
+# No remote host testing either.
+require {!is_remote host}
+
+standard_testfile main.c
+
+# Create an assembler file which encodes BUILDID as the build-id.  Compile
+# this along with the global SRCFILE to create a test executable.
+#
+# Split the debug information out from the newly created executable and place
+# it into the debug file directory.
+#
+# Load the executable into GDB and check to see if the debug information was
+# loaded or not.  For this test we are expecting that the debug information
+# was not loaded.  The reason is that, with short values for BUILDID, GDB ends
+# up looking for the debug information in weird locations.
+proc run_test { buildid } {
+    set len [string length $buildid]
+
+    set asm_file [standard_output_file "$::testfile.$len.S"]
+    Dwarf::assemble $asm_file {
+	declare_labels int_label int_label2
+
+	upvar buildid buildid
+
+	build_id $buildid
+
+	cu { label cu_start } {
+	    compile_unit {{language @DW_LANG_C}} {
+		int_label2: base_type {
+		    {name int}
+		    {byte_size 4 sdata}
+		    {encoding @DW_ATE_signed}
+		}
+
+		constant {
+		    {name the_int}
+		    {type :$int_label2}
+		    {const_value 99 data1}
+		}
+	    }
+	}
+
+	aranges {} cu_start {
+	    arange {} 0 0
+	}
+    }
+
+    set execfile [standard_output_file $::testfile.$len]
+
+    if { [build_executable_from_specs "failed to build" \
+	      $execfile {debug no-build-id} \
+	      $::srcfile debug \
+	      $asm_file {}] } {
+	return
+    }
+
+    # Create the debug directory.
+    set debugdir [standard_output_file "debugdir.$len"]
+    set build_id_dir $debugdir/.build-id/$buildid
+    remote_exec host "mkdir -p $build_id_dir"
+
+    # Split out the debug information.
+    if {[gdb_gnu_strip_debug $execfile no-debuglink]} {
+	unresolved "failed to split out debug information"
+	return
+    }
+
+    # Move the debug information into the debug directory.  We place the debug
+    # information into a file called just '.debug'.  GDB should not check this
+    # file, but at one point GDB would check this file, even though this
+    # doesn't make much sense.
+    set execfile_debug ${execfile}.debug
+    remote_exec host "mv $execfile_debug $build_id_dir/.debug"
+
+    # Start GDB, set the debug-file-directory, and try loading the file.
+    clean_restart
+
+    gdb_test_no_output "set debug-file-directory $debugdir" \
+	"set debug-file-directory"
+
+    gdb_file_cmd $execfile
+
+    gdb_assert { $::gdb_file_cmd_debug_info eq "nodebug" } \
+	"no debug should be loaded"
+
+    # For sanity, read something that was encoded in the debug
+    # information, this should fail.
+    gdb_test "print the_int" \
+	"(?:No symbol table is loaded|No symbol \"the_int\" in current context).*"
+}
+
+foreach_with_prefix buildid { a4 "" } {
+    run_test $buildid
+}
diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp
index 1002c75dd09..f6348972968 100644
--- a/gdb/testsuite/lib/dwarf.exp
+++ b/gdb/testsuite/lib/dwarf.exp
@@ -3008,25 +3008,32 @@  namespace eval Dwarf {
 
     proc _note {type name hexdata} {
 	set namelen [expr [string length $name] + 1]
+	set datalen [expr [string length $hexdata] / 2]
 
 	# Name size.
 	_op .4byte $namelen
 	# Data size.
-	_op .4byte [expr [string length $hexdata] / 2]
+	_op .4byte $datalen
 	# Type.
 	_op .4byte $type
 	# The name.
 	_op .ascii [_quote $name]
-	# Alignment.
+	# Alignment (to 4-byte boundary).
 	set align 2
 	set total [expr {($namelen + (1 << $align) - 1) & -(1 << $align)}]
 	for {set i $namelen} {$i < $total} {incr i} {
-	    _op .byte 0
+	    _op .byte 0 padding
 	}
 	# The data.
 	foreach {a b} [split $hexdata {}] {
 	    _op .byte 0x$a$b
 	}
+	# Alignment (to 4-byte boundary).
+	set align 2
+	set total [expr {($datalen + (1 << $align) - 1) & -(1 << $align)}]
+	for {set i $datalen} {$i < $total} {incr i} {
+	    _op .byte 0 padding
+	}
     }
 
     # Emit a note section holding the given build-id.