From patchwork Mon Nov 27 17:56:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Burgess <aburgess@redhat.com>
X-Patchwork-Id: 80836
Return-Path: <gdb-patches-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id DE3A73861894
	for <patchwork@sourceware.org>; Mon, 27 Nov 2023 17:57:01 +0000 (GMT)
X-Original-To: gdb-patches@sourceware.org
Delivered-To: gdb-patches@sourceware.org
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id C62203857BBD
 for <gdb-patches@sourceware.org>; Mon, 27 Nov 2023 17:56:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C62203857BBD
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C62203857BBD
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=170.10.129.124
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701107776; cv=none;
 b=x4D552msSHkFpArynTx3pzOhXBSE3dG2EAWlVjksE3fX8YAbRHEhLu6VMpmx1H7ddcXxyBHQ8QZV4nTEoIvhftGGPyacZS6dTNh3X7/lPWzG6T4WyPi7qQEwXaF+lU6R1kPgWhsPvDgX5IoxB1SIdBFWjlhsKP3P2BXqc9UdCGg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1701107776; c=relaxed/simple;
 bh=N9VSrfn8rmdqm2H9xYWMNjLXB41BgcKhIuWhq5zL8yI=;
 h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version;
 b=fYqlP62eUhdCUK+RfOvVijRYSDyNU9KV/YtTvtkMYeYDx5DzEt28rt4kuIyujm/ffEzAbrJ4p21Dc+SBMbuWawEFCHn9beDRtcv7KHs8eVq3hNCX28Pnb9c/gDdDthBbZNLm4uoXyX0TGpgnpwikTBiiBVU1maTZvdxXWYPAh6s=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1701107774;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=a803Y+7TrO9OmdYOMRffjnb9tGwEID5OP6M8yiz4ce8=;
 b=KxLQKPHTJQ5zhby+fsUrYw+Ov/3x5mxVfqX61JONpfG3gdZdBb/AjYovrBmcTgEI4xhmhM
 OeikV7xlZlRvquVZs/XkKmcfge3ggpn0e7Q2YupMXp/jaj/QHVRn5ut2PFIcqFxSpoEg1m
 jkpvQkO0HdH51LKC8gnDLNmqYbr9j4E=
Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com
 [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-391-7COxaWEKNf6kSLDonEuYEA-1; Mon, 27 Nov 2023 12:56:13 -0500
X-MC-Unique: 7COxaWEKNf6kSLDonEuYEA-1
Received: by mail-ej1-f72.google.com with SMTP id
 a640c23a62f3a-a01efffd9e4so335713566b.0
 for <gdb-patches@sourceware.org>; Mon, 27 Nov 2023 09:56:12 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1701107772; x=1701712572;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=a803Y+7TrO9OmdYOMRffjnb9tGwEID5OP6M8yiz4ce8=;
 b=V2vxQ+jRivauAed41MfVfsld1o5mnBwSCWNbnxSnS35r2RoIy3RM0wIgNjL8uD9rYI
 HfLDOicAfTeoy/hmyQz3r/Y3OijGqNYw6iETdweDyHIUjSgOGJ+mKgzwDEiLgcTKVmvw
 lQx2Jv8wBG/JpHpVu38p35C0chsm4reDevT5Q13MCh0EUTW+znljCYPiGv+4wFQ+2Uqp
 VUOWQLE3sATRqqRHph0m/kUncAQp/B9+DO24Lpn+3N6hhkDUgAbF/1Hv8iTR6RGsOWRx
 hNu2GwCvaUxPQbFeQlMqUdctg3Oy/tA8ondTploAuXGfI4lZ3BBrMK/+gQj91GmZvJDm
 g5dg==
X-Gm-Message-State: AOJu0YwLoQKEHOl+zToFa+oAYyv5UheLuK0YhWjLEQ3y+MhWui72btJh
 S1KRFhuIUDWwX61LQEL0oS2hhGjgSy0FQ9ZAZ6t2cp52bO2cuEQw01dgRS6cXOwpiXJWXHaIDUg
 8klN4ETZgXUn01ycYbe6PDsgIZpRHFeMBA5/35xqMeIFk+tGxvaQ6T6z2FQMvomcVFM4hpo3nhB
 o7BQ2JJw==
X-Received: by 2002:a17:907:c18:b0:9fd:cc84:78c0 with SMTP id
 ga24-20020a1709070c1800b009fdcc8478c0mr11161484ejc.72.1701107771716;
 Mon, 27 Nov 2023 09:56:11 -0800 (PST)
X-Google-Smtp-Source: 
 AGHT+IFDeggBsN9UgwR2hsN1v5nyEOVrDY5EXOe6XKHn+KuxWE/SfJRHpW7xu/LQiNZasvEk+kMX4Q==
X-Received: by 2002:a17:907:c18:b0:9fd:cc84:78c0 with SMTP id
 ga24-20020a1709070c1800b009fdcc8478c0mr11161462ejc.72.1701107771312;
 Mon, 27 Nov 2023 09:56:11 -0800 (PST)
Received: from localhost (105.226.159.143.dyn.plus.net. [143.159.226.105])
 by smtp.gmail.com with ESMTPSA id
 mb22-20020a170906eb1600b009fc0c42098csm5961587ejb.173.2023.11.27.09.56.10
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 27 Nov 2023 09:56:10 -0800 (PST)
From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: [PATCH 6/7] gdb: generate gdb-index identically regardless of work
 thread count
Date: Mon, 27 Nov 2023 17:56:00 +0000
Message-Id: 
 <07b6a4d980be1f86db37101e371731be3dce3ecf.1701107594.git.aburgess@redhat.com>
X-Mailer: git-send-email 2.25.4
In-Reply-To: <cover.1701107594.git.aburgess@redhat.com>
References: <cover.1701107594.git.aburgess@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org

It was observed that changing the number of worker threads that GDB
uses (maintenance set worker-threads NUM) would have an impact on the
layout of the generated gdb-index.

The cause seems to be how the CU are distributed between threads, and
then symbols that appear in multiple CU can be encountered earlier or
later depending on whether a particular CU moves between threads.

I certainly found this behaviour was reproducible when generating an
index for GDB itself, like:

  gdb -q -nx -nh -batch \
      -eiex 'maint set worker-threads NUM' \
      -ex 'save gdb-index /tmp/'

And then setting different values for NUM will change the generated
index.

Now, the question is: does this matter?

I would like to suggest that yes, this does matter.  At Red Hat we
generate a gdb-index as part of the build process, and we would
ideally like to have reproducible builds: for the same source,
compiled with the same tool-chain, we should get the exact same output
binary.  And we do .... except for the index.

Now we could simply force GDB to only use a single worker thread when
we build the index, but, I don't think the idea of reproducible builds
is that strange, so I think we should ensure that our generated
indexes are always reproducible.

To achieve this, I propose that we add an extra step when building the
gdb-index file.  After constructing the initial symbol hash table
contents, we will pull all the symbols out of the hash, sort them,
then re-insert them in sorted order.  This will ensure that the
structure of the generated hash will remain consistent (given the same
set of symbols).

I've extended the existing index-file test to check that the generated
index doesn't change if we adjust the number of worker threads used.
Given that this test is already rather slow, I've only made one change
to the worker-thread count.  Maybe this test should be changed to use
a smaller binary, which is quicker to load, and for which we could
then try many different worker thread counts.
---
 gdb/dwarf2/index-write.c             | 69 ++++++++++++++++++++++++++++
 gdb/testsuite/gdb.gdb/index-file.exp | 41 +++++++++++++++++
 gdb/testsuite/lib/gdb.exp            | 15 ++++++
 3 files changed, 125 insertions(+)

diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
index e1c343e8790..bc07bcb8f4c 100644
--- a/gdb/dwarf2/index-write.c
+++ b/gdb/dwarf2/index-write.c
@@ -212,6 +212,13 @@ struct mapped_symtab
   void add_index_entry (const char *name, int is_static,
 			gdb_index_symbol_kind kind, offset_type cu_index);
 
+  /* When entries are originally added into the data hash the order will
+     vary based on the number of worker threads GDB is configured to use.
+     This function will rebuild the hash such that the final layout will be
+     deterministic regardless of the number of worker threads used.  */
+
+  void sort ();
+
   /* Access the obstack.  */
   auto_obstack *obstack ()
   { return &m_string_obstack; }
@@ -298,6 +305,65 @@ mapped_symtab::hash_expand ()
       }
 }
 
+/* See mapped_symtab class declaration.  */
+
+void mapped_symtab::sort ()
+{
+  /* Move contents out of this->data vector.  */
+  std::vector<symtab_index_entry> original_data = std::move (m_data);
+
+  /* Restore the size of m_data, this will avoid having to expand the hash
+     table (and rehash all elements) when we reinsert after sorting.
+     However, we do reset the element count, this allows for some sanity
+     checking asserts during the reinsert phase.  */
+  gdb_assert (m_data.size () == 0);
+  m_data.resize (original_data.size ());
+  m_element_count = 0;
+
+  /* Remove empty entries from ORIGINAL_DATA, this makes sorting quicker.  */
+  auto it = std::remove_if (original_data.begin (), original_data.end (),
+			    [] (const symtab_index_entry &entry) -> bool
+			    {
+			      return entry.name == nullptr;
+			    });
+  original_data.erase (it, original_data.end ());
+
+  /* Sort the existing contents.  */
+  std::sort (original_data.begin (), original_data.end (),
+	     [] (const symtab_index_entry &a,
+		 const symtab_index_entry &b) -> bool
+	     {
+	       /* Return true if A is before B.  */
+	       gdb_assert (a.name != nullptr);
+	       gdb_assert (b.name != nullptr);
+
+	       return strcmp (a.name, b.name) < 0;
+	     });
+
+  /* Re-insert each item from the sorted list.  */
+  for (auto &entry : original_data)
+    {
+      /* We know that ORIGINAL_DATA contains no duplicates, this data was
+	 taken from a hash table that de-duplicated entries for us, so
+	 count this as a new item.
+
+	 As we retained the original size of m_data (see above) then we
+	 should never need to grow m_data_ during this re-insertion phase,
+	 assert that now.  */
+      ++m_element_count;
+      gdb_assert (!this->hash_needs_expanding ());
+
+      /* Lookup a slot.  */
+      symtab_index_entry &slot = this->find_slot (entry.name);
+
+      /* As discussed above, we should not find duplicates.  */
+      gdb_assert (slot.name == nullptr);
+
+      /* Move this item into the slot we found.  */
+      slot = std::move (entry);
+    }
+}
+
 /* See class definition.  */
 
 void
@@ -1346,6 +1412,9 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table,
   for (auto map : table->get_addrmaps ())
     write_address_map (map, addr_vec, cu_index_htab);
 
+  /* Ensure symbol hash is built domestically.  */
+  symtab.sort ();
+
   /* Now that we've processed all symbols we can shrink their cu_indices
      lists.  */
   symtab.minimize ();
diff --git a/gdb/testsuite/gdb.gdb/index-file.exp b/gdb/testsuite/gdb.gdb/index-file.exp
index c6edd286fb9..08415920061 100644
--- a/gdb/testsuite/gdb.gdb/index-file.exp
+++ b/gdb/testsuite/gdb.gdb/index-file.exp
@@ -35,6 +35,9 @@ with_timeout_factor $timeout_factor {
     clean_restart $filename
 }
 
+# Record how many worker threads GDB is using.
+set worker_threads [gdb_get_worker_threads]
+
 # Generate an index file.
 set dir1 [standard_output_file "index_1"]
 remote_exec host "mkdir -p ${dir1}"
@@ -113,3 +116,41 @@ proc check_symbol_table_usage { filename } {
 
 set index_filename_base [file tail $filename]
 check_symbol_table_usage "$dir1/${index_filename_base}.gdb-index"
+
+# If GDB is using more than 1 worker thread then reduce the number of
+# worker threads, regenerate the index, and check that we get the same
+# index file back.  At one point the layout of the index would vary
+# based on the number of worker threads used.
+if { $worker_threads > 1 } {
+    # Start GDB, but don't load a file yet.
+    clean_restart
+
+    # Adjust the number of threads to use.
+    set reduced_threads [expr $worker_threads / 2]
+    gdb_test_no_output "maint set worker-threads $reduced_threads"
+
+    with_timeout_factor $timeout_factor {
+	# Now load the test binary.
+	gdb_file_cmd $filename
+    }
+
+    # Generate an index file.
+    set dir2 [standard_output_file "index_2"]
+    remote_exec host "mkdir -p ${dir2}"
+    with_timeout_factor $timeout_factor {
+	gdb_test_no_output "save gdb-index $dir2" \
+	    "create second gdb-index file"
+    }
+
+    # Close GDB.
+    gdb_exit
+
+    # Now check that the index files are identical.
+    foreach suffix { gdb-index  } {
+	set result \
+	    [remote_exec host \
+		 "cmp -s \"$dir1/${index_filename_base}.${suffix}\" \"$dir2/${index_filename_base}.${suffix}\""]
+	gdb_assert { [lindex $result 0] == 0 } \
+	    "$suffix files are identical"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 63885860795..b534a61d066 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -10026,6 +10026,21 @@ proc is_target_non_stop { {testname ""} } {
     return $is_non_stop
 }
 
+# Return the number of worker threads that GDB is currently using.
+
+proc gdb_get_worker_threads { {testname ""} } {
+    set worker_threads "UNKNOWN"
+    gdb_test_multiple "maintenance show worker-threads" $testname {
+	-wrap -re "^The number of worker threads GDB can use is unlimited \\(currently ($::decimal)\\)\\." {
+	    set worker_threads $expect_out(1,string)
+	}
+	-wrap -re "The number of worker threads GDB can use is ($::decimal)\\." {
+	    set worker_threads $expect_out(1,string)
+	}
+    }
+    return $worker_threads
+}
+
 # Check if the compiler emits epilogue information associated
 # with the closing brace or with the last statement line.
 #