| Message ID | cover.1721377314.git.osandov@fb.com |
|---|---|
| Headers | From: Omar Sandoval <osandov@osandov.com>; To: elfutils-devel@sourceware.org; Cc: "Frank Ch. Eigler" <fche@redhat.com>, linux-debuggers@vger.kernel.org; Subject: [PATCH v3 0/7] debuginfod: speed up extraction from kernel debuginfo packages by 200x; Date: Fri, 19 Jul 2024 01:31:56 -0700; X-Mailer: git-send-email 2.45.2 |
| Series | debuginfod: speed up extraction from kernel debuginfo packages by 200x |
Message
Omar Sandoval
July 19, 2024, 8:31 a.m. UTC
From: Omar Sandoval <osandov@fb.com>
This is v3 of my patch series optimizing debuginfod for kernel
debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple
of minor bugs and adds test cases.
Changes from v2 to v3:
- Added a test case with seekable rpm and deb files.
- Added a couple of independent fixes uncovered while adding tests.
- Added a few more prometheus metrics.
- Fixed passive mode.
Patches 1 and 2 fix existing bugs that were uncovered by adding new
test package files. Patch 3 is a preparatory refactor. Patch 4 makes
the schema changes. Patch 5 implements the seekable xz extraction.
Patch 6 populates the table of seekable entries at scan time and adds a
test. Patch 7 does it for pre-existing files at request time.
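As a rough illustration of the difference between populating the table at scan time (patch 6) and at request time (patch 7), here is a minimal Python sketch using sqlite3. It is not code from this series: the table name `seekable_entries`, its columns, and the `scan` callback are all hypothetical, and the real schema in debuginfod.cxx may look quite different.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical seekable-entry table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE seekable_entries ("
        " archive TEXT NOT NULL,"
        " member TEXT NOT NULL,"
        " data_offset INTEGER NOT NULL,"  # uncompressed offset of the file
        " data_size INTEGER NOT NULL,"    # uncompressed size of the file
        " PRIMARY KEY (archive, member))"
    )
    return db


def record_entries(db, archive, entries):
    """Scan-time path: store (member, offset, size) tuples; safe to repeat."""
    db.executemany(
        "INSERT OR IGNORE INTO seekable_entries VALUES (?, ?, ?, ?)",
        [(archive, member, offset, size) for member, offset, size in entries],
    )
    db.commit()


def lookup(db, archive, member, scan):
    """Request-time path: return (offset, size), indexing the archive if needed.

    `scan` is a hypothetical callback that walks the archive once and
    returns its (member, offset, size) tuples.
    """
    query = (
        "SELECT data_offset, data_size FROM seekable_entries"
        " WHERE archive = ? AND member = ?"
    )
    row = db.execute(query, (archive, member)).fetchone()
    if row is None:
        # The archive was scanned before the table existed: index it now.
        record_entries(db, archive, scan(archive))
        row = db.execute(query, (archive, member)).fetchone()
    return row
```

With this shape, an archive indexed before the table existed pays for one full walk on its first request, and later requests are served from the table.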
Here is the background copied and pasted from v1:
drgn [1] currently uses debuginfod with great success for debugging
userspace processes. However, for debugging the Linux kernel (drgn's
main use case), we have had some performance issues with debuginfod, so
we intentionally avoid using it. Specifically, it sometimes takes over
a minute for debuginfod to respond to queries for vmlinux and kernel
modules (not including the actual download time).
The reason for the slowness is that Linux kernel debuginfo packages are
very large and contain lots of files. To respond to a query for a Linux
kernel debuginfo file, debuginfod has to decompress and iterate through
the whole package until it finds that file. If the file is towards the
end of the package, this can take a very long time. This was previously
reported for vdso files [2][3], which debuginfod was able to mitigate
with improved caching and prefetching. However, kernel modules are far
greater in number, vary drastically by hardware and workload, and can be
spread all over the package, so in practice I've still been seeing long
delays. This was also discussed on the drgn issue tracker [4].
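To make the cost concrete, here is a small Python sketch of the sequential access pattern described above. It is only an illustration under simplifying assumptions: it uses a plain .tar.xz built in memory rather than a real .rpm or .deb payload, and the helper names `build_archive` and `find_member` are made up for the example.

```python
import io
import tarfile


def build_archive(names, payload_size=1024):
    """Build an xz-compressed tar archive in memory, one member per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = payload_size
            # Payload is the member name padded with NUL bytes.
            payload = name.encode().ljust(payload_size, b"\0")
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def find_member(archive_bytes, wanted):
    """Stream through the archive; return (members_visited, data or None).

    Mode "r|xz" forbids seeking, so every member before the wanted one
    has to be decompressed and skipped first.
    """
    visited = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|xz") as tar:
        for member in tar:
            visited += 1
            if member.name == wanted:
                return visited, tar.extractfile(member).read()
    return visited, None
```

Looking up the last of N members visits all N, so the work grows with the file's position in the package rather than with its size.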
The fundamental limitation is that Linux packages, which are essentially
compressed archives with extra metadata headers, don't support random
access to specific files. However, the multi-threaded xz compression
format does actually support random access. And, luckily, the kernel
debuginfo packages on Fedora, Debian, and Ubuntu all happen to use
multi-threaded xz compression!
debuginfod can take advantage of this: when it scans a package, if it is
a seekable xz archive, it can save the uncompressed offset and size of
each file. Then, when it needs a file, it can seek to that offset and
extract it from there. This requires some understanding of the xz
format and low-level liblzma code, but the speedup is massive: where the
worst case was previously about 50 seconds just to find a file in a
kernel debuginfo package, with this change the worst case is 0.25
seconds, a ~200x improvement! This works for both .rpm and .deb files.
I tested this by requesting and verifying the digest of every file from
a few kernel debuginfo rpms and debs [5].
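The idea of saving an uncompressed offset and size and then decoding only the relevant part can be sketched in Python as follows. This is a stand-in, not the approach used in the patches: Python's lzma module does not expose the blocks inside an xz stream, so the sketch compresses each chunk as its own independent xz stream and keeps a hand-built index, whereas a real implementation would presumably consult the xz index through liblzma. The chunk size and the function names are assumptions chosen for the example.

```python
import bisect
import lzma

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, for illustration only


def compress_seekable(data, chunk_size=CHUNK_SIZE):
    """Compress data as independent xz streams and build a seek index.

    Returns (compressed_bytes, index), where index is a list of
    (uncompressed_offset, compressed_offset, compressed_size) tuples.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), chunk_size):
        chunk = lzma.compress(data[start:start + chunk_size])
        index.append((start, len(out), len(chunk)))
        out += chunk
    return bytes(out), index


def read_range(compressed, index, offset, size):
    """Return `size` uncompressed bytes starting at `offset`.

    Only the chunks overlapping the requested range are decompressed.
    """
    if size <= 0 or not index:
        return b""
    starts = [entry[0] for entry in index]
    # Last chunk whose uncompressed start is <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    end = offset + size
    result = bytearray()
    while i < len(index) and index[i][0] < end:
        u_off, c_off, c_size = index[i]
        chunk = lzma.decompress(compressed[c_off:c_off + c_size])
        result += chunk[max(offset - u_off, 0):end - u_off]
        i += 1
    return bytes(result)
```

The concatenated output is still a valid multi-stream xz file that decompresses as a whole, which mirrors how a multi-threaded xz archive stays readable by tools that know nothing about seeking.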
P.S. The biggest downside of this change is that it depends on a very
specific compression format that is only used by kernel packages
incidentally. I think this is something we should formalize with Linux
distributions: large debuginfo packages should use a seekable format.
Currently, xz in multi-threaded mode is the only option, but Zstandard
also has an experimental seekable format that is worth looking into [6].
Thanks,
Omar
1: https://github.com/osandov/drgn
2: https://sourceware.org/bugzilla/show_bug.cgi?id=29478
3: https://bugzilla.redhat.com/show_bug.cgi?id=1970578
4: https://github.com/osandov/drgn/pull/380
5: https://gist.github.com/osandov/89d521fdc6c9a07aa8bb0ebf91974346
6: https://github.com/facebook/zstd/tree/dev/contrib/seekable_format
7: https://sourceware.org/pipermail/elfutils-devel/2024q3/007191.html
8: https://sourceware.org/pipermail/elfutils-devel/2024q3/007208.html
Omar Sandoval (7):
debuginfod: fix skipping <built-in> source file
tests/run-debuginfod-fd-prefetch-caches.sh: disable fdcache limit
check
debuginfod: factor out common code for responding from an archive
debuginfod: add new table and views for seekable archives
debuginfod: optimize extraction from seekable xz archives
debuginfod: populate _r_seekable on scan
debuginfod: populate _r_seekable on request
configure.ac | 5 +
debuginfod/Makefile.am | 2 +-
debuginfod/debuginfod.cxx | 928 +++++++++++++++---
tests/Makefile.am | 4 +-
...pressme-seekable-xz-dbgsym_1.0-1_amd64.deb | Bin 0 -> 6288 bytes
...compressme-seekable-xz_1.0-1.debian.tar.xz | Bin 0 -> 1440 bytes
.../compressme-seekable-xz_1.0-1.dsc | 19 +
.../compressme-seekable-xz_1.0-1_amd64.deb | Bin 0 -> 6208 bytes
.../compressme-seekable-xz_1.0.orig.tar.xz | Bin 0 -> 7160 bytes
.../compressme-seekable-xz-1.0-1.src.rpm | Bin 0 -> 15880 bytes
.../compressme-seekable-xz-1.0-1.x86_64.rpm | Bin 0 -> 31873 bytes
...sme-seekable-xz-debuginfo-1.0-1.x86_64.rpm | Bin 0 -> 21917 bytes
...e-seekable-xz-debugsource-1.0-1.x86_64.rpm | Bin 0 -> 7961 bytes
tests/run-debuginfod-archive-groom.sh | 2 +-
tests/run-debuginfod-extraction.sh | 2 +-
tests/run-debuginfod-fd-prefetch-caches.sh | 4 +
tests/run-debuginfod-seekable.sh | 186 ++++
17 files changed, 1007 insertions(+), 145 deletions(-)
create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz-dbgsym_1.0-1_amd64.deb
create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.debian.tar.xz
create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.dsc
create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1_amd64.deb
create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0.orig.tar.xz
create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.src.rpm
create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.x86_64.rpm
create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debuginfo-1.0-1.x86_64.rpm
create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debugsource-1.0-1.x86_64.rpm
create mode 100755 tests/run-debuginfod-seekable.sh
Comments
Hi -

> This is v3 of my patch series optimizing debuginfod for kernel
> debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple
> of minor bugs and adds test cases.
[...]

Thanks, LGTM, running through try-buildbots to make sure.

- FChE
On Fri, Jul 19, 2024 at 01:34:48PM -0400, Frank Ch. Eigler wrote:
> Hi -
>
> > This is v3 of my patch series optimizing debuginfod for kernel
> > debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple
> > of minor bugs and adds test cases.
> [...]
>
> Thanks, LGTM, running through try-buildbots to make sure.

Sorry about the distcheck failures, looks like I forgot to add the new
test files to EXTRA_DIST. I'll be sure to run distcheck next time. I'll
send v4 shortly.

Thanks,
Omar