From patchwork Fri Jul 19 08:32:03 2024
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 94193
From: Omar Sandoval
To: elfutils-devel@sourceware.org
Cc: "Frank Ch . Eigler", linux-debuggers@vger.kernel.org
Subject: [PATCH v3 7/7] debuginfod: populate _r_seekable on request
Date: Fri, 19 Jul 2024 01:32:03 -0700
Message-ID: <99e9dcbd8c29a1be4ac46c74b9e59499fc0fce07.1721377314.git.osandov@fb.com>
X-Mailer: git-send-email 2.45.2
List-Id: Elfutils-devel mailing list

Since the schema change adding _r_seekable was done in a backward
compatible way, seekable archives that were previously scanned will not
be in _r_seekable.  Whenever an archive is going to be extracted to
satisfy a request, check if it is seekable.  If so, populate _r_seekable
while extracting it so that future requests use the optimized path.

The next time that BUILDIDS is bumped, all archives will be checked at
scan time.  At that point, checking again will be unnecessary and this
commit (including the test case modification) can be reverted.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx        | 76 +++++++++++++++++++++++++++++---
 tests/run-debuginfod-seekable.sh | 45 +++++++++++++++++++
 2 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 677eca30..d8a02fb5 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2740,6 +2740,7 @@ handle_buildid_r_match (bool internal_req_p,
     }
 
   // no match ... look for a seekable entry
+  bool populate_seekable = ! passive_p;
   unique_ptr<sqlite_ps> pp (new sqlite_ps (internal_req_p ? db : dbq,
                                            "rpm-seekable-query",
                                            "select type, size, offset, mtime from " BUILDIDS "_r_seekable "
@@ -2749,6 +2750,9 @@ handle_buildid_r_match (bool internal_req_p,
     {
       if (rc != SQLITE_ROW)
         throw sqlite_exception(rc, "step");
+      // if we found a match in _r_seekable but we fail to extract it, don't
+      // bother populating it again
+      populate_seekable = false;
       const char* seekable_type = (const char*) sqlite3_column_text (*pp, 0);
       if (seekable_type != NULL && strcmp (seekable_type, "xz") == 0)
         {
@@ -2840,16 +2844,39 @@ handle_buildid_r_match (bool internal_req_p,
       throw archive_exception(a, "cannot open archive from pipe");
     }
 
-  // archive traversal is in three stages, no, four stages:
-  // 1) skip entries whose names do not match the requested one
-  // 2) extract the matching entry name (set r = result)
-  // 3) extract some number of prefetched entries (just into fdcache)
-  // 4) abort any further processing
+  // If the archive was scanned in a version without _r_seekable, then we may
+  // need to populate _r_seekable now. This can be removed the next time
+  // BUILDIDS is updated.
+  if (populate_seekable)
+    {
+      populate_seekable = is_seekable_archive (b_source0, a);
+      if (populate_seekable)
+        {
+          // NB: the names are already interned
+          pp.reset(new sqlite_ps (db, "rpm-seekable-insert2",
+                                  "insert or ignore into " BUILDIDS "_r_seekable (file, content, type, size, offset, mtime) "
+                                  "values (?, "
+                                  "(select id from " BUILDIDS "_files "
+                                  "where dirname = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "and basename = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "), 'xz', ?, ?, ?)"));
+        }
+    }
+
+  // archive traversal is in five stages:
+  // 1) before we find a matching entry, insert it into _r_seekable if needed or
+  //    skip it otherwise
+  // 2) extract the matching entry (set r = result). Also insert it into
+  //    _r_seekable if needed
+  // 3) extract some number of prefetched entries (just into fdcache). Also
+  //    insert them into _r_seekable if needed
+  // 4) if needed, insert all of the remaining entries into _r_seekable
+  // 5) abort any further processing
   struct MHD_Response* r = 0;                 // will set in stage 2
   unsigned prefetch_count =
     internal_req_p ? 0 : fdcache_prefetch;    // will decrement in stage 3
 
-  while(r == 0 || prefetch_count > 0) // stage 1, 2, or 3
+  while(r == 0 || prefetch_count > 0 || populate_seekable) // stage 1-4
     {
       if (interrupted)
         break;
@@ -2863,6 +2890,43 @@ handle_buildid_r_match (bool internal_req_p,
         continue;
 
       string fn = canonicalized_archive_entry_pathname (e);
+
+      if (populate_seekable)
+        {
+          string dn, bn;
+          size_t slash = fn.rfind('/');
+          if (slash == std::string::npos) {
+            dn = "";
+            bn = fn;
+          } else {
+            dn = fn.substr(0, slash);
+            bn = fn.substr(slash + 1);
+          }
+
+          int64_t seekable_size = archive_entry_size (e);
+          int64_t seekable_offset = archive_filter_bytes (a, 0);
+          time_t seekable_mtime = archive_entry_mtime (e);
+
+          pp->reset();
+          pp->bind(1, b_id0);
+          pp->bind(2, dn);
+          pp->bind(3, bn);
+          pp->bind(4, seekable_size);
+          pp->bind(5, seekable_offset);
+          pp->bind(6, seekable_mtime);
+          rc = pp->step();
+          if (rc != SQLITE_DONE)
+            obatched(clog) << "recording seekable file=" << fn
+                           << " sqlite3 error: " << (sqlite3_errstr(rc) ?: "?") << endl;
+          else if (verbose > 2)
+            obatched(clog) << "recorded seekable file=" << fn
+                           << " size=" << seekable_size
+                           << " offset=" << seekable_offset
+                           << " mtime=" << seekable_mtime << endl;
+          if (r != 0 && prefetch_count == 0) // stage 4
+            continue;
+        }
+
       if ((r == 0) && (fn != b_source1)) // stage 1
         continue;
 
diff --git a/tests/run-debuginfod-seekable.sh b/tests/run-debuginfod-seekable.sh
index d546fa3d..c787428f 100755
--- a/tests/run-debuginfod-seekable.sh
+++ b/tests/run-debuginfod-seekable.sh
@@ -138,4 +138,49 @@ kill $PID1
 wait $PID1
 PID1=0
 
+if type sqlite3 2>/dev/null; then
+  # Emulate the case of upgrading from an old server without the seekable
+  # optimization by dropping the _r_seekable table.
+  sqlite3 "$DB" 'DROP TABLE buildids10_r_seekable'
+
+  env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -d $DB -p $PORT2 -t0 -g0 --fdcache-prefetch=0 -v -R -U R D > vlog$PORT2 2>&1 &
+  PID2=$!
+  tempfiles vlog$PORT2
+  errfiles vlog$PORT2
+
+  wait_ready $PORT2 'ready' 1
+
+  check_all $PORT2
+
+  # The first request per archive has to do a full extraction.  Check
+  # that the rest used the seekable optimization.
+  curl -s http://localhost:$PORT2/metrics | awk '
+/^http_responses_total\{result="seekable xz archive"\}/ {
+  print
+  seekable = $NF
+}
+
+/^http_responses_total\{result="(rpm|deb) archive"\}/ {
+  print
+  full = $NF
+}
+
+END {
+  if (seekable == 0) {
+    print "error: no seekable extractions" > "/dev/stderr"
+    exit 1
+  }
+  if (full > 4) {
+    print "error: too many (" full ") full extractions" > "/dev/stderr"
+    exit 1
+  }
+}'
+
+  tempfiles $DB*
+
+  kill $PID2
+  wait $PID2
+  PID2=0
+fi
+
 exit 0
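
Note on the dirname/basename handling in the debuginfod.cxx hunk: before binding parameters 2 and 3 of the insert statement, the new loop body splits each archive entry name at its last '/'. The following is a minimal standalone sketch of that split only, for illustration. The helper name split_entry_path is hypothetical and does not exist in debuginfod; the patch does this inline.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of the patch): split an archive entry
// path at its last '/' into (dirname, basename), mirroring the inline
// logic the patch runs before binding both parts to the insert.
std::pair<std::string, std::string>
split_entry_path (const std::string& fn)
{
  std::string::size_type slash = fn.rfind ('/');
  if (slash == std::string::npos)
    return { "", fn };            // no '/' at all: empty dirname
  // Everything before the last '/' is the dirname; the '/' itself is
  // dropped, and everything after it is the basename.
  return { fn.substr (0, slash), fn.substr (slash + 1) };
}
```

With this split, "/usr/bin/hello" gives ("/usr/bin", "hello"). A name with no slash and a name directly under the root ("/hello") both give an empty dirname.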