From: Omar Sandoval
To: elfutils-devel@sourceware.org
Cc: "Frank Ch . Eigler", linux-debuggers@vger.kernel.org
Subject: [PATCH v4 7/7] debuginfod: populate _r_seekable on request
Date: Fri, 19 Jul 2024 11:24:38 -0700
Message-ID: <7805d255fdf503cb1154fe168c62347c3ed8a559.1721413308.git.osandov@fb.com>
X-Patchwork-Id: 94243

Since the schema change adding _r_seekable was done in a backward
compatible way, seekable archives that were previously scanned will not
be in _r_seekable. Whenever an archive is going to be extracted to
satisfy a request, check if it is seekable. If so, populate _r_seekable
while extracting it so that future requests use the optimized path.

The next time that BUILDIDS is bumped, all archives will be checked at
scan time. At that point, checking again will be unnecessary and this
commit (including the test case modification) can be reverted.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx        | 76 +++++++++++++++++++++++++++++---
 tests/run-debuginfod-seekable.sh | 45 +++++++++++++++++++
 2 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 5fe2db0c..fb7873ae 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2740,6 +2740,7 @@ handle_buildid_r_match (bool internal_req_p,
     }
 
   // no match ... look for a seekable entry
+  bool populate_seekable = ! passive_p;
   unique_ptr<sqlite_ps> pp (new sqlite_ps (internal_req_p ? db : dbq,
                                            "rpm-seekable-query",
                                            "select type, size, offset, mtime from " BUILDIDS "_r_seekable "
@@ -2749,6 +2750,9 @@ handle_buildid_r_match (bool internal_req_p,
     {
       if (rc != SQLITE_ROW)
         throw sqlite_exception(rc, "step");
+      // if we found a match in _r_seekable but we fail to extract it, don't
+      // bother populating it again
+      populate_seekable = false;
       const char* seekable_type = (const char*) sqlite3_column_text (*pp, 0);
       if (seekable_type != NULL && strcmp (seekable_type, "xz") == 0)
         {
@@ -2840,16 +2844,39 @@ handle_buildid_r_match (bool internal_req_p,
       throw archive_exception(a, "cannot open archive from pipe");
     }
 
-  // archive traversal is in three stages, no, four stages:
-  // 1) skip entries whose names do not match the requested one
-  // 2) extract the matching entry name (set r = result)
-  // 3) extract some number of prefetched entries (just into fdcache)
-  // 4) abort any further processing
+  // If the archive was scanned in a version without _r_seekable, then we may
+  // need to populate _r_seekable now. This can be removed the next time
+  // BUILDIDS is updated.
+  if (populate_seekable)
+    {
+      populate_seekable = is_seekable_archive (b_source0, a);
+      if (populate_seekable)
+        {
+          // NB: the names are already interned
+          pp.reset(new sqlite_ps (db, "rpm-seekable-insert2",
+                                  "insert or ignore into " BUILDIDS "_r_seekable (file, content, type, size, offset, mtime) "
+                                  "values (?, "
+                                  "(select id from " BUILDIDS "_files "
+                                  "where dirname = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "and basename = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "), 'xz', ?, ?, ?)"));
+        }
+    }
+
+  // archive traversal is in five stages:
+  // 1) before we find a matching entry, insert it into _r_seekable if needed or
+  //    skip it otherwise
+  // 2) extract the matching entry (set r = result). Also insert it into
+  //    _r_seekable if needed
+  // 3) extract some number of prefetched entries (just into fdcache). Also
+  //    insert them into _r_seekable if needed
+  // 4) if needed, insert all of the remaining entries into _r_seekable
+  // 5) abort any further processing
   struct MHD_Response* r = 0;                 // will set in stage 2
   unsigned prefetch_count =
     internal_req_p ? 0 : fdcache_prefetch;    // will decrement in stage 3
 
-  while(r == 0 || prefetch_count > 0) // stage 1, 2, or 3
+  while(r == 0 || prefetch_count > 0 || populate_seekable) // stage 1-4
     {
       if (interrupted)
         break;
@@ -2863,6 +2890,43 @@ handle_buildid_r_match (bool internal_req_p,
         continue;
 
       string fn = canonicalized_archive_entry_pathname (e);
+
+      if (populate_seekable)
+        {
+          string dn, bn;
+          size_t slash = fn.rfind('/');
+          if (slash == std::string::npos) {
+            dn = "";
+            bn = fn;
+          } else {
+            dn = fn.substr(0, slash);
+            bn = fn.substr(slash + 1);
+          }
+
+          int64_t seekable_size = archive_entry_size (e);
+          int64_t seekable_offset = archive_filter_bytes (a, 0);
+          time_t seekable_mtime = archive_entry_mtime (e);
+
+          pp->reset();
+          pp->bind(1, b_id0);
+          pp->bind(2, dn);
+          pp->bind(3, bn);
+          pp->bind(4, seekable_size);
+          pp->bind(5, seekable_offset);
+          pp->bind(6, seekable_mtime);
+          rc = pp->step();
+          if (rc != SQLITE_DONE)
+            obatched(clog) << "recording seekable file=" << fn
+                           << " sqlite3 error: " << (sqlite3_errstr(rc) ?: "?") << endl;
+          else if (verbose > 2)
+            obatched(clog) << "recorded seekable file=" << fn
+                           << " size=" << seekable_size
+                           << " offset=" << seekable_offset
+                           << " mtime=" << seekable_mtime << endl;
+          if (r != 0 && prefetch_count == 0) // stage 4
+            continue;
+        }
+
       if ((r == 0) && (fn != b_source1)) // stage 1
         continue;
 
diff --git a/tests/run-debuginfod-seekable.sh b/tests/run-debuginfod-seekable.sh
index d546fa3d..c787428f 100755
--- a/tests/run-debuginfod-seekable.sh
+++ b/tests/run-debuginfod-seekable.sh
@@ -138,4 +138,49 @@ kill $PID1
 wait $PID1
 PID1=0
 
+if type sqlite3 2>/dev/null; then
+    # Emulate the case of upgrading from an old server without the seekable
+    # optimization by dropping the _r_seekable table.
+    sqlite3 "$DB" 'DROP TABLE buildids10_r_seekable'
+
+    env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -d $DB -p $PORT2 -t0 -g0 --fdcache-prefetch=0 -v -R -U R D > vlog$PORT2 2>&1 &
+    PID2=$!
+    tempfiles vlog$PORT2
+    errfiles vlog$PORT2
+
+    wait_ready $PORT2 'ready' 1
+
+    check_all $PORT2
+
+    # The first request per archive has to do a full extraction. Check
+    # that the rest used the seekable optimization.
+    curl -s http://localhost:$PORT2/metrics | awk '
+/^http_responses_total\{result="seekable xz archive"\}/ {
+  print
+  seekable = $NF
+}
+
+/^http_responses_total\{result="(rpm|deb) archive"\}/ {
+  print
+  full = $NF
+}
+
+END {
+  if (seekable == 0) {
+    print "error: no seekable extractions" > "/dev/stderr"
+    exit 1
+  }
+  if (full > 4) {
+    print "error: too many (" full ") full extractions" > "/dev/stderr"
+    exit 1
+  }
+}'
+
+    tempfiles $DB*
+
+    kill $PID2
+    wait $PID2
+    PID2=0
+fi
+
 exit 0
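
For readers following the five-stage traversal described in the patch, the control flow can be sketched in isolation. This is only an illustrative model, not code from debuginfod: `split_path`, `TraversalResult`, and `traverse` are hypothetical names, the archive is replaced by a plain list of entry names, and the real loop's other checks (non-regular files, fdcache probing, sqlite binding) are left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split an entry name into (dirname, basename) the
// same way the patch does before binding them to the insert statement.
std::pair<std::string, std::string> split_path(const std::string& fn) {
  std::size_t slash = fn.rfind('/');
  if (slash == std::string::npos)
    return {"", fn};
  return {fn.substr(0, slash), fn.substr(slash + 1)};
}

// Hypothetical record of what a traversal did.
struct TraversalResult {
  bool found = false;                   // stage 2 reached
  std::vector<std::string> prefetched;  // entries handled in stage 3
  std::vector<std::string> recorded;    // entries that would be inserted
                                        // into _r_seekable
};

// Model of the loop: `entries` stands in for the archive's members in order.
TraversalResult traverse(const std::vector<std::string>& entries,
                         const std::string& wanted,
                         unsigned prefetch_count,
                         bool populate_seekable) {
  TraversalResult out;
  for (const std::string& fn : entries) {
    // stage 5: nothing left to extract, prefetch, or record
    if (out.found && prefetch_count == 0 && !populate_seekable)
      break;

    if (populate_seekable) {
      out.recorded.push_back(fn);            // record every entry seen
      if (out.found && prefetch_count == 0)  // stage 4: record only
        continue;
    }

    if (!out.found && fn != wanted)          // stage 1: skip non-matches
      continue;

    if (!out.found) {                        // stage 2: the requested entry
      out.found = true;
      continue;
    }

    out.prefetched.push_back(fn);            // stage 3: prefetch
    --prefetch_count;
  }
  return out;
}
```

With `populate_seekable` set, the model walks the whole entry list even after the requested entry and the prefetch quota are done, which is the cost the commit message accepts for the first request per archive.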