From: "Frank Ch. Eigler"
Date: Tue, 1 Nov 2022 10:23:06 -0400
To: elfutils-devel@sourceware.org
Subject: PATCH: Bug debuginfod/29472 followup
Message-ID: <20221101142306.GL16441@redhat.com>
X-Patchwork-Id: 59723

Hi -

On the users/fche/try-pr29472 branch, I pushed a followup to Ryan's
PR29472 draft from several weeks ago.  It's missing some ChangeLog
entries but appears otherwise complete.  It's structured as Ryan's
original patch plus my followup that changes things around, so as to
preserve both contributions in the history.  I paste the overall diff
here.

There will be some minor merge conflicts between this and amerey's
section-extraction extensions that are also aiming for this release.
I'll be glad to deconflict whichever way.

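For readers skimming the diff, here is a minimal sketch of how the client side appears to form a metadata query URL: the server URL gets a "metadata" path component (with the same trailing-slash rule as init_server_urls), followed by an escaped key and value in the query string. Both helper names below (percent_encode, build_metadata_url) are hypothetical and not part of the patch; the hand-rolled encoder only stands in for curl_easy_escape(), which the patch actually uses, and the server URL and key name in the usage note are placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): percent-encode SRC into DST,
   leaving only RFC 3986 unreserved characters as-is.  Stands in for
   curl_easy_escape().  Returns 0 on success, -1 if DST is too small.  */
static int
percent_encode (const char *src, char *dst, size_t dstlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;

  assert (src != NULL && dst != NULL);
  if (dstlen == 0)
    return -1;
  for (; *src != '\0'; src++)
    {
      unsigned char ch = (unsigned char) *src;
      int unreserved = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
                       || (ch >= '0' && ch <= '9')
                       || ch == '-' || ch == '.' || ch == '_' || ch == '~';
      if (unreserved)
        {
          if (o + 1 >= dstlen)   /* keep room for the terminator */
            return -1;
          dst[o++] = (char) ch;
        }
      else
        {
          if (o + 3 >= dstlen)   /* "%XX" plus the terminator */
            return -1;
          dst[o++] = '%';
          dst[o++] = hex[ch >> 4];
          dst[o++] = hex[ch & 0x0F];
        }
    }
  dst[o] = '\0';
  return 0;
}

/* Hypothetical helper (not in the patch) mirroring the URL layout:
   <server>[/]metadata?key=<escaped key>&value=<escaped value>
   Returns 0 on success, -1 on bad input or truncation.  */
static int
build_metadata_url (const char *server, const char *key, const char *value,
                    char *out, size_t outlen)
{
  char k[256], v[1024];

  assert (server != NULL && out != NULL);
  if (key == NULL || value == NULL)
    return -1;
  if (percent_encode (key, k, sizeof k) != 0
      || percent_encode (value, v, sizeof v) != 0)
    return -1;

  /* Same rule as init_server_urls(): add a '/' only if SERVER lacks one.  */
  size_t len = strlen (server);
  const char *sep = (len > 1 && server[len - 1] == '/') ? "" : "/";

  int n = snprintf (out, outlen, "%s%smetadata?key=%s&value=%s",
                    server, sep, k, v);
  return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Calling build_metadata_url ("http://localhost:8002", "file", "/usr/bin/ls", buf, sizeof buf) would leave http://localhost:8002/metadata?key=file&value=%2Fusr%2Fbin%2Fls in buf. Unlike the patch, this sketch reports truncation instead of silently passing a clipped URL on, which may or may not be the behaviour wanted in the real client.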
diff --git a/ChangeLog b/ChangeLog
index 7bbb2c0fe97e..efce07161abe 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2022-10-06  Ryan Goldberg
+
+        * configure.ac (HAVE_JSON_C): Defined iff libjson-c
+        is found, and debuginfod metadata querying is thus enabled.
+
 2022-10-20  Mark Wielaard

        * Makefile.am (rpm): Remove --sign.
diff --git a/configure.ac b/configure.ac
index 1084b4695e2c..6077d52a7daf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -600,6 +600,11 @@ case "$ac_cv_search__obstack_free" in
 esac
 AC_SUBST([obstack_LIBS])

+AC_CHECK_LIB(json-c, json_object_array_add, [
+  AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine])
+  AC_SUBST(jsonc_LIBS, '-ljson-c')
+])
+
 dnl The directories with content.

 dnl Documentation.
diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1df903fe4ac2..79f827d95217 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,27 @@
+2022-10-06  Ryan Goldberg
+
+        * Makefile.am (debuginfod_LDADD): Add jsonc_LIBS.
+        (libdebuginfod_so_LDLIBS): Likewise.
+        * debuginfod-find.c (main): Add command line interface for
+        metadata query by path.
+        * debuginfod.h.in: Added debuginfod_find_metadata.
+        * debuginfod.cxx (add_client_federation_headers): New function
+        created from existing code to remove code duplication.
+        (handle_buildid_match): Calls new add_client_federation_headers
+        function.
+        (handle_metadata): New function which queries local DB and
+        upstream for metadata.
+        (handler_cb): New accepted url type, /metadata.
+        * debuginfod-client.c (struct handle_data): New fields: metadata,
+        metadata_size, to store incoming metadata.
+        (metadata_callback): New function called by curl upon receiving
+        metadata
+        (init_server_urls, init_handle, perform_queries) : New functions created from
+        existing code within debuginfod_query_server to reduce code duplication.
+        (debuginfod_query_server_by_buildid): debuginfod_query_server renamed, and above
+        functions used in place of identical previously inline code.
+        (debuginfod_find_metadata): New function.
+
 2022-10-18  Daniel Thornburgh

        * debuginfod-client.c (debuginfod_query_server): Add DEBUGINFOD_HEADERS_FILE
diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
index 435cb8a6839c..3d6bc26ecc4a 100644
--- a/debuginfod/Makefile.am
+++ b/debuginfod/Makefile.am
@@ -70,7 +70,7 @@ bin_PROGRAMS += debuginfod-find
 endif

 debuginfod_SOURCES = debuginfod.cxx
-debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) -lpthread -ldl
+debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl

 debuginfod_find_SOURCES = debuginfod-find.c
 debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS)
@@ -97,7 +97,7 @@ libdebuginfod_so_LIBS = libdebuginfod_pic.a
 if DUMMY_LIBDEBUGINFOD
 libdebuginfod_so_LDLIBS =
 else
-libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS)
+libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(jsonc_LIBS)
 endif
 $(LIBDEBUGINFOD_SONAME): $(srcdir)/libdebuginfod.map $(libdebuginfod_so_LIBS)
        $(AM_V_CCLD)$(LINK) $(dso_LDFLAGS) -o $@ \
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index 716cb7695617..4a75eef2303a 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -56,6 +56,8 @@ int debuginfod_find_executable (debuginfod_client *c, const unsigned char *b,
                                 int s, char **p) { return -ENOSYS; }
 int debuginfod_find_source (debuginfod_client *c, const unsigned char *b,
                             int s, const char *f, char **p) { return -ENOSYS; }
+int debuginfod_find_metadata (debuginfod_client *c,
+                              const char *k, const char* v, char** m) { return -ENOSYS; }
 void debuginfod_set_progressfn(debuginfod_client *c,
                                debuginfod_progressfn_t fn) { }
 void debuginfod_set_verbose_fd(debuginfod_client *c, int fd) { }
@@ -103,6 +105,10 @@ void debuginfod_end (debuginfod_client *c) { }

 #include

+#ifdef HAVE_JSON_C
+  #include
+#endif
+
 static pthread_once_t init_control = PTHREAD_ONCE_INIT;

 static void
@@ -201,6 +207,9 @@ struct handle_data
   /* Response http headers for this client handle, sent from the server */
   char *response_data;
   size_t response_data_size;
+  /* Response metadata values for this client handle, sent from the server */
+  char *metadata;
+  size_t metadata_size;
 };

 static size_t
@@ -555,18 +564,9 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
     }
   /* Temporary buffer for realloc */
   char *temp = NULL;
-  if (data->response_data == NULL)
-    {
-      temp = malloc(numitems);
-      if (temp == NULL)
-        return 0;
-    }
-  else
-    {
-      temp = realloc(data->response_data, data->response_data_size + numitems);
-      if (temp == NULL)
-        return 0;
-    }
+  temp = realloc(data->response_data, data->response_data_size + numitems);
+  if (temp == NULL)
+    return 0;

   memcpy(temp + data->response_data_size, buffer, numitems-1);
   data->response_data = temp;
@@ -576,13 +576,345 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
   return numitems;
 }

+#ifdef HAVE_JSON_C
+static size_t
+metadata_callback (char * buffer, size_t size, size_t numitems, void * userdata)
+{
+  if (size != 1)
+    return 0;
+  /* Temporary buffer for realloc */
+  char *temp = NULL;
+  struct handle_data *data = (struct handle_data *) userdata;
+  temp = realloc(data->metadata, data->metadata_size + numitems + 1);
+  if (temp == NULL)
+    return 0;
+
+  memcpy(temp + data->metadata_size, buffer, numitems);
+  data->metadata = temp;
+  data->metadata_size += numitems;
+  data->metadata[data->metadata_size] = '\0';
+  return numitems;
+}
+#endif
+
+
+/* This function takes a copy of DEBUGINFOD_URLS, server_urls, and separates it into an
+ * array of urls to query. The url_subdir is either 'buildid' or 'metadata', corresponding
+ * to the query type. Returns 0 on success and -Posix error on failure.
+ */
+int
+init_server_urls(char* url_subdir, char *server_urls, char ***server_url_list, int *num_urls, int vfd)
+{
+  /* Initialize the memory to zero */
+  char *strtok_saveptr;
+  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
+  /* Count number of URLs.  */
+  int n = 0;
+  assert(0 == strcmp(url_subdir, "buildid") || 0 == strcmp(url_subdir, "metadata"));
+
+  /* PR 27983: If the url is already set to be used, skip it */
+  while (server_url != NULL)
+    {
+      int r;
+      char *tmp_url;
+      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
+        r = asprintf(&tmp_url, "%s%s", server_url, url_subdir);
+      else
+        r = asprintf(&tmp_url, "%s/%s", server_url, url_subdir);
+
+      if (r == -1)
+        {
+          return -ENOMEM;
+        }
+      int url_index;
+      for (url_index = 0; url_index < n; ++url_index)
+        {
+          if(strcmp(tmp_url, (*server_url_list)[url_index]) == 0)
+            {
+              url_index = -1;
+              break;
+            }
+        }
+      if (url_index == -1)
+        {
+          if (vfd >= 0)
+            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
+          free(tmp_url);
+        }
+      else
+        {
+          n++;
+          char ** realloc_ptr;
+          realloc_ptr = reallocarray(*server_url_list, n,
+                                     sizeof(char*));
+          if (realloc_ptr == NULL)
+            {
+              free (tmp_url);
+              return -ENOMEM;
+            }
+          *server_url_list = realloc_ptr;
+          (*server_url_list)[n-1] = tmp_url;
+        }
+      server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
+    }
+  *num_urls = n;
+  return 0;
+}
+
+/* Some boilerplate for checking curl_easy_setopt.  */
+#define curl_easy_setopt_ck(H,O,P) do {                 \
+      CURLcode curl_res = curl_easy_setopt (H,O,P);     \
+      if (curl_res != CURLE_OK)                         \
+        {                                               \
+          if (vfd >= 0)                                 \
+            dprintf (vfd,                               \
+                     "Bad curl_easy_setopt: %s\n",      \
+                     curl_easy_strerror(curl_res));     \
+          return -EINVAL;                               \
+        }                                               \
+      } while (0)
+
+
+/*
+ * This function initializes a CURL handle. It takes optional callbacks for the write
+ * function and the header function, which if defined will use userdata of type struct handle_data*.
+ * Specifically the data[i] within an array of struct handle_data's.
+ * Returns 0 on success and -Posix error on failure.
+ */
+int
+init_handle(debuginfod_client *client,
+            size_t (*w_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
+            size_t (*h_callback)(char *buffer,size_t size,size_t nitems,void *userdata),
+            struct handle_data *data, int i, long timeout,
+            int vfd)
+{
+  data->handle = curl_easy_init();
+  if (data->handle == NULL)
+    {
+      return -ENETUNREACH;
+    }
+
+  if (vfd >= 0)
+    dprintf (vfd, "url %d %s\n", i, data->url);
+
+  /* Only allow http:// + https:// + file:// so we aren't being
+     redirected to some unsupported protocol.  */
+  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS,
+                      (CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
+  curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url);
+  if (vfd >= 0)
+    curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER,
+                        data->errbuf);
+  if(w_callback) {
+    curl_easy_setopt_ck(data->handle,
+                        CURLOPT_WRITEFUNCTION, w_callback);
+    curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data);
+  }
+  if (timeout > 0)
+    {
+      /* Make sure there is at least some progress,
+         try to get at least 100K per timeout seconds.  */
+      curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME,
+                           timeout);
+      curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT,
+                           100 * 1024L);
+    }
+  data->response_data = NULL;
+  data->response_data_size = 0;
+  curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1);
+  if(h_callback){
+    curl_easy_setopt_ck(data->handle,
+                        CURLOPT_HEADERFUNCTION, h_callback);
+    curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data);
+  }
+  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
+  curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1);
+  #else
+  /* On old curl; no big deal, canonicalization here is almost the
+     same, except perhaps for ? # type decorations at the tail. */
+  #endif
+  curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1);
+  curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, "");
+  curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers);
+
+  return 0;
+}
+
+
+/*
+ * This function busy-waits on one or more curl queries to complete. This can
+ * be controlled via only_one, which, if true, will find the first winner and exit
+ * once found. If positive, maxtime and maxsize dictate the maximum allowed wait times
+ * and download sizes respectively. Returns 0 on success and -Posix error on failure.
+ */
+int
+perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data *data, debuginfod_client *c,
+                int num_urls, long maxtime, long maxsize, bool only_one, int vfd)
+{
+  int still_running = -1;
+  long loops = 0;
+  int committed_to = -1;
+  bool verbose_reported = false;
+  struct timespec start_time, cur_time;
+  if (c->winning_headers != NULL)
+    {
+      free (c->winning_headers);
+      c->winning_headers = NULL;
+    }
+  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
+    {
+      return -errno;
+    }
+  long delta = 0;
+  do
+    {
+      /* Check to see how long querying is taking. */
+      if (maxtime > 0)
+        {
+          if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
+            {
+              return -errno;
+            }
+          delta = cur_time.tv_sec - start_time.tv_sec;
+          if ( delta > maxtime)
+            {
+              dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
+              return -ETIME;
+            }
+        }
+      /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
+      curl_multi_wait(curlm, NULL, 0, 1000, NULL);
+      CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
+
+      if(only_one){
+        /* If the target file has been found, abort the other queries.  */
+        if (target_handle && *target_handle != NULL)
+          {
+            for (int i = 0; i < num_urls; i++)
+              if (data[i].handle != *target_handle)
+                curl_multi_remove_handle(curlm, data[i].handle);
+              else
+                {
+                  committed_to = i;
+                  if (c->winning_headers == NULL)
+                    {
+                      c->winning_headers = data[committed_to].response_data;
+                      if (vfd >= 0 && c->winning_headers != NULL)
+                        dprintf(vfd, "\n%s", c->winning_headers);
+                      data[committed_to].response_data = NULL;
+                      data[committed_to].response_data_size = 0;
+                    }
+                }
+          }
+
+        if (vfd >= 0 && !verbose_reported && committed_to >= 0)
+          {
+            bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
+            dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
+                     committed_to);
+            if (pnl)
+              c->default_progressfn_printed_p = 0;
+            verbose_reported = true;
+          }
+      }
+
+      if (curlm_res != CURLM_OK)
+        {
+          switch (curlm_res)
+            {
+            case CURLM_CALL_MULTI_PERFORM: continue;
+            case CURLM_OUT_OF_MEMORY: return -ENOMEM;
+            default: return -ENETUNREACH;
+            }
+        }
+
+      long dl_size = 0;
+      if(only_one && target_handle){ // Only bother with progress functions if we're retrieving exactly 1 file
+        if (*target_handle && (c->progressfn || maxsize > 0))
+          {
+            /* Get size of file being downloaded. NB: If going through
+               deflate-compressing proxies, this number is likely to be
+               unavailable, so -1 may show. */
+            CURLcode curl_res;
+#ifdef CURLINFO_CONTENT_LENGTH_DOWNLOAD_T
+            curl_off_t cl;
+            curl_res = curl_easy_getinfo(*target_handle,
+                                         CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
+                                         &cl);
+            if (curl_res == CURLE_OK && cl >= 0)
+              dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
+#else
+            double cl;
+            curl_res = curl_easy_getinfo(*target_handle,
+                                         CURLINFO_CONTENT_LENGTH_DOWNLOAD,
+                                         &cl);
+            if (curl_res == CURLE_OK)
+              dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
+#endif
+            /* If Content-Length is -1, try to get the size from
+               X-Debuginfod-Size */
+            if (dl_size == -1 && c->winning_headers != NULL)
+              {
+                long xdl;
+                char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
+
+                if (hdr != NULL
+                    && sscanf(hdr, "x-debuginfod-size: %ld", &xdl) == 1)
+                  dl_size = xdl;
+              }
+          }
+
+        if (c->progressfn) /* inform/check progress callback */
+          {
+            loops ++;
+            long pa = loops; /* default param for progress callback */
+            if (*target_handle) /* we've committed to a server; report its download progress */
+              {
+                CURLcode curl_res;
+#ifdef CURLINFO_SIZE_DOWNLOAD_T
+                curl_off_t dl;
+                curl_res = curl_easy_getinfo(*target_handle,
+                                             CURLINFO_SIZE_DOWNLOAD_T,
+                                             &dl);
+                if (curl_res == 0 && dl >= 0)
+                  pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
+#else
+                double dl;
+                curl_res = curl_easy_getinfo(*target_handle,
+                                             CURLINFO_SIZE_DOWNLOAD,
+                                             &dl);
+                if (curl_res == 0)
+                  pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
+#endif
+
+              }
+
+            if ((*c->progressfn) (c, pa, dl_size))
+              break;
+          }
+      }
+      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
+      if (target_handle && *target_handle && dl_size > maxsize && maxsize > 0)
+        {
+          if (vfd >=0)
+            dprintf(vfd, "Content-Length too large.\n");
+          return -EFBIG;
+        }
+    } while (still_running);
+  return 0;
+}
+
+
 /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
    with the specified build-id, type (debuginfo, executable or source) and
    filename. filename may be NULL. If found, return a file
    descriptor for the target, otherwise return an error code.
 */
 static int
-debuginfod_query_server (debuginfod_client *c,
+debuginfod_query_server_by_buildid (debuginfod_client *c,
                          const unsigned char *build_id,
                          int build_id_len,
                          const char *type,
@@ -601,7 +933,7 @@ debuginfod_query_server (debuginfod_client *c,
   char suffix[PATH_MAX + 1]; /* +1 for zero terminator.  */
   char build_id_bytes[MAX_BUILD_ID_BYTES * 2 + 1];
   int vfd = c->verbose_fd;
-  int rc;
+  int rc, r;

   if (vfd >= 0)
     {
@@ -915,60 +1247,14 @@ debuginfod_query_server (debuginfod_client *c,
       goto out0;
     }

-  /* Initialize the memory to zero */
-  char *strtok_saveptr;
   char **server_url_list = NULL;
-  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
-  /* Count number of URLs.  */
-  int num_urls = 0;
-
-  while (server_url != NULL)
-    {
-      /* PR 27983: If the url is already set to be used use, skip it */
-      char *slashbuildid;
-      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
-        slashbuildid = "buildid";
-      else
-        slashbuildid = "/buildid";
-
-      char *tmp_url;
-      if (asprintf(&tmp_url, "%s%s", server_url, slashbuildid) == -1)
-        {
-          rc = -ENOMEM;
-          goto out1;
-        }
-      int url_index;
-      for (url_index = 0; url_index < num_urls; ++url_index)
-        {
-          if(strcmp(tmp_url, server_url_list[url_index]) == 0)
-            {
-              url_index = -1;
-              break;
-            }
-        }
-      if (url_index == -1)
-        {
-          if (vfd >= 0)
-            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
-          free(tmp_url);
-        }
-      else
-        {
-          num_urls++;
-          char ** realloc_ptr;
-          realloc_ptr = reallocarray(server_url_list, num_urls,
-                                     sizeof(char*));
-          if (realloc_ptr == NULL)
-            {
-              free (tmp_url);
-              rc = -ENOMEM;
-              goto out1;
-            }
-          server_url_list = realloc_ptr;
-          server_url_list[num_urls-1] = tmp_url;
-        }
-      server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
-    }
+  char *server_url;
+  int num_urls;
+  r = init_server_urls("buildid", server_urls, &server_url_list, &num_urls, vfd);
+  if(0 != r){
+    rc = r;
+    goto out1;
+  }

   int retry_limit = default_retry_limit;
   const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
@@ -1038,13 +1324,6 @@ debuginfod_query_server (debuginfod_client *c,

       data[i].fd = fd;
       data[i].target_handle = &target_handle;
-      data[i].handle = curl_easy_init();
-      if (data[i].handle == NULL)
-        {
-          if (filename) curl_free (escaped_string);
-          rc = -ENETUNREACH;
-          goto out2;
-        }
       data[i].client = c;

       if (filename) /* must start with / */
@@ -1055,220 +1334,29 @@ debuginfod_query_server (debuginfod_client *c,
         }
       else
         snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
-      if (vfd >= 0)
-        dprintf (vfd, "url %d %s\n", i, data[i].url);
-
-      /* Some boilerplate for checking curl_easy_setopt.  */
-#define curl_easy_setopt_ck(H,O,P) do {                 \
-      CURLcode curl_res = curl_easy_setopt (H,O,P);     \
-      if (curl_res != CURLE_OK)                         \
-        {                                               \
-          if (vfd >= 0)                                 \
-            dprintf (vfd,                               \
-                     "Bad curl_easy_setopt: %s\n",      \
-                     curl_easy_strerror(curl_res));     \
-          rc = -EINVAL;                                 \
-          goto out2;                                    \
-        }                                               \
-      } while (0)
-      /* Only allow http:// + https:// + file:// so we aren't being
-         redirected to some unsupported protocol.  */
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_PROTOCOLS,
-                          (CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_URL, data[i].url);
-      if (vfd >= 0)
-        curl_easy_setopt_ck(data[i].handle, CURLOPT_ERRORBUFFER,
-                            data[i].errbuf);
-      curl_easy_setopt_ck(data[i].handle,
-                          CURLOPT_WRITEFUNCTION,
-                          debuginfod_write_callback);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_WRITEDATA, (void*)&data[i]);
-      if (timeout > 0)
-        {
-          /* Make sure there is at least some progress,
-             try to get at least 100K per timeout seconds.  */
-          curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_TIME,
-                               timeout);
-          curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
-                               100 * 1024L);
-        }
-      data[i].response_data = NULL;
-      data[i].response_data_size = 0;
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FILETIME, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERFUNCTION,
-                          header_callback);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERDATA,
-                          (void *) &(data[i]));
-#if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
-#else
-      /* On old curl; no big deal, canonicalization here is almost the
-         same, except perhaps for ? # type decorations at the tail. */
-#endif
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_AUTOREFERER, (long) 1);
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_ACCEPT_ENCODING, "");
-      curl_easy_setopt_ck(data[i].handle, CURLOPT_HTTPHEADER, c->headers);
+      r = init_handle(c, debuginfod_write_callback, header_callback, &data[i], i, timeout, vfd);
+      if(0 != r){
+        rc = r;
+        if(filename) curl_free (escaped_string);
+        goto out2;
+      }

       curl_multi_add_handle(curlm, data[i].handle);
     }

   if (filename) curl_free(escaped_string);
+
   /* Query servers in parallel.  */
   if (vfd >= 0)
     dprintf (vfd, "query %d urls in parallel\n", num_urls);
-  int still_running;
-  long loops = 0;
-  int committed_to = -1;
-  bool verbose_reported = false;
-  struct timespec start_time, cur_time;

-  free (c->winning_headers);
-  c->winning_headers = NULL;
-  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
+  r = perform_queries(curlm, &target_handle,data,c, num_urls, maxtime, maxsize, true, vfd);
+  if (0 != r)
     {
-      rc = -errno;
+      rc = r;
       goto out2;
     }
-  long delta = 0;
-  do
-    {
-      /* Check to see how long querying is taking. */
-      if (maxtime > 0)
-        {
-          if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
-            {
-              rc = -errno;
-              goto out2;
-            }
-          delta = cur_time.tv_sec - start_time.tv_sec;
-          if ( delta > maxtime)
-            {
-              dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
-              rc = -ETIME;
-              goto out2;
-            }
-        }
-      /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
-      curl_multi_wait(curlm, NULL, 0, 1000, NULL);
-      CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
-
-      /* If the target file has been found, abort the other queries.  */
-      if (target_handle != NULL)
-        {
-          for (int i = 0; i < num_urls; i++)
-            if (data[i].handle != target_handle)
-              curl_multi_remove_handle(curlm, data[i].handle);
-            else
-              {
-                committed_to = i;
-                if (c->winning_headers == NULL)
-                  {
-                    c->winning_headers = data[committed_to].response_data;
-                    data[committed_to].response_data = NULL;
-                    data[committed_to].response_data_size = 0;
-                  }
-
-              }
-        }
-
-      if (vfd >= 0 && !verbose_reported && committed_to >= 0)
-        {
-          bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
-          dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
-                   committed_to);
-          if (pnl)
-            c->default_progressfn_printed_p = 0;
-          verbose_reported = true;
-        }
-
-      if (curlm_res != CURLM_OK)
-        {
-          switch (curlm_res)
-            {
-            case CURLM_CALL_MULTI_PERFORM: continue;
-            case CURLM_OUT_OF_MEMORY: rc = -ENOMEM; break;
-            default: rc = -ENETUNREACH; break;
-            }
-          goto out2;
-        }
-
-      long dl_size = 0;
-      if (target_handle && (c->progressfn || maxsize > 0))
-        {
-          /* Get size of file being downloaded. NB: If going through
-             deflate-compressing proxies, this number is likely to be
-             unavailable, so -1 may show. */
-          CURLcode curl_res;
-#ifdef CURLINFO_CONTENT_LENGTH_DOWNLOAD_T
-          curl_off_t cl;
-          curl_res = curl_easy_getinfo(target_handle,
-                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
-                                       &cl);
-          if (curl_res == CURLE_OK && cl >= 0)
-            dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
-#else
-          double cl;
-          curl_res = curl_easy_getinfo(target_handle,
-                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD,
-                                       &cl);
-          if (curl_res == CURLE_OK)
-            dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
-#endif
-          /* If Content-Length is -1, try to get the size from
-             X-Debuginfod-Size */
-          if (dl_size == -1 && c->winning_headers != NULL)
-            {
-              long xdl;
-              char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
-
-              if (hdr != NULL
-                  && sscanf(hdr, "x-debuginfod-size: %ld", &xdl) == 1)
-                dl_size = xdl;
-            }
-        }
-
-      if (c->progressfn) /* inform/check progress callback */
-        {
-          loops ++;
-          long pa = loops; /* default param for progress callback */
-          if (target_handle) /* we've committed to a server; report its download progress */
-            {
-              CURLcode curl_res;
-#ifdef CURLINFO_SIZE_DOWNLOAD_T
-              curl_off_t dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD_T,
-                                           &dl);
-              if (curl_res == 0 && dl >= 0)
-                pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
-#else
-              double dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD,
-                                           &dl);
-              if (curl_res == 0)
-                pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
-#endif
-
-            }
-
-          if ((*c->progressfn) (c, pa, dl_size))
-            break;
-        }
-
-      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
-      if (target_handle && dl_size > maxsize && maxsize > 0)
-        {
-          if (vfd >=0)
-            dprintf(vfd, "Content-Length too large.\n");
-          rc = -EFBIG;
-          goto out2;
-        }
-    } while (still_running);

   /* Check whether a query was successful. If so, assign its handle
      to verified_handle.  */
@@ -1625,7 +1713,7 @@ debuginfod_find_debuginfo (debuginfod_client *client,
                            const unsigned char *build_id, int build_id_len,
                            char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "debuginfo", NULL, path);
 }

@@ -1636,7 +1724,7 @@ debuginfod_find_executable(debuginfod_client *client,
                            const unsigned char *build_id, int build_id_len,
                            char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "executable", NULL, path);
 }

@@ -1645,11 +1733,222 @@ int debuginfod_find_source(debuginfod_client *client,
                            const unsigned char *build_id, int build_id_len,
                            const char *filename, char **path)
 {
-  return debuginfod_query_server(client, build_id, build_id_len,
+  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
                                  "source", filename, path);
 }


+int debuginfod_find_metadata (debuginfod_client *client,
+                              const char* key, const char* value, char** metadata)
+{
+  (void) client;
+  (void) key;
+  (void) value;
+
+  if (NULL == metadata) return -EINVAL;
+#ifdef HAVE_JSON_C
+  char *server_urls;
+  char *urls_envvar;
+  json_object *json_metadata = json_object_new_array();
+  int rc = 0, r;
+  int vfd = client->verbose_fd;
+
+  if(NULL == json_metadata){
+    rc = -ENOMEM;
+    goto out;
+  }
+
+  if(NULL == value || NULL == key){
+    rc = -EINVAL;
+    goto out;
+  }
+
+  if (vfd >= 0)
+    dprintf (vfd, "debuginfod_find_metadata %s %s\n", key, value);
+
+  /* Without query-able URL, we can stop here*/
+  urls_envvar = getenv(DEBUGINFOD_URLS_ENV_VAR);
+  if (vfd >= 0)
+    dprintf (vfd, "server urls \"%s\"\n",
+             urls_envvar != NULL ? urls_envvar : "");
+  if (urls_envvar == NULL || urls_envvar[0] == '\0')
+    {
+      rc = -ENOSYS;
+      goto out;
+    }
+
+  /* Clear the client of previous urls*/
+  free (client->url);
+  client->url = NULL;
+
+  long maxtime = 0;
+  const char *maxtime_envvar;
+  maxtime_envvar = getenv(DEBUGINFOD_MAXTIME_ENV_VAR);
+  if (maxtime_envvar != NULL)
+    maxtime = atol (maxtime_envvar);
+  if (maxtime && vfd >= 0)
+    dprintf(vfd, "using max time %lds\n", maxtime);
+
+  long timeout = default_timeout;
+  const char* timeout_envvar = getenv(DEBUGINFOD_TIMEOUT_ENV_VAR);
+  if (timeout_envvar != NULL)
+    timeout = atoi (timeout_envvar);
+  if (vfd >= 0)
+    dprintf (vfd, "using timeout %ld\n", timeout);
+
+  add_default_headers(client);
+
+  /* make a copy of the envvar so it can be safely modified.  */
+  server_urls = strdup(urls_envvar);
+  if (server_urls == NULL)
+    {
+      rc = -ENOMEM;
+      goto out;
+    }
+  /* thereafter, goto out1 on error*/
+
+  char **server_url_list = NULL;
+  char *server_url;
+  int num_urls;
+  r = init_server_urls("metadata", server_urls, &server_url_list, &num_urls, vfd);
+  if(0 != r){
+    rc = r;
+    goto out1;
+  }
+
+  CURLM *curlm = client->server_mhandle;
+  assert (curlm != NULL);
+
+  CURL *target_handle = NULL;
+  struct handle_data *data = malloc(sizeof(struct handle_data) * num_urls);
+  if (data == NULL)
+    {
+      rc = -ENOMEM;
+      goto out1;
+    }
+
+  /* thereafter, goto out2 on error.  */
+
+
+  /* Initialize handle_data  */
+  for (int i = 0; i < num_urls; i++)
+    {
+      if ((server_url = server_url_list[i]) == NULL)
+        break;
+      if (vfd >= 0)
+        dprintf (vfd, "init server %d %s\n", i, server_url);
+
+      data[i].errbuf[0] = '\0';
+      data[i].target_handle = &target_handle;
+      data[i].client = client;
+      data[i].metadata = NULL;
+      data[i].metadata_size = 0;
+
+      // libcurl > 7.62ish has curl_url_set()/etc. to construct these things more properly.
+      // curl_easy_escape() is older
+      CURL *c = curl_easy_init();
+      if (c) {
+        char *key_escaped = curl_easy_escape(c, key, 0);
+        char *value_escaped = curl_easy_escape(c, value, 0);
+        snprintf(data[i].url, PATH_MAX, "%s?key=%s&value=%s", server_url,
+                 // fallback to unescaped values in unlikely case of error
+                 key_escaped ?: key, value_escaped ?: value);
+        curl_free(value_escaped);
+        curl_free(key_escaped);
+        curl_easy_cleanup(c);
+      }
+
+      r = init_handle(client, metadata_callback, header_callback, &data[i], i, timeout, vfd);
+      if(0 != r){
+        rc = r;
+        goto out2;
+      }
+      curl_multi_add_handle(curlm, data[i].handle);
+    }
+
+  /* Query servers */
+  if (vfd >= 0)
+    dprintf (vfd, "Starting %d queries\n",num_urls);
+  r = perform_queries(curlm, NULL, data, client, num_urls, maxtime, 0, false, vfd);
+  if(0 != r){
+    rc = r;
+    goto out2;
+  }
+
+  /* NOTE: We don't check the return codes of the curl messages since
+     a metadata query failing silently is just fine. We want to know what's
+     available from servers which can be connected with no issues.
+ If running with additional verbosity, the failure will be noted in stderr */ + + /* Building the new json array from all the upstream data + and cleanup while at it + */ + for (int i = 0; i < num_urls; i++) + { + curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */ + if(NULL == data[i].metadata) + { + if (vfd >= 0) + dprintf (vfd, "Query to %s failed with error message:\n\t\"%s\"\n", + data[i].url, data[i].errbuf); + continue; + } + json_object *upstream_metadata = json_tokener_parse(data[i].metadata); + if(NULL == upstream_metadata) continue; + // Combine the upstream metadata into the json array + for (int j = 0, n = json_object_array_length(upstream_metadata); j < n; j++) { + json_object *entry = json_object_array_get_idx(upstream_metadata, j); + json_object_get(entry); // increment reference count + json_object_array_add(json_metadata, entry); + } + json_object_put(upstream_metadata); + + curl_easy_cleanup (data[i].handle); + free (data[i].response_data); + free (data[i].metadata); + } + + *metadata = strdup(json_object_to_json_string_ext(json_metadata, JSON_C_TO_STRING_PRETTY)); + + free (data); + goto out1; + +/* error exits */ +out2: + /* remove all handles from multi */ + for (int i = 0; i < num_urls; i++) + { + if (data[i].handle != NULL) + { + curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */ + curl_easy_cleanup (data[i].handle); + free (data[i].response_data); + free (data[i].metadata); + } + } + free(data); + +out1: + for (int i = 0; i < num_urls; ++i) + free(server_url_list[i]); + free(server_url_list); + free (server_urls); + +/* general purpose exit */ +out: + json_object_put(json_metadata); + /* Reset sent headers */ + curl_slist_free_all (client->headers); + client->headers = NULL; + client->user_agent_set_p = 0; + + return rc; + +#else /* ! HAVE_JSON_C */ + return -ENOSYS; +#endif +} + /* Add an outgoing HTTP header. 
*/ int debuginfod_add_http_header (debuginfod_client *client, const char* header) { diff --git a/debuginfod/debuginfod-find.c b/debuginfod/debuginfod-find.c index 778fb09b0890..4136c5679c23 100644 --- a/debuginfod/debuginfod-find.c +++ b/debuginfod/debuginfod-find.c @@ -31,6 +31,9 @@ #include #include +#ifdef HAVE_JSON_C + #include +#endif /* Name and version of program. */ ARGP_PROGRAM_VERSION_HOOK_DEF = print_version; @@ -48,7 +51,11 @@ static const char args_doc[] = N_("debuginfo BUILDID\n" "executable BUILDID\n" "executable PATH\n" "source BUILDID /FILENAME\n" - "source PATH /FILENAME\n"); + "source PATH /FILENAME\n" +#ifdef HAVE_JSON_C + "metadata KEY VALUE" +#endif + ); /* Definitions of arguments for argp functions. */ @@ -140,6 +147,30 @@ main(int argc, char** argv) return 1; } +#ifdef HAVE_JSON_C + if(strcmp(argv[remaining], "metadata") == 0){ + if (remaining+2 == argc) + { + fprintf(stderr, "Require KEY and VALUE for \"metadata\"\n"); + return 1; + } + + char* metadata; + int rc = debuginfod_find_metadata (client, argv[remaining+1], argv[remaining+2], + &metadata); + + if (rc < 0) + { + fprintf(stderr, "Server query failed: %s\n", strerror(-rc)); + return 1; + } + // Output the metadata to stdout + printf("%s\n", metadata); + free(metadata); + return 0; + } +#endif + /* If we were passed an ELF file name in the BUILDID slot, look in there. 
*/ unsigned char* build_id = (unsigned char*) argv[remaining+1]; int build_id_len = 0; /* assume text */ diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx index 9dc4836bbe12..105c39087cfc 100644 --- a/debuginfod/debuginfod.cxx +++ b/debuginfod/debuginfod.cxx @@ -115,6 +115,9 @@ using namespace std; #define tid() pthread_self() #endif +#ifdef HAVE_JSON_C + #include +#endif inline bool string_endswith(const string& haystack, const string& needle) @@ -173,7 +176,7 @@ static const char DEBUGINFOD_SQLITE_DDL[] = " foreign key (buildid) references " BUILDIDS "_buildids(id) on update cascade on delete cascade,\n" " primary key (buildid, file, mtime)\n" " ) " WITHOUT_ROWID ";\n" - // Index for faster delete by file identifier + // Index for faster delete by file identifier and metadata searches "create index if not exists " BUILDIDS "_f_de_idx on " BUILDIDS "_f_de (file, mtime);\n" "create table if not exists " BUILDIDS "_f_s (\n" " buildid integer not null,\n" @@ -199,6 +202,8 @@ static const char DEBUGINFOD_SQLITE_DDL[] = " ) " WITHOUT_ROWID ";\n" // Index for faster delete by archive file identifier "create index if not exists " BUILDIDS "_r_de_idx on " BUILDIDS "_r_de (file, mtime);\n" + // Index for metadata searches + "create index if not exists " BUILDIDS "_r_de_idx2 on " BUILDIDS "_r_de (content);\n" "create table if not exists " BUILDIDS "_r_sref (\n" // outgoing dwarf sourcefile references from rpm " buildid integer not null,\n" " artifactsrc integer not null,\n" @@ -386,6 +391,9 @@ static const struct argp_option options[] = { "passive", ARGP_KEY_PASSIVE, NULL, 0, "Do not scan or groom, read-only database.", 0 }, #define ARGP_KEY_DISABLE_SOURCE_SCAN 0x1009 { "disable-source-scan", ARGP_KEY_DISABLE_SOURCE_SCAN, NULL, 0, "Do not scan dwarf source info.", 0 }, +#define ARGP_KEY_METADATA_MAXTIME 0x100A + { "metadata-maxtime", ARGP_KEY_METADATA_MAXTIME, "SECONDS", 0, + "Number of seconds to limit metadata query run time, 0=unlimited.", 0 }, { NULL, 
0, NULL, 0, NULL, 0 }, }; @@ -438,6 +446,8 @@ static unsigned forwarded_ttl_limit = 8; static bool scan_source_info = true; static string tmpdir; static bool passive_p = false; +static unsigned metadata_maxtime_s = 5; + static void set_metric(const string& key, double value); // static void inc_metric(const string& key); @@ -639,6 +649,9 @@ parse_opt (int key, char *arg, case ARGP_KEY_DISABLE_SOURCE_SCAN: scan_source_info = false; break; + case ARGP_KEY_METADATA_MAXTIME: + metadata_maxtime_s = (unsigned) atoi(arg); + break; // case 'h': argp_state_help (state, stderr, ARGP_HELP_LONG|ARGP_HELP_EXIT_OK); default: return ARGP_ERR_UNKNOWN; } @@ -1824,6 +1837,58 @@ handle_buildid_r_match (bool internal_req_p, return r; } +void +add_client_federation_headers(debuginfod_client *client, MHD_Connection* conn){ + // Transcribe incoming User-Agent: + string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: ""; + string ua_complete = string("User-Agent: ") + ua; + debuginfod_add_http_header (client, ua_complete.c_str()); + + // Compute larger XFF:, for avoiding info loss during + // federation, and for future cyclicity detection. + string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: ""; + if (xff != "") + xff += string(", "); // comma separated list + + unsigned int xff_count = 0; + for (auto&& i : xff){ + if (i == ',') xff_count++; + } + + // if X-Forwarded-For: exceeds N hops, + // do not delegate a local lookup miss to upstream debuginfods. + if (xff_count >= forwarded_ttl_limit) + throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwarded-ttl-limit reached \ +and will not query the upstream servers"); + + // Compute the client's numeric IP address only - so can't merge with conninfo() + const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn, + MHD_CONNECTION_INFO_CLIENT_ADDRESS); + struct sockaddr *so = u ? 
u->client_addr : 0; + char hostname[256] = ""; // RFC1035 + if (so && so->sa_family == AF_INET) { + (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0, + NI_NUMERICHOST); + } else if (so && so->sa_family == AF_INET6) { + struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so; + if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) { + struct sockaddr_in addr4; + memset (&addr4, 0, sizeof(addr4)); + addr4.sin_family = AF_INET; + addr4.sin_port = addr6->sin6_port; + memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr)); + (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4), + hostname, sizeof (hostname), NULL, 0, + NI_NUMERICHOST); + } else { + (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0, + NI_NUMERICHOST); + } + } + + string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname); + debuginfod_add_http_header (client, xff_complete.c_str()); +} static struct MHD_Response* handle_buildid_match (bool internal_req_p, @@ -2010,57 +2075,7 @@ handle_buildid (MHD_Connection* conn, debuginfod_set_progressfn (client, & debuginfod_find_progress); if (conn) - { - // Transcribe incoming User-Agent: - string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: ""; - string ua_complete = string("User-Agent: ") + ua; - debuginfod_add_http_header (client, ua_complete.c_str()); - - // Compute larger XFF:, for avoiding info loss during - // federation, and for future cyclicity detection. - string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: ""; - if (xff != "") - xff += string(", "); // comma separated list - - unsigned int xff_count = 0; - for (auto&& i : xff){ - if (i == ',') xff_count++; - } - - // if X-Forwarded-For: exceeds N hops, - // do not delegate a local lookup miss to upstream debuginfods. 
- if (xff_count >= forwarded_ttl_limit) - throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \ -and will not query the upstream servers"); - - // Compute the client's numeric IP address only - so can't merge with conninfo() - const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn, - MHD_CONNECTION_INFO_CLIENT_ADDRESS); - struct sockaddr *so = u ? u->client_addr : 0; - char hostname[256] = ""; // RFC1035 - if (so && so->sa_family == AF_INET) { - (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0, - NI_NUMERICHOST); - } else if (so && so->sa_family == AF_INET6) { - struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so; - if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) { - struct sockaddr_in addr4; - memset (&addr4, 0, sizeof(addr4)); - addr4.sin_family = AF_INET; - addr4.sin_port = addr6->sin6_port; - memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr)); - (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4), - hostname, sizeof (hostname), NULL, 0, - NI_NUMERICHOST); - } else { - (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0, - NI_NUMERICHOST); - } - } - - string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname); - debuginfod_add_http_header (client, xff_complete.c_str()); - } + add_client_federation_headers(client, conn); if (artifacttype == "debuginfo") fd = debuginfod_find_debuginfo (client, @@ -2272,6 +2287,140 @@ handle_metrics (off_t* size) return r; } + +#ifdef HAVE_JSON_C +static struct MHD_Response* +handle_metadata (MHD_Connection* conn, + string key, string value, off_t* size) +{ + MHD_Response* r; + sqlite3 *thisdb = dbq; + + // Query locally for matching e, d and s files + + string op; + if (key == "glob") + op = "glob"; + else if (key == "file") + op = "="; + else + throw reportable_exception("/metadata webapi error, unsupported key"); + + string sql = string( + // 
explicit query r_de and f_de once here, rather than the query_d and query_e + // separately, because they scan the same tables, so we'd double the work + "select d1.executable_p, d1.debuginfo_p, 0 as source_p, b1.hex, f1.name as file " + "from " BUILDIDS "_r_de d1, " BUILDIDS "_files f1, " BUILDIDS "_buildids b1 " + "where f1.id = d1.content and d1.buildid = b1.id and f1.name " + op + " ? " + "union all \n" + "select d2.executable_p, d2.debuginfo_p, 0, b2.hex, f2.name " + "from " BUILDIDS "_f_de d2, " BUILDIDS "_files f2, " BUILDIDS "_buildids b2 " + "where f2.id = d2.file and d2.buildid = b2.id and f2.name " + op + " ? " + "union all \n" + // delegate to query_s for this one + "select 0, 0, 1, q.buildid, q.artifactsrc " + "from " BUILDIDS "_query_s as q " + "where q.artifactsrc " + op + " ? "); + + sqlite_ps *pp = new sqlite_ps (thisdb, "mhd-query-meta-glob", sql); + pp->reset(); + pp->bind(1, value); + pp->bind(2, value); + pp->bind(3, value); + unique_ptr ps_closer(pp); // release pp if exception or return + + json_object *metadata = json_object_new_array(); + if (!metadata) + throw libc_exception(ENOMEM, "json allocation"); + + // consume all the rows + struct timespec ts_start; + clock_gettime (CLOCK_MONOTONIC, &ts_start); + + int rc; + while (SQLITE_DONE != (rc = pp->step())) + { + // break out of loop if we have searched too long + struct timespec ts_end; + clock_gettime (CLOCK_MONOTONIC, &ts_end); + double deltas = (ts_end.tv_sec - ts_start.tv_sec) + (ts_end.tv_nsec - ts_start.tv_nsec)/1.e9; + if (metadata_maxtime_s > 0 && deltas > metadata_maxtime_s) + break; // NB: no particular signal is given to the client about incompleteness + + if (rc != SQLITE_ROW) throw sqlite_exception(rc, "step"); + + int m_executable_p = sqlite3_column_int (*pp, 0); + int m_debuginfo_p = sqlite3_column_int (*pp, 1); + int m_source_p = sqlite3_column_int (*pp, 2); + string m_buildid = (const char*) sqlite3_column_text (*pp, 3) ?: ""; // should always be non-null + string m_file = 
(const char*) sqlite3_column_text (*pp, 4) ?: ""; + + auto add_metadata = [metadata, m_buildid, m_file](const string& type) { + json_object* entry = json_object_new_object(); + if (NULL == entry) throw libc_exception (ENOMEM, "cannot allocate json"); + defer_dtor entry_d(entry, json_object_put); + + auto add_entry_metadata = [entry](const char* k, string v) { + json_object* s; + if(v != "") { + s = json_object_new_string(v.c_str()); + if (NULL == s) throw libc_exception (ENOMEM, "cannot allocate json"); + json_object_object_add(entry, k, s); + } + }; + + add_entry_metadata("type", type.c_str()); + add_entry_metadata("buildid", m_buildid); + add_entry_metadata("file", m_file); + json_object_array_add(metadata, json_object_get(entry)); // Increase ref count to switch its ownership + }; + + if (m_executable_p) add_metadata("executable"); + if (m_debuginfo_p) add_metadata("debuginfo"); + if (m_source_p) add_metadata("source"); + + } + pp->reset(); + + // Query upstream as well + debuginfod_client *client = debuginfod_pool_begin(); + if (metadata && client != NULL) + { + add_client_federation_headers(client, conn); + + char * upstream_metadata; + if (0 == debuginfod_find_metadata(client, key.c_str(), value.c_str(), &upstream_metadata)) { + json_object *upstream_metadata_json = json_tokener_parse(upstream_metadata); + if(NULL != upstream_metadata_json) + { + for (int i = 0, n = json_object_array_length(upstream_metadata_json); i < n; i++) { + json_object *entry = json_object_array_get_idx(upstream_metadata_json, i); + json_object_get(entry); // increment reference count + json_object_array_add(metadata, entry); + } + json_object_put(upstream_metadata_json); + } + free(upstream_metadata); + } + debuginfod_pool_end (client); + } + + const char* metadata_str = (metadata != NULL) ? + json_object_to_json_string(metadata) : "[ ]" ; + if (! 
metadata_str) + throw libc_exception (ENOMEM, "cannot allocate json"); + r = MHD_create_response_from_buffer (strlen(metadata_str), + (void*) metadata_str, + MHD_RESPMEM_MUST_COPY); + *size = strlen(metadata_str); + json_object_put(metadata); + if (r) + add_mhd_response_header(r, "Content-Type", "application/json"); + return r; +} +#endif + + static struct MHD_Response* handle_root (off_t* size) { @@ -2406,6 +2555,20 @@ handler_cb (void * /*cls*/, inc_metric("http_requests_total", "type", artifacttype); r = handle_metrics(& http_size); } +#ifdef HAVE_JSON_C + else if (url1 == "/metadata") + { + tmp_inc_metric m ("thread_busy", "role", "http-metadata"); + const char* key = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "key"); + const char* value = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "value"); + if (NULL == value || NULL == key) + throw reportable_exception("/metadata webapi error, need key and value"); + + artifacttype = "metadata"; + inc_metric("http_requests_total", "type", artifacttype); + r = handle_metadata(connection, key, value, &http_size); + } +#endif else if (url1 == "/") { artifacttype = "/"; @@ -3693,12 +3856,13 @@ void groom() if (interrupted) return; // NB: "vacuum" is too heavy for even daily runs: it rewrites the entire db, so is done as maxigroom -G - sqlite_ps g1 (db, "incremental vacuum", "pragma incremental_vacuum"); - g1.reset().step_ok_done(); - sqlite_ps g2 (db, "optimize", "pragma optimize"); - g2.reset().step_ok_done(); - sqlite_ps g3 (db, "wal checkpoint", "pragma wal_checkpoint=truncate"); - g3.reset().step_ok_done(); + { sqlite_ps g (db, "incremental vacuum", "pragma incremental_vacuum"); g.reset().step_ok_done(); } + // https://www.sqlite.org/lang_analyze.html#approx + { sqlite_ps g (db, "analyze setup", "pragma analysis_limit = 1000;\n"); g.reset().step_ok_done(); } + { sqlite_ps g (db, "analyze", "analyze"); g.reset().step_ok_done(); } + { sqlite_ps g (db, "analyze reload", "analyze 
sqlite_schema"); g.reset().step_ok_done(); } + { sqlite_ps g (db, "optimize", "pragma optimize"); g.reset().step_ok_done(); } + { sqlite_ps g (db, "wal checkpoint", "pragma wal_checkpoint=truncate"); g.reset().step_ok_done(); } database_stats_report(); diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in index 7d8e4972b185..4aa38abb5731 100644 --- a/debuginfod/debuginfod.h.in +++ b/debuginfod/debuginfod.h.in @@ -79,6 +79,16 @@ int debuginfod_find_source (debuginfod_client *client, const char *filename, char **path); +/* Query the urls contained in $DEBUGINFOD_URLS for metadata + with given query key/value. + + If successful, return 0, otherwise return a negative POSIX error code. + If successful, set *metadata to a malloc'd JSON array + with each entry being a JSON object of metadata for one file. + Caller must free() it later. metadata MUST be non-NULL. */ +int debuginfod_find_metadata (debuginfod_client *client, + const char *key, const char* value, char** metadata); + typedef int (*debuginfod_progressfn_t)(debuginfod_client *c, long a, long b); void debuginfod_set_progressfn(debuginfod_client *c, debuginfod_progressfn_t fn); diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map index 93964167836f..6e4fe4b5bcba 100644 --- a/debuginfod/libdebuginfod.map +++ b/debuginfod/libdebuginfod.map @@ -20,4 +20,5 @@ ELFUTILS_0.183 { } ELFUTILS_0.179; ELFUTILS_0.188 { debuginfod_get_headers; + debuginfod_find_metadata; } ELFUTILS_0.183; diff --git a/doc/ChangeLog b/doc/ChangeLog index 269ed06e567e..7f852824cdc9 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,10 @@ +2022-10-06 Ryan Goldberg + + * debuginfod-find.1: Document metadata query commandline API. + * debuginfod_find_debuginfo.3: Document metadata query C API. + * debuginfod_find_metadata.3: New file. + * Makefile.am (notrans_dist_*_man3): Add it. + 2022-10-28 Arsen Arsenović * readelf.1: Document the --syms alias. 
diff --git a/doc/Makefile.am b/doc/Makefile.am index db2506fd3d49..64ffdaa2c3b5 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -38,6 +38,7 @@ notrans_dist_man3_MANS += debuginfod_end.3 notrans_dist_man3_MANS += debuginfod_find_debuginfo.3 notrans_dist_man3_MANS += debuginfod_find_executable.3 notrans_dist_man3_MANS += debuginfod_find_source.3 +notrans_dist_man3_MANS += debuginfod_find_metadata.3 notrans_dist_man3_MANS += debuginfod_get_user_data.3 notrans_dist_man3_MANS += debuginfod_get_url.3 notrans_dist_man3_MANS += debuginfod_set_progressfn.3 diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1 index 957ec7e716f9..c44fbc763650 100644 --- a/doc/debuginfod-find.1 +++ b/doc/debuginfod-find.1 @@ -29,6 +29,8 @@ debuginfod-find \- request debuginfo-related data .B debuginfod-find [\fIOPTION\fP]... source \fIBUILDID\fP \fI/FILENAME\fP .br .B debuginfod-find [\fIOPTION\fP]... source \fIPATH\fP \fI/FILENAME\fP +.br +.B debuginfod-find [\fIOPTION\fP]... metadata \fIKEY\fP \fIVALUE\fP .SH DESCRIPTION \fBdebuginfod-find\fP queries one or more \fBdebuginfod\fP servers for @@ -106,6 +108,35 @@ l l. \../bar/foo.c AT_comp_dir=/zoo/ source BUILDID /zoo//../bar/foo.c .TE +.SS metadata \fIKEY\fP \fIVALUE\fP + +All designated debuginfod servers are queried for metadata about files +in their index. Different search keys may be supported by different +servers. + +.TS +l l l . +KEY VALUE DESCRIPTION + +\fBfile\fP path match exact \fIpath\fP, including in archives +\fBglob\fP pattern sqlite glob match \fIpattern\fP, including in archives +.TE + +The results of the search are output to \fBstdout\fP as a JSON array +of objects, supplying metadata about each match. This metadata report +may or may not be cached. It may be incomplete and may contain +duplicates. For each match, the result is a JSON object with these +fields. Additional fields may be present. + +.TS +l l l . 
+NAME TYPE DESCRIPTION + +\fBbuildid\fP string hexadecimal buildid associated with the file +\fBtype\fP string one of \fBdebuginfo\fP or \fBexecutable\fP or \fBsource\fP +\fBfile\fP string matched file name, outside or inside the archive +.TE + .SH "OPTIONS" .TP diff --git a/doc/debuginfod.8 b/doc/debuginfod.8 index 7c1dc3dd6a68..9f529d0a4042 100644 --- a/doc/debuginfod.8 +++ b/doc/debuginfod.8 @@ -133,6 +133,14 @@ scanner/groomer server and multiple passive ones, thereby sharing service load. Archive pattern options must still be given, so debuginfod can recognize file name extensions for unpacking. +.TP +.B "\-\-metadata\-maxtime=SECONDS" +Impose a limit on the runtime of metadata webapi queries. These +queries, especially broad "glob" wildcards, can take a large amount of +time and produce large results. Public-facing servers may need to +throttle them. The default limit is 5 seconds. Set to 0 to disable this +limit. + .TP .B "\-D SQL" "\-\-ddl=SQL" Execute given sqlite statement after the database is opened and @@ -371,6 +379,16 @@ The exact set of metrics and their meanings may change in future versions. Caution: configuration information (path names, versions) may be disclosed. +.SS /metadata?key=\fIKEY\fP&value=\fIVALUE\fP + +This endpoint triggers a search of the files in the index plus any +upstream federated servers, based on the given key and value. If +successful, the result is an application/json textual array, listing +metadata for the matched files. See \fIdebuginfod-find(1)\fP for +documentation of the common key/value search parameters, and the +resulting data schema. 
+ + .SH DATA MANAGEMENT debuginfod stores its index in an sqlite database in a densely packed diff --git a/doc/debuginfod_find_debuginfo.3 b/doc/debuginfod_find_debuginfo.3 index 3dd832400ec6..f131813ecefc 100644 --- a/doc/debuginfod_find_debuginfo.3 +++ b/doc/debuginfod_find_debuginfo.3 @@ -43,6 +43,10 @@ LOOKUP FUNCTIONS .BI " int " build_id_len "," .BI " const char *" filename "," .BI " char ** " path ");" +.BI "int debuginfod_find_metadata(debuginfod_client *" client "," +.BI " const char *" key "," +.BI " const char *" value "," +.BI " char** " metadata ");" OPTIONAL FUNCTIONS @@ -109,6 +113,14 @@ A \fBclient\fP handle should be used from only one thread at a time. A handle may be reused for a series of lookups, which can improve performance due to retention of connections and caches. +.BR debuginfod_find_metadata (), +likewise queries all debuginfod server URLs contained in +.BR $DEBUGINFOD_URLS +but instead retrieves metadata. The query search mode is specified +in the \fIkey\fP parameter, and its parameter \fIvalue\fP. See +\fIdebuginfod-find(1)\fP for more information on the available +options for query key/value. + .SH "RETURN VALUE" \fBdebuginfod_begin\fP returns the \fBdebuginfod_client\fP handle to @@ -120,6 +132,13 @@ to the client cache and a file descriptor to that file is returned. The caller needs to \fBclose\fP() this descriptor. Otherwise, a negative error code is returned. +The one exception is \fBdebuginfod_find_metadata\fP, which likewise +returns negative error codes, but on success returns 0 and sets +\fI*metadata\fP to a string-form JSON array of the found matching +metadata. This should be freed by the caller. See +\fIdebuginfod-find(1)\fP for more information on the metadata being +returned. 
+ .SH "OPTIONAL FUNCTIONS" A small number of optional functions are available to tune or query diff --git a/doc/debuginfod_find_metadata.3 b/doc/debuginfod_find_metadata.3 new file mode 100644 index 000000000000..16279936e2ea --- /dev/null +++ b/doc/debuginfod_find_metadata.3 @@ -0,0 +1 @@ +.so man3/debuginfod_find_debuginfo.3 diff --git a/tests/ChangeLog b/tests/ChangeLog index a240a70506b1..79aae9208319 100644 --- a/tests/ChangeLog +++ b/tests/ChangeLog @@ -1,3 +1,9 @@ +2022-10-06 Ryan Goldberg + + * run-debuginfod-find-metadata.sh: New test. + * Makefile.am (TESTS): Add run-debuginfod-find-metadata.sh. + (EXTRA_DIST): Likewise. + 2022-09-20 Yonggang Luo * Makefile.am (EXTRA_DIST): Remove debuginfod-rpms/hello2.spec. diff --git a/tests/Makefile.am b/tests/Makefile.am index ced4a8266236..aaa5d35a769c 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -247,7 +247,8 @@ TESTS += run-debuginfod-dlopen.sh \ run-debuginfod-x-forwarded-for.sh \ run-debuginfod-response-headers.sh \ run-debuginfod-extraction-passive.sh \ - run-debuginfod-webapi-concurrency.sh + run-debuginfod-webapi-concurrency.sh \ + run-debuginfod-find-metadata.sh endif if !OLD_LIBMICROHTTPD # Will crash on too old libmicrohttpd @@ -554,6 +555,7 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \ run-debuginfod-response-headers.sh \ run-debuginfod-extraction-passive.sh \ run-debuginfod-webapi-concurrency.sh \ + run-debuginfod-find-metadata.sh \ debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \ debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \ debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \ diff --git a/tests/run-debuginfod-find-metadata.sh b/tests/run-debuginfod-find-metadata.sh new file mode 100755 index 000000000000..2e1999f56d91 --- /dev/null +++ b/tests/run-debuginfod-find-metadata.sh @@ -0,0 +1,89 @@ +#!/usr/bin/env bash +# +# Copyright (C) 2022 Red Hat, Inc. +# This file is part of elfutils. 
+# +# This file is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# elfutils is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. $srcdir/debuginfod-subr.sh + +# for test case debugging, uncomment: +set -x +unset VALGRIND_CMD + +type curl 2>/dev/null || { echo "need curl"; exit 77; } +type jq 2>/dev/null || { echo "need jq"; exit 77; } + +pkg-config json-c libcurl || { echo "one or more libraries are missing (libjson-c, libcurl)"; exit 77; } + +DB=${PWD}/.debuginfod_tmp.sqlite +export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache +tempfiles $DB ${DB}_2 + +# This variable is essential and ensures no time-race for claiming ports occurs +# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test +base=13100 +get_ports +mkdir R D +cp -rvp ${abs_srcdir}/debuginfod-rpms/rhel7 R +cp -rvp ${abs_srcdir}/debuginfod-debs/*deb D + +env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -R \ + -d $DB -p $PORT1 -t0 -g0 R > vlog$PORT1 2>&1 & +PID1=$! +tempfiles vlog$PORT1 +errfiles vlog$PORT1 + +wait_ready $PORT1 'ready' 1 +wait_ready $PORT1 'thread_work_total{role="traverse"}' 1 +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0 +wait_ready $PORT1 'thread_busy{role="scan"}' 0 + +env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS="http://127.0.0.1:$PORT1 https://bad/url.web" ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -U \ + -d ${DB}_2 -p $PORT2 -t0 -g0 D > vlog$PORT2 2>&1 & +PID2=$! 
+tempfiles vlog$PORT2 +errfiles vlog$PORT2 + +wait_ready $PORT2 'ready' 1 +wait_ready $PORT2 'thread_work_total{role="traverse"}' 1 +wait_ready $PORT2 'thread_work_pending{role="scan"}' 0 +wait_ready $PORT2 'thread_busy{role="scan"}' 0 + +# have clients contact the new server +export DEBUGINFOD_URLS=http://127.0.0.1:$PORT2 + +tempfiles json.txt +# Check that we find 11 files (which means that both the local and upstream servers replied correctly to the query) +N_FOUND=`env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata glob "/?sr*" | jq '. | length'` +test $N_FOUND -eq 11 + +# Query via the webapi as well +EXPECTED='[ { "type": "executable", "buildid": "f17a29b5a25bd4960531d82aa6b07c8abe84fa66", "file": "/usr/bin/hithere"} ]' +curl http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' +test `curl http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq ". == $EXPECTED" ` = 'true' + +# An empty array is returned on server error or if the file does not exist +test `env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/this/isnt/there" | jq ". == [ ]" ` = 'true' + +kill $PID1 +kill $PID2 +wait $PID1 +wait $PID2 +PID1=0 +PID2=0 + +test `env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/usr/bin/hithere" | jq ". == [ ]" ` = 'true' + +exit 0
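
For reviewers, the --metadata-maxtime cutoff that handle_metadata applies while stepping through sqlite rows can be sketched in isolation. This is only an illustration and not code from the patch: the helper names elapsed_seconds and budget_exceeded are hypothetical. The only behaviour taken from the patch is that a limit of 0 means unlimited and that the loop stops once the elapsed CLOCK_MONOTONIC time exceeds the limit.

```c
#include <stdbool.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC samples, as a double.
   (Hypothetical helper; the patch computes this inline.)  */
double
elapsed_seconds (const struct timespec *start, const struct timespec *end)
{
  return (double) (end->tv_sec - start->tv_sec)
         + (double) (end->tv_nsec - start->tv_nsec) / 1.e9;
}

/* True when a query that began at START should stop at NOW.
   A limit of 0 means "unlimited", as with --metadata-maxtime=0.
   (Hypothetical helper name.)  */
bool
budget_exceeded (const struct timespec *start, const struct timespec *now,
                 unsigned maxtime_s)
{
  return maxtime_s > 0 && elapsed_seconds (start, now) > (double) maxtime_s;
}
```

A caller would sample clock_gettime (CLOCK_MONOTONIC, ...) once before the row loop and once per row, and break out when budget_exceeded returns true. As in the patch, the client gets no signal that the result was cut short.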