PR27531: retry within default retry_limit will be supported. In debuginfod-client.c (debuginfod_query_server), insert a goto statement for jumping back to the beginning of curl handles set up if query fails and a non ENOENT error is returned.
Commit Message
Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT(which is 2).
Correponding test has been added to tests/run-debuginfod-find.sh
Signed-off-by: Alice Zhang <alizhang@redhat.com>
---
config/ChangeLog | 4 ++++
config/debuginfod.sysconfig | 1 +
debuginfod/ChangeLog | 7 +++++++
debuginfod/debuginfod-client.c | 33 +++++++++++++++++++++++++++++++--
debuginfod/debuginfod.h.in | 1 +
tests/ChangeLog | 5 +++++
tests/run-debuginfod-find.sh | 11 +++++++++++
7 files changed, 60 insertions(+), 2 deletions(-)
Comments
Hi Alice,
On Tue, 2021-07-06 at 16:15 -0400, Alice Zhang via Elfutils-devel
wrote:
> Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
> DEBUGINFOD_RETRY_LIMIT(which is 2).
>
> Correponding test has been added to tests/run-debuginfod-find.sh
>
> Signed-off-by: Alice Zhang <alizhang@redhat.com>
Nice. But try to split up the first paragraph. e.g.
---
PR27531: retry within default retry_limit will be supported.
In debuginfod-client.c (debuginfod_query_server), insert a
goto statement for jumping back to the beginning of curl
handles set up if query fails and a non ENOENT error is
returned.
Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT (which is 2).
Correponding test has been added to tests/run-debuginfod-find.sh
Signed-off-by: Alice Zhang <alizhang@redhat.com>
---
That way the first sentence (note the extra blank line) becomes the
short commit message/subject.
+ /* If the verified_handle is NULL and rc != -ENOENT, the query
> fails with
> + * an error code other than 404, then do several retry within the retry_limit.
> + * Clean up all old handles and jump back to the beginning of query_in_parallel,
> + * reinitialize handles and query again.*/
> if (verified_handle == NULL)
> - goto out1;
> + {
> + if (rc != -ENOENT && retry_limit-- > 0)
> + {
> + if (vfd >= 0)
> + dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
> + /* remove all handles from multi */
> + for (int i = 0; i < num_urls; i++)
> + {
> + curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> + curl_easy_cleanup (data[i].handle);
> + }
> + goto query_in_parallel;
> + }
> + else
> + goto out1;
> + }
You also need to call free (data) before the goto query_in_parallel
since that will also be reallocated. Or you can rearrange the code a
little around query_in_parallel to keep the data around, but only
reinitialize it, but I think it is fine to free and then malloc again.
Try to run under valgrind --full-leak-check and you'll see:
==24684== 43,840 bytes in 10 blocks are definitely lost in loss record 61 of 61
==24684== at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==24684== by 0x52EB2A7: debuginfod_query_server (debuginfod-client.c:789)
==24684== by 0x401131: main (debuginfod-find.c:192)
(P.S. Don't worry about the still reachable issues.)
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
> index 559ea947..282e523d 100644
> --- a/debuginfod/debuginfod.h.in
> +++ b/debuginfod/debuginfod.h.in
> @@ -35,6 +35,7 @@
> #define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
> #define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
> #define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
> +#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
>
> /* The libdebuginfod soname. */
> #define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"
Could you also add an entry for DEBUGINFOD_RETRY_LIMIT to
doc/debuginfod_find_debuginfo.3 under ENVIRONMENT VARIABLES.
> +####################################################################
> ####
> +# set up tests for retrying failed queries.
> +retry_attempts=`(testrun env DEBUGINFOD_URLS=
> http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10
> DEBUGINFOD_VERBOSE=1 \
> + ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo
> /bin/ls || true) 2>&1 >/dev/null \
> + | grep -c 'Retry failed query'`
> +if [ $retry_attempts -ne 10 ]; then
> + echo "retry mechanism failed."
> + exit 1;
> + fi
> +
> exit 0
Cute testcase. This works because 255.255.255.255 will always fail.
Thanks,
Mark
Hi Alice (adding elfutils-devel back to CC),
On Thu, 2021-07-08 at 16:34 -0400, Alice Zhang wrote:
> Hi, Mark, I made modifications as you suggested. I relocated the goto
> statement and now the resources will be reused.
That rearrangement is better than what I suggested with the free and
(re)malloc. And thanks for adding the documentation. The commit message
looks good too.
> Also, I run valgrind --leak-check=full and here's the result:
> ==464842==
> ==464842== HEAP SUMMARY:
> ==464842== in use at exit: 215,989 bytes in 2,167 blocks
> ==464842== total heap usage: 3,866 allocs, 1,699 frees, 509,558 bytes
> allocated
> ==464842==
> ==464842== LEAK SUMMARY:
> ==464842== definitely lost: 0 bytes in 0 blocks
> ==464842== indirectly lost: 0 bytes in 0 blocks
> ==464842== possibly lost: 0 bytes in 0 blocks
> ==464842== still reachable: 215,989 bytes in 2,167 blocks
> ==464842== suppressed: 0 bytes in 0 blocks
> ==464842== Reachable blocks (those to which a pointer was found) are not
> shown.
> ==464842== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==464842==
> ==464842== For lists of detected and suppressed errors, rerun with: -s
> ==464842== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Looks great. The still reachable blocks are from the libcurl global
initialization and not an issue.
Pushed your rearranged version to master.
Thanks,
Mark
@@ -1,3 +1,7 @@
+2021-07-06 Alice Zhang <alizhang@redhat.com>
+
+ * debuginfod.sysconfig: Introduce default retry limit.
+
2021-05-10 Mark Wielaard <mark@klomp.org>
* elfutils.spec.in: Update for 0.185.
@@ -11,4 +11,5 @@ DEBUGINFOD_PATHS="-t43200 -F -R /usr/lib/debug /usr/bin /usr/libexec /usr/sbin /
# upstream debuginfods
#DEBUGINFOD_URLS="http://secondhost:8002 http://thirdhost:8002"
#DEBUGINFOD_TIMEOUT="5"
+#DEBUGINFOD_RETRY_LIMIT="2"
#DEBUGINFOD_CACHE_DIR=""
@@ -1,3 +1,10 @@
+2021-07-06 Alice Zhang <alizhang@redhat.com>
+
+ PR27531
+ * debuginfod-client.c (debuginfod_query_server): Retry failed queries
+ if error code is not ENOENT.
+ * debuginfod.h.in: Introduce DEBUGINFOD_RETRY_LIMIT_ENV_VAR.
+
2021-05-14 Frank Ch. Eigler <fche@redhat.com>
PR27859
@@ -157,6 +157,8 @@ static const char url_delim_char = ' ';
/* Timeout for debuginfods, in seconds (to get at least 100K). */
static const long default_timeout = 90;
+/* Default retry count for download error. */
+static const long default_retry_limit = 2;
/* Data associated with a particular CURL easy handle. Passed to
the write callback. */
@@ -770,9 +772,17 @@ debuginfod_query_server (debuginfod_client *c,
&& (i == 0 || server_urls[i - 1] == url_delim_char))
num_urls++;
+ int retry_limit = default_retry_limit;
+ const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
+ if (retry_limit_envvar != NULL)
+ retry_limit = atoi (retry_limit_envvar);
+
+ /*The beginning of goto block query_in_parallel.*/
+ query_in_parallel:
+ rc = -ENOENT; /* Reset rc to default.*/
CURLM *curlm = c->server_mhandle;
assert (curlm != NULL);
-
+
/* Tracks which handle should write to fd. Set to the first
handle that is ready to write the target file to the cache. */
CURL *target_handle = NULL;
@@ -1074,8 +1084,27 @@ debuginfod_query_server (debuginfod_client *c,
close(efd);
}
+ /* If the verified_handle is NULL and rc != -ENOENT, the query fails with
+ * an error code other than 404, then do several retry within the retry_limit.
+ * Clean up all old handles and jump back to the beginning of query_in_parallel,
+ * reinitialize handles and query again.*/
if (verified_handle == NULL)
- goto out1;
+ {
+ if (rc != -ENOENT && retry_limit-- > 0)
+ {
+ if (vfd >= 0)
+ dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
+ /* remove all handles from multi */
+ for (int i = 0; i < num_urls; i++)
+ {
+ curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
+ curl_easy_cleanup (data[i].handle);
+ }
+ goto query_in_parallel;
+ }
+ else
+ goto out1;
+ }
if (vfd >= 0)
{
@@ -35,6 +35,7 @@
#define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
#define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
#define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
+#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
/* The libdebuginfod soname. */
#define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"
@@ -1,3 +1,8 @@
+2021-07-06 Alice Zhang <alizhang@redhat.com>
+
+ PR27531
+ * run-debuginfod-find.sh: Add test case for retry mechanism.
+
2021-05-14 Frank Ch. Eigler <fche@redhat.com>
PR27859
@@ -582,6 +582,7 @@ curl -s http://127.0.0.1:$PORT2/metrics | grep 'dc_pool_op.*reuse'
########################################################################
# Corrupt the sqlite database and get debuginfod to trip across its errors
+
curl -s http://127.0.0.1:$PORT1/metrics | grep 'sqlite3.*reset'
ls -al $DB
dd if=/dev/zero of=$DB bs=1 count=1
@@ -674,4 +675,14 @@ DEBUGINFOD_URLS="file://${PWD}/mocktree/"
filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd /my/path/main.c`
cmp $filename ${local_dir}/main.c
+########################################################################
+# set up tests for retrying failed queries.
+retry_attempts=`(testrun env DEBUGINFOD_URLS=http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10 DEBUGINFOD_VERBOSE=1 \
+ ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo /bin/ls || true) 2>&1 >/dev/null \
+ | grep -c 'Retry failed query'`
+if [ $retry_attempts -ne 10 ]; then
+ echo "retry mechanism failed."
+ exit 1;
+ fi
+
exit 0