diff mbox series

PR27531: retry within default retry_limit will be supported. In debuginfod-client.c (debuginfod_query_server), insert a goto statement for jumping back to the beginning of curl handles set up if query fails and a non ENOENT error is returned.

Message ID 20210706201502.336113-1-alizhang@redhat.com
State Committed
Headers show
Series PR27531: retry within default retry_limit will be supported. In debuginfod-client.c (debuginfod_query_server), insert a goto statement for jumping back to the beginning of curl handles set up if query fails and a non ENOENT error is returned. | expand

Commit Message

Alice Zhang July 6, 2021, 8:15 p.m. UTC
Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT(which is 2).

Correponding test has been added to tests/run-debuginfod-find.sh

Signed-off-by: Alice Zhang <alizhang@redhat.com>
---
 config/ChangeLog               |  4 ++++
 config/debuginfod.sysconfig    |  1 +
 debuginfod/ChangeLog           |  7 +++++++
 debuginfod/debuginfod-client.c | 33 +++++++++++++++++++++++++++++++--
 debuginfod/debuginfod.h.in     |  1 +
 tests/ChangeLog                |  5 +++++
 tests/run-debuginfod-find.sh   | 11 +++++++++++
 7 files changed, 60 insertions(+), 2 deletions(-)

Comments

Mark Wielaard July 8, 2021, 2:43 p.m. UTC | #1
Hi Alice,

On Tue, 2021-07-06 at 16:15 -0400, Alice Zhang via Elfutils-devel
wrote:
> Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
> DEBUGINFOD_RETRY_LIMIT(which is 2).
> 
> Correponding test has been added to tests/run-debuginfod-find.sh
> 
> Signed-off-by: Alice Zhang <alizhang@redhat.com>

Nice. But try to split up the first paragraph. e.g.

---
PR27531: retry within default retry_limit will be supported.

In debuginfod-client.c (debuginfod_query_server), insert a
goto statement for jumping back to the beginning of curl
handles set up if query fails and a non ENOENT error is
returned.

Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT (which is 2).

Correponding test has been added to tests/run-debuginfod-find.sh
 
Signed-off-by: Alice Zhang <alizhang@redhat.com>
---

That way the first sentence (note the extra blank line) becomes the
short commit message/subject.


+  /* If the verified_handle is NULL and rc != -ENOENT, the query
> fails with
> +   * an error code other than 404, then do several retry within the retry_limit.
> +   * Clean up all old handles and jump back to the beginning of query_in_parallel,
> +   * reinitialize handles and query again.*/
>    if (verified_handle == NULL)
> -    goto out1;
> +    {
> +      if (rc != -ENOENT && retry_limit-- > 0)
> +        {
> +	  if (vfd >= 0)
> +            dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
> +	  /* remove all handles from multi */
> +          for (int i = 0; i < num_urls; i++)
> +            {
> +              curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +              curl_easy_cleanup (data[i].handle);
> +            }
> +	    goto query_in_parallel;
> +	}
> +      else
> +	goto out1;
> +    }

You also need to call free (data) before the goto query_in_parallel
since that will also be reallocated. Or you can rearrange the code a
little around query_in_parallel to keep the data around, but only
reinitialize it, but I think it is fine to free and then malloc again.

Try to run under valgrind --full-leak-check and you'll see:

==24684== 43,840 bytes in 10 blocks are definitely lost in loss record 61 of 61
==24684==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==24684==    by 0x52EB2A7: debuginfod_query_server (debuginfod-client.c:789)
==24684==    by 0x401131: main (debuginfod-find.c:192)

(P.S. Don't worry about the still reachable issues.)

diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
> index 559ea947..282e523d 100644
> --- a/debuginfod/debuginfod.h.in
> +++ b/debuginfod/debuginfod.h.in
> @@ -35,6 +35,7 @@
>  #define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
>  #define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
>  #define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
> +#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
>  
>  /* The libdebuginfod soname.  */
>  #define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"

Could you also add an entry for DEBUGINFOD_RETRY_LIMIT to
doc/debuginfod_find_debuginfo.3 under ENVIRONMENT VARIABLES.

> +####################################################################
> ####
> +# set up tests for retrying failed queries.
> +retry_attempts=`(testrun env DEBUGINFOD_URLS=
> http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10
> DEBUGINFOD_VERBOSE=1 \
> +	${abs_top_builddir}/debuginfod/debuginfod-find debuginfo
> /bin/ls || true) 2>&1 >/dev/null \
> +	| grep -c 'Retry failed query'`
> +if [ $retry_attempts -ne 10 ]; then
> +    echo "retry mechanism failed."
> +    exit 1;
> +  fi
> +
>  exit 0

Cute testcase. This works because 255.255.255.255 will always fail.

Thanks,

Mark
Mark Wielaard July 9, 2021, 10:27 a.m. UTC | #2
Hi Alice (adding elfutils-devel back to CC),

On Thu, 2021-07-08 at 16:34 -0400, Alice Zhang wrote:
> Hi, Mark, I made modifications as you suggested. I relocated the goto
> statement and now the resources will be reused.

That rearrangement is better than what I suggested with the free and
(re)malloc. And thanks for adding the documentation. The commit message
looks good too.

> Also, I run valgrind --leak-check=full and here's the result:
> ==464842==
> ==464842== HEAP SUMMARY:
> ==464842==     in use at exit: 215,989 bytes in 2,167 blocks
> ==464842==   total heap usage: 3,866 allocs, 1,699 frees, 509,558 bytes
> allocated
> ==464842==
> ==464842== LEAK SUMMARY:
> ==464842==    definitely lost: 0 bytes in 0 blocks
> ==464842==    indirectly lost: 0 bytes in 0 blocks
> ==464842==      possibly lost: 0 bytes in 0 blocks
> ==464842==    still reachable: 215,989 bytes in 2,167 blocks
> ==464842==         suppressed: 0 bytes in 0 blocks
> ==464842== Reachable blocks (those to which a pointer was found) are not
> shown.
> ==464842== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==464842==
> ==464842== For lists of detected and suppressed errors, rerun with: -s
> ==464842== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Looks great. The still reachable blocks are from the libcurl global
initialization and not an issue.

Pushed your rearranged version to master.

Thanks,

Mark
diff mbox series

Patch

diff --git a/config/ChangeLog b/config/ChangeLog
index 2cdcfa72..70a1e923 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,7 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        * debuginfod.sysconfig: Introduce default retry limit.
+
 2021-05-10  Mark Wielaard  <mark@klomp.org>
 
 	* elfutils.spec.in: Update for 0.185.
diff --git a/config/debuginfod.sysconfig b/config/debuginfod.sysconfig
index 44603874..890a1a25 100644
--- a/config/debuginfod.sysconfig
+++ b/config/debuginfod.sysconfig
@@ -11,4 +11,5 @@  DEBUGINFOD_PATHS="-t43200 -F -R /usr/lib/debug /usr/bin /usr/libexec /usr/sbin /
 # upstream debuginfods
 #DEBUGINFOD_URLS="http://secondhost:8002 http://thirdhost:8002"
 #DEBUGINFOD_TIMEOUT="5"
+#DEBUGINFOD_RETRY_LIMIT="2"
 #DEBUGINFOD_CACHE_DIR=""
diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 21407dc2..f0673410 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,10 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        PR27531
+	* debuginfod-client.c (debuginfod_query_server): Retry failed queries
+	if error code is not ENOENT.
+	* debuginfod.h.in: Introduce DEBUGINFOD_RETRY_LIMIT_ENV_VAR.
+
 2021-05-14  Frank Ch. Eigler <fche@redhat.com>
 
 	PR27859
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index ee7eda24..f8ad8ac4 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -157,6 +157,8 @@  static const char url_delim_char = ' ';
 /* Timeout for debuginfods, in seconds (to get at least 100K). */
 static const long default_timeout = 90;
 
+/* Default retry count for download error. */
+static const long default_retry_limit = 2;
 
 /* Data associated with a particular CURL easy handle. Passed to
    the write callback.  */
@@ -770,9 +772,17 @@  debuginfod_query_server (debuginfod_client *c,
         && (i == 0 || server_urls[i - 1] == url_delim_char))
       num_urls++;
   
+  int retry_limit = default_retry_limit;
+  const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
+  if (retry_limit_envvar != NULL)
+    retry_limit = atoi (retry_limit_envvar);
+
+ /*The beginning of goto block query_in_parallel.*/
+ query_in_parallel:
+  rc = -ENOENT; /* Reset rc to default.*/
   CURLM *curlm = c->server_mhandle;
   assert (curlm != NULL);
-  
+
   /* Tracks which handle should write to fd. Set to the first
      handle that is ready to write the target file to the cache.  */
   CURL *target_handle = NULL;
@@ -1074,8 +1084,27 @@  debuginfod_query_server (debuginfod_client *c,
         close(efd);
     }
 
+  /* If the verified_handle is NULL and rc != -ENOENT, the query fails with
+   * an error code other than 404, then do several retry within the retry_limit.
+   * Clean up all old handles and jump back to the beginning of query_in_parallel,
+   * reinitialize handles and query again.*/
   if (verified_handle == NULL)
-    goto out1;
+    {
+      if (rc != -ENOENT && retry_limit-- > 0)
+        {
+	  if (vfd >= 0)
+            dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
+	  /* remove all handles from multi */
+          for (int i = 0; i < num_urls; i++)
+            {
+              curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
+              curl_easy_cleanup (data[i].handle);
+            }
+	    goto query_in_parallel;
+	}
+      else
+	goto out1;
+    }
 
   if (vfd >= 0)
     {
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
index 559ea947..282e523d 100644
--- a/debuginfod/debuginfod.h.in
+++ b/debuginfod/debuginfod.h.in
@@ -35,6 +35,7 @@ 
 #define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
 #define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
 #define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
+#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
 
 /* The libdebuginfod soname.  */
 #define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 38e92659..bf45e09f 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        PR27531
+	* run-debuginfod-find.sh: Add test case for retry mechanism.
+
 2021-05-14  Frank Ch. Eigler <fche@redhat.com>
 
 	PR27859
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 9183cccb..feb3310d 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -582,6 +582,7 @@  curl -s http://127.0.0.1:$PORT2/metrics | grep 'dc_pool_op.*reuse'
 
 ########################################################################
 # Corrupt the sqlite database and get debuginfod to trip across its errors
+
 curl -s http://127.0.0.1:$PORT1/metrics | grep 'sqlite3.*reset'
 ls -al $DB
 dd if=/dev/zero of=$DB bs=1 count=1
@@ -674,4 +675,14 @@  DEBUGINFOD_URLS="file://${PWD}/mocktree/"
 filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd /my/path/main.c`
 cmp $filename ${local_dir}/main.c
 
+########################################################################
+# set up tests for retrying failed queries.
+retry_attempts=`(testrun env DEBUGINFOD_URLS=http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10 DEBUGINFOD_VERBOSE=1 \
+	${abs_top_builddir}/debuginfod/debuginfod-find debuginfo /bin/ls || true) 2>&1 >/dev/null \
+	| grep -c 'Retry failed query'`
+if [ $retry_attempts -ne 10 ]; then
+    echo "retry mechanism failed."
+    exit 1;
+  fi
+
 exit 0