PR27531: retry within default retry_limit will be supported. In debuginfod-client.c (debuginfod_query_server), insert a goto statement for jumping back to the beginning of curl handles set up if query fails and a non ENOENT error is returned.

Message ID 20210706201502.336113-1-alizhang@redhat.com
State Committed
Headers
Series PR27531: retry within default retry_limit will be supported. In debuginfod-client.c (debuginfod_query_server), insert a goto statement for jumping back to the beginning of curl handles set up if query fails and a non ENOENT error is returned. |

Commit Message

Alice Zhang July 6, 2021, 8:15 p.m. UTC
  Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT(which is 2).

Correponding test has been added to tests/run-debuginfod-find.sh

Signed-off-by: Alice Zhang <alizhang@redhat.com>
---
 config/ChangeLog               |  4 ++++
 config/debuginfod.sysconfig    |  1 +
 debuginfod/ChangeLog           |  7 +++++++
 debuginfod/debuginfod-client.c | 33 +++++++++++++++++++++++++++++++--
 debuginfod/debuginfod.h.in     |  1 +
 tests/ChangeLog                |  5 +++++
 tests/run-debuginfod-find.sh   | 11 +++++++++++
 7 files changed, 60 insertions(+), 2 deletions(-)
  

Comments

Mark Wielaard July 8, 2021, 2:43 p.m. UTC | #1
Hi Alice,

On Tue, 2021-07-06 at 16:15 -0400, Alice Zhang via Elfutils-devel
wrote:
> Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
> DEBUGINFOD_RETRY_LIMIT(which is 2).
> 
> Correponding test has been added to tests/run-debuginfod-find.sh
> 
> Signed-off-by: Alice Zhang <alizhang@redhat.com>

Nice. But try to split up the first paragraph. e.g.

---
PR27531: retry within default retry_limit will be supported.

In debuginfod-client.c (debuginfod_query_server), insert a
goto statement for jumping back to the beginning of curl
handles set up if query fails and a non ENOENT error is
returned.

Also introduced DEBUGINFOD_RETRY_LIMIT_ENV_VAR and default
DEBUGINFOD_RETRY_LIMIT (which is 2).

Correponding test has been added to tests/run-debuginfod-find.sh
 
Signed-off-by: Alice Zhang <alizhang@redhat.com>
---

That way the first sentence (note the extra blank line) becomes the
short commit message/subject.


+  /* If the verified_handle is NULL and rc != -ENOENT, the query
> fails with
> +   * an error code other than 404, then do several retry within the retry_limit.
> +   * Clean up all old handles and jump back to the beginning of query_in_parallel,
> +   * reinitialize handles and query again.*/
>    if (verified_handle == NULL)
> -    goto out1;
> +    {
> +      if (rc != -ENOENT && retry_limit-- > 0)
> +        {
> +	  if (vfd >= 0)
> +            dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
> +	  /* remove all handles from multi */
> +          for (int i = 0; i < num_urls; i++)
> +            {
> +              curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +              curl_easy_cleanup (data[i].handle);
> +            }
> +	    goto query_in_parallel;
> +	}
> +      else
> +	goto out1;
> +    }

You also need to call free (data) before the goto query_in_parallel
since that will also be reallocated. Or you can rearrange the code a
little around query_in_parallel to keep the data around, but only
reinitialize it, but I think it is fine to free and then malloc again.

Try to run under valgrind --full-leak-check and you'll see:

==24684== 43,840 bytes in 10 blocks are definitely lost in loss record 61 of 61
==24684==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==24684==    by 0x52EB2A7: debuginfod_query_server (debuginfod-client.c:789)
==24684==    by 0x401131: main (debuginfod-find.c:192)

(P.S. Don't worry about the still reachable issues.)

diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
> index 559ea947..282e523d 100644
> --- a/debuginfod/debuginfod.h.in
> +++ b/debuginfod/debuginfod.h.in
> @@ -35,6 +35,7 @@
>  #define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
>  #define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
>  #define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
> +#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
>  
>  /* The libdebuginfod soname.  */
>  #define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"

Could you also add an entry for DEBUGINFOD_RETRY_LIMIT to
doc/debuginfod_find_debuginfo.3 under ENVIRONMENT VARIABLES.

> +####################################################################
> ####
> +# set up tests for retrying failed queries.
> +retry_attempts=`(testrun env DEBUGINFOD_URLS=
> http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10
> DEBUGINFOD_VERBOSE=1 \
> +	${abs_top_builddir}/debuginfod/debuginfod-find debuginfo
> /bin/ls || true) 2>&1 >/dev/null \
> +	| grep -c 'Retry failed query'`
> +if [ $retry_attempts -ne 10 ]; then
> +    echo "retry mechanism failed."
> +    exit 1;
> +  fi
> +
>  exit 0

Cute testcase. This works because 255.255.255.255 will always fail.

Thanks,

Mark
  
Mark Wielaard July 9, 2021, 10:27 a.m. UTC | #2
Hi Alice (adding elfutils-devel back to CC),

On Thu, 2021-07-08 at 16:34 -0400, Alice Zhang wrote:
> Hi, Mark, I made modifications as you suggested. I relocated the goto
> statement and now the resources will be reused.

That rearrangement is better than what I suggested with the free and
(re)malloc. And thanks for adding the documentation. The commit message
looks good too.

> Also, I run valgrind --leak-check=full and here's the result:
> ==464842==
> ==464842== HEAP SUMMARY:
> ==464842==     in use at exit: 215,989 bytes in 2,167 blocks
> ==464842==   total heap usage: 3,866 allocs, 1,699 frees, 509,558 bytes
> allocated
> ==464842==
> ==464842== LEAK SUMMARY:
> ==464842==    definitely lost: 0 bytes in 0 blocks
> ==464842==    indirectly lost: 0 bytes in 0 blocks
> ==464842==      possibly lost: 0 bytes in 0 blocks
> ==464842==    still reachable: 215,989 bytes in 2,167 blocks
> ==464842==         suppressed: 0 bytes in 0 blocks
> ==464842== Reachable blocks (those to which a pointer was found) are not
> shown.
> ==464842== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==464842==
> ==464842== For lists of detected and suppressed errors, rerun with: -s
> ==464842== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Looks great. The still reachable blocks are from the libcurl global
initialization and not an issue.

Pushed your rearranged version to master.

Thanks,

Mark
  

Patch

diff --git a/config/ChangeLog b/config/ChangeLog
index 2cdcfa72..70a1e923 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,7 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        * debuginfod.sysconfig: Introduce default retry limit.
+
 2021-05-10  Mark Wielaard  <mark@klomp.org>
 
 	* elfutils.spec.in: Update for 0.185.
diff --git a/config/debuginfod.sysconfig b/config/debuginfod.sysconfig
index 44603874..890a1a25 100644
--- a/config/debuginfod.sysconfig
+++ b/config/debuginfod.sysconfig
@@ -11,4 +11,5 @@  DEBUGINFOD_PATHS="-t43200 -F -R /usr/lib/debug /usr/bin /usr/libexec /usr/sbin /
 # upstream debuginfods
 #DEBUGINFOD_URLS="http://secondhost:8002 http://thirdhost:8002"
 #DEBUGINFOD_TIMEOUT="5"
+#DEBUGINFOD_RETRY_LIMIT="2"
 #DEBUGINFOD_CACHE_DIR=""
diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 21407dc2..f0673410 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,10 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        PR27531
+	* debuginfod-client.c (debuginfod_query_server): Retry failed queries
+	if error code is not ENOENT.
+	* debuginfod.h.in: Introduce DEBUGINFOD_RETRY_LIMIT_ENV_VAR.
+
 2021-05-14  Frank Ch. Eigler <fche@redhat.com>
 
 	PR27859
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index ee7eda24..f8ad8ac4 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -157,6 +157,8 @@  static const char url_delim_char = ' ';
 /* Timeout for debuginfods, in seconds (to get at least 100K). */
 static const long default_timeout = 90;
 
+/* Default retry count for download error. */
+static const long default_retry_limit = 2;
 
 /* Data associated with a particular CURL easy handle. Passed to
    the write callback.  */
@@ -770,9 +772,17 @@  debuginfod_query_server (debuginfod_client *c,
         && (i == 0 || server_urls[i - 1] == url_delim_char))
       num_urls++;
   
+  int retry_limit = default_retry_limit;
+  const char* retry_limit_envvar = getenv(DEBUGINFOD_RETRY_LIMIT_ENV_VAR);
+  if (retry_limit_envvar != NULL)
+    retry_limit = atoi (retry_limit_envvar);
+
+ /*The beginning of goto block query_in_parallel.*/
+ query_in_parallel:
+  rc = -ENOENT; /* Reset rc to default.*/
   CURLM *curlm = c->server_mhandle;
   assert (curlm != NULL);
-  
+
   /* Tracks which handle should write to fd. Set to the first
      handle that is ready to write the target file to the cache.  */
   CURL *target_handle = NULL;
@@ -1074,8 +1084,27 @@  debuginfod_query_server (debuginfod_client *c,
         close(efd);
     }
 
+  /* If the verified_handle is NULL and rc != -ENOENT, the query fails with
+   * an error code other than 404, then do several retry within the retry_limit.
+   * Clean up all old handles and jump back to the beginning of query_in_parallel,
+   * reinitialize handles and query again.*/
   if (verified_handle == NULL)
-    goto out1;
+    {
+      if (rc != -ENOENT && retry_limit-- > 0)
+        {
+	  if (vfd >= 0)
+            dprintf (vfd, "Retry failed query, %d attempt(s) remaining\n", retry_limit);
+	  /* remove all handles from multi */
+          for (int i = 0; i < num_urls; i++)
+            {
+              curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
+              curl_easy_cleanup (data[i].handle);
+            }
+	    goto query_in_parallel;
+	}
+      else
+	goto out1;
+    }
 
   if (vfd >= 0)
     {
diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
index 559ea947..282e523d 100644
--- a/debuginfod/debuginfod.h.in
+++ b/debuginfod/debuginfod.h.in
@@ -35,6 +35,7 @@ 
 #define DEBUGINFOD_TIMEOUT_ENV_VAR "DEBUGINFOD_TIMEOUT"
 #define DEBUGINFOD_PROGRESS_ENV_VAR "DEBUGINFOD_PROGRESS"
 #define DEBUGINFOD_VERBOSE_ENV_VAR "DEBUGINFOD_VERBOSE"
+#define DEBUGINFOD_RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
 
 /* The libdebuginfod soname.  */
 #define DEBUGINFOD_SONAME "@LIBDEBUGINFOD_SONAME@"
diff --git a/tests/ChangeLog b/tests/ChangeLog
index 38e92659..bf45e09f 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@ 
+2021-07-06  Alice Zhang  <alizhang@redhat.com>
+
+        PR27531
+	* run-debuginfod-find.sh: Add test case for retry mechanism.
+
 2021-05-14  Frank Ch. Eigler <fche@redhat.com>
 
 	PR27859
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 9183cccb..feb3310d 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -582,6 +582,7 @@  curl -s http://127.0.0.1:$PORT2/metrics | grep 'dc_pool_op.*reuse'
 
 ########################################################################
 # Corrupt the sqlite database and get debuginfod to trip across its errors
+
 curl -s http://127.0.0.1:$PORT1/metrics | grep 'sqlite3.*reset'
 ls -al $DB
 dd if=/dev/zero of=$DB bs=1 count=1
@@ -674,4 +675,14 @@  DEBUGINFOD_URLS="file://${PWD}/mocktree/"
 filename=`testrun ${abs_top_builddir}/debuginfod/debuginfod-find source aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd /my/path/main.c`
 cmp $filename ${local_dir}/main.c
 
+########################################################################
+# set up tests for retrying failed queries.
+retry_attempts=`(testrun env DEBUGINFOD_URLS=http://255.255.255.255/JUNKJUNK DEBUGINFOD_RETRY_LIMIT=10 DEBUGINFOD_VERBOSE=1 \
+	${abs_top_builddir}/debuginfod/debuginfod-find debuginfo /bin/ls || true) 2>&1 >/dev/null \
+	| grep -c 'Retry failed query'`
+if [ $retry_attempts -ne 10 ]; then
+    echo "retry mechanism failed."
+    exit 1;
+  fi
+
 exit 0