[v2,2/2] debuginfod-client.c: Fix x-debuginfod-size counts differently than CURLINFO_SIZE_DOWNLOAD_T

Message ID 20230329150237.896092-2-lilydjwg@gmail.com
State New
Headers
Series [v2,1/2] debuginfod-client.c: Fix download size not correctly fallbacks to x-debuginfod-size header |

Commit Message

lilydjwg March 29, 2023, 3:02 p.m. UTC
  x-debuginfod-size is the actual file size, but CURLINFO_SIZE_DOWNLOAD_T
is transferred size, i.e. the gzipped one if gzip is on.

Let's count written data and use that if and only if x-debuginfod-size
is used.

Signed-off-by: lilydjwg <lilydjwg@gmail.com>
---
 ChangeLog                      |  2 ++
 debuginfod/debuginfod-client.c | 45 ++++++++++++++++++++++------------
 2 files changed, 32 insertions(+), 15 deletions(-)
  

Comments

Frank Ch. Eigler March 29, 2023, 7:14 p.m. UTC | #1
Hi -

> x-debuginfod-size is the actual file size, but CURLINFO_SIZE_DOWNLOAD_T
> is transferred size, i.e. the gzipped one if gzip is on.
> Let's count written data and use that if and only if x-debuginfod-size
> is used.

Hey, great idea actually tallying up writes in the callback function.
(We need to take care to clear that counter, in case of client object
reuse.)  Also, can you think of some reason not to just use that value
at all times, i.e., without any of that "if and only if ..." business?

- FChE
  
lilydjwg March 30, 2023, 3:41 a.m. UTC | #2
On Wed, Mar 29, 2023 at 03:14:43PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> > x-debuginfod-size is the actual file size, but CURLINFO_SIZE_DOWNLOAD_T
> > is transferred size, i.e. the gzipped one if gzip is on.
> > Let's count written data and use that if and only if x-debuginfod-size
> > is used.
> 
> Hey, great idea actually tallying up writes in the callback function.
> (We need to take care to clear that counter, in case of client object
> reuse.)  Also, can you think of some reason not to just use that value
> at all times, i.e., without any of that "if and only if ..." business?

The counter is in handle_data, and it is already cleared at
"query_in_parallel:". I don't find other places that may reuse them.

The written_size is actual file size (uncompressed), but IIUC
Content-Length is the compressed size if Content-Encoding says the
content is compressed. I haven't seen any compressed responses with
Content-Length, but from the spec I don't read they are not allowed.
  
Frank Ch. Eigler March 30, 2023, 5:24 p.m. UTC | #3
Hi -

> > Hey, great idea actually tallying up writes in the callback function.
> > (We need to take care to clear that counter, in case of client object
> > reuse.)  Also, can you think of some reason not to just use that value
> > at all times, i.e., without any of that "if and only if ..." business?
> 
> The counter is in handle_data, and it is already cleared at
> "query_in_parallel:". I don't find other places that may reuse them.

OK.

> The written_size is actual file size (uncompressed), but IIUC
> Content-Length is the compressed size if Content-Encoding says the
> content is compressed. I haven't seen any compressed responses with
> Content-Length, but from the spec I don't read they are not allowed.

OK, so to spell out the hypothetical problem: what if a httpd server
does send back a Content-Length: response header for a compressed
file, and we use that as the denominator for progress reporting.  If
we use the decompressed actual file length as numerator, we'd go over
100%.

Then ISTM a simpler way to handle this would be to say that if the
x-debuginfod-size: response header is found (as denominator), then go
ahead and use the actual data[committed_to].written_size (as
numerator).  Don't even try the CURLINFO_SIZE* queries then.

- FChE
  
lilydjwg March 31, 2023, 4:50 a.m. UTC | #4
On Thu, Mar 30, 2023 at 01:24:13PM -0400, Frank Ch. Eigler wrote:
> > The written_size is actual file size (uncompressed), but IIUC
> > Content-Length is the compressed size if Content-Encoding says the
> > content is compressed. I haven't seen any compressed responses with
> > Content-Length, but from the spec I don't read they are not allowed.
> 
> OK, so to spell out the hypothetical problem: what if a httpd server
> does send back a Content-Length: response header for a compressed
> file, and we use that as the denominator for progress reporting.  If
> we use the decompressed actual file length as numerator, we'd go over
> 100%.
> 
> Then ISTM a simpler way to handle this would be to say that if the
> x-debuginfod-size: response header is found (as denominator), then go
> ahead and use the actual data[committed_to].written_size (as
> numerator).  Don't even try the CURLINFO_SIZE* queries then.

It's not tried in that case. size_compressed indicates where the total
size is from and those CURLINFO_SIZE* is skipped. Maybe I should rename
the variable to something else (it's not always compressed).

Or do you mean that you want to always use written_size even when the
progress may go beyond 100%?
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 05697a02..903b3494 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,8 @@ 
 
 	* debuginfod/debuginfod-client.c: Fix download size not correctly
 	fallbacks to x-debuginfod-size header
+	* debuginfod-client.c: Fix x-debuginfod-size counts differently than
+	CURLINFO_SIZE_DOWNLOAD_T
 
 2023-03-03  Mark Wielaard  <mark@klomp.org>
 
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index d6d3f0dd..ebc8e6b2 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -215,6 +215,9 @@  struct handle_data
   /* Response http headers for this client handle, sent from the server */
   char *response_data;
   size_t response_data_size;
+
+  /* The data size that has been written */
+  long written_size;
 };
 
 static size_t
@@ -243,6 +246,7 @@  debuginfod_write_callback (char *ptr, size_t size, size_t nmemb, void *data)
   if (*d->target_handle != d->handle)
     return -1;
 
+  d->written_size += count;
   return (size_t) write(d->fd, (void*)ptr, count);
 }
 
@@ -1265,6 +1269,7 @@  debuginfod_query_server (debuginfod_client *c,
       data[i].errbuf[0] = '\0';
       data[i].response_data = NULL;
       data[i].response_data_size = 0;
+      data[i].written_size = 0;
     }
 
   char *escaped_string = NULL;
@@ -1468,6 +1473,7 @@  debuginfod_query_server (debuginfod_client *c,
         }
 
       long dl_size = -1;
+      bool size_compressed = true;
       if (target_handle && (c->progressfn || maxsize > 0))
         {
           /* Get size of file being downloaded. NB: If going through
@@ -1498,7 +1504,10 @@  debuginfod_query_server (debuginfod_client *c,
 
               if (hdr != NULL
                   && sscanf(hdr, "x-debuginfod-size: %ld", &xdl) == 1)
-                dl_size = xdl;
+                {
+                  dl_size = xdl;
+                  size_compressed = false;
+                }
             }
         }
 
@@ -1508,23 +1517,29 @@  debuginfod_query_server (debuginfod_client *c,
           long pa = loops; /* default param for progress callback */
           if (target_handle) /* we've committed to a server; report its download progress */
             {
-              CURLcode curl_res;
+              if (size_compressed)
+                {
+                  CURLcode curl_res;
 #if CURL_AT_LEAST_VERSION(7, 55, 0)
-              curl_off_t dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD_T,
-                                           &dl);
-              if (curl_res == 0 && dl >= 0)
-                pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
+                  curl_off_t dl;
+                  curl_res = curl_easy_getinfo(target_handle,
+                                              CURLINFO_SIZE_DOWNLOAD_T,
+                                              &dl);
+                  if (curl_res == 0 && dl >= 0)
+                    pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
 #else
-              double dl;
-              curl_res = curl_easy_getinfo(target_handle,
-                                           CURLINFO_SIZE_DOWNLOAD,
-                                           &dl);
-              if (curl_res == 0)
-                pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
+                  double dl;
+                  curl_res = curl_easy_getinfo(target_handle,
+                                              CURLINFO_SIZE_DOWNLOAD,
+                                              &dl);
+                  if (curl_res == 0)
+                    pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
 #endif
-
+                }
+              else
+                {
+                  pa = data[committed_to].written_size;
+                }
             }
 
           if ((*c->progressfn) (c, pa, dl_size == -1 ? 0 : dl_size))