debuginfod: PR27917 - protect against federation loops

Message ID CAN-Pu7RbLCLLc-GtyPXzq6VKfwxi-yLBzn1pmBVMxE-iKqpAnw@mail.gmail.com
State Committed

Commit Message

Di Chen Aug. 12, 2021, 2:35 p.m. UTC
From a726d9868f4e02d390b9071180b0c3728da3750e Mon Sep 17 00:00:00 2001
From: Di Chen <dichen@redhat.com>
Date: Sun, 8 Aug 2021 16:57:12 +0800
Subject: [PATCH] debuginfod: PR27917 - protect against federation loops

If someone misconfigures a debuginfod federation to have loops, and a
nonexistent buildid lookup is attempted, bad things will happen, as is
documented.

This patch aims to reduce the risk by adding an option to debuginfod that
functions kind of like an IP packet's TTL: a limit on the length of XFF:
header that debuginfod is willing to process. If X-Forwarded-For: exceeds
N hops, it will not delegate a local lookup miss to upstream debuginfods.

Commit ab38d167c40c99 causes federation loops for non-existent resources
to result in multiple temporary livelocks, each lasting for
$DEBUGINFOD_TIMEOUT seconds. Since concurrent requests for each unique
resource are now serialized, federation loops can result in one server
thread waiting to acquire a lock while the server thread holding the lock
waits for the first thread to respond to an http request.

This PR can help protect against the above multiple temporary livelocks
behaviour. Ex. if --forwarded-ttl-limit=0 then the timeout behaviour of
local loops should be avoided.
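
The hop check described above can be sketched in isolation. This is only a
minimal sketch, assuming hops are counted as commas in the header value, as
the patch does; xff_hops and may_query_upstream are hypothetical names used
for illustration and are not part of debuginfod.

```cpp
#include <algorithm>
#include <string>

// Count forwarding hops the same way the patch does: one hop per comma
// in the X-Forwarded-For value. An empty header and a header holding a
// single address therefore both count as zero hops.
unsigned xff_hops(const std::string& xff)
{
  return static_cast<unsigned>(std::count(xff.begin(), xff.end(), ','));
}

// Hypothetical helper: true if a local lookup miss may still be
// delegated to upstream servers under the given limit.
bool may_query_upstream(const std::string& xff, unsigned ttl_limit)
{
  return xff_hops(xff) < ttl_limit;
}
```

With a limit of 0 the comparison can never succeed, so a server configured
that way answers every local miss itself, which matches the
--forwarded-ttl-limit=0 example above.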

https://sourceware.org/bugzilla/show_bug.cgi?id=27917

Signed-off-by: Di Chen <dichen@redhat.com>
---
 debuginfod/debuginfod.cxx    | 30 ++++++++++++++++++++++++--
 doc/debuginfod.8             |  6 ++++++
 tests/run-debuginfod-find.sh | 42 +++++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 3 deletions(-)

 }
@@ -54,7 +58,7 @@ trap cleanup 0 1 2 3 5 9 15
 errfiles_list=
 err() {
     echo ERROR REPORTS
-    for ports in $PORT1 $PORT2 $PORT3
+    for ports in $PORT1 $PORT2 $PORT3 $PORT4 $PORT5
     do
         echo ERROR REPORT $port metrics
         curl -s http://127.0.0.1:$port/metrics
@@ -804,4 +808,40 @@ if [ $retry_attempts -ne 10 ]; then
     exit 1;
   fi

+########################################################################
+
+# Test when debuginfod hitting X-Forwarded-For hops limit.
+# This test will start two servers (as a loop) with two different hop limits.
+
+while true; do
+    PORT4=`expr '(' $RANDOM % 1000 ')' + 9000`
+    PORT5=`expr '(' $RANDOM % 1000 ')' + 9000`
+    ss -atn | fgrep -e ":$PORT4" -e ":$PORT5"|| break
+done
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS=http://127.0.0.1:$PORT5 ${abs_builddir}/../debuginfod/debuginfod $VERBOSE --forwarded-ttl-limit 0 -p $PORT4 > vlog$PORT4 2>&1 &
+PID5=$!
+
+env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS=http://127.0.0.1:$PORT4 ${abs_builddir}/../debuginfod/debuginfod $VERBOSE --forwarded-ttl-limit 1 -p $PORT5 > vlog$PORT5 2>&1 &
+PID6=$!
+
+wait_ready $PORT4 'ready' 1
+wait_ready $PORT5 'ready' 1
+
+export DEBUGINFOD_URLS="http://127.0.0.1:$PORT4/"
+testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo 01234567 || true
+
+# Use a different buildid to avoid using same cache.
+export DEBUGINFOD_URLS="http://127.0.0.1:$PORT5/"
+testrun ${abs_top_builddir}/debuginfod/debuginfod-find debuginfo 11234567 || true
+
+grep "forwared-ttl-limit reached and will not query the upstream servers" vlog$PORT4
+grep -v "forwared-ttl-limit reached and will not query the upstream servers" vlog$PORT5 | grep "not found" vlog$PORT5
+
+kill $PID5 $PID6
+wait $PID5 $PID6
+
+PID5=0
+PID6=0
+
 exit 0
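
For illustration, the two-server loop exercised by this test can be modelled
without starting any servers. This is a rough sketch under stated
assumptions: each server knows no build-ids locally and appends its caller's
address to X-Forwarded-For before forwarding. Server and lookup_miss are
hypothetical names, not debuginfod code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one federated server: a hop limit and the
// index of its single upstream server.
struct Server {
  unsigned ttl_limit;
  std::size_t upstream;
};

// Returns how many servers handled the request before one of them
// answered "not found" without forwarding any further.
unsigned lookup_miss(const std::vector<Server>& servers, std::size_t first)
{
  std::string xff;              // direct client request: no XFF header yet
  std::size_t cur = first;
  unsigned handled = 0;
  for (;;)
    {
      ++handled;
      const Server& s = servers[cur];
      auto commas = static_cast<unsigned>(
        std::count(xff.begin(), xff.end(), ','));
      if (commas >= s.ttl_limit)
        return handled;         // limit reached: reply 404 locally
      if (!xff.empty())
        xff += ", ";
      xff += "127.0.0.1";       // caller's address as seen by this server
      cur = s.upstream;
    }
}
```

With limits 0 and 1 as in the test, a request entering at the limit-0 server
is handled by one server, and a request entering at the limit-1 server by
two. Under the same model, two servers that both use the default limit of 8
stop after ten handlings instead of looping until a timeout.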
  

Comments

Frank Ch. Eigler Aug. 18, 2021, 10:55 p.m. UTC | #1
Hi -

> This patch aims to reduce the risk by adding an option to debuginfod
> that functions kind of like an IP packet's TTL: a limit on the
> length of XFF: header that debuginfod is willing to process. If
> X-Forwarded-For: exceeds N hops, it will not delegate a local lookup
> miss to upstream debuginfods. [...]

Thank you very much!


> Commit ab38d167c40c99 causes federation loops for non-existent
> resources to result in multiple temporary livelocks, each lasting
> for $DEBUGINFOD_TIMEOUT seconds. [...]

(FWIW, the term "livelock" is not quite right here, try just
"deadlock".)

The patch looks functional, and thank you also for including the
docs and test case.  Thorough enough!


> @@ -1862,6 +1869,12 @@ handle_buildid (MHD_Connection* conn,
>    // We couldn't find it in the database.  Last ditch effort
>    // is to defer to other debuginfo servers.
> 
> +  // if X-Forwarded-For: exceeds N hops,
> +  // do not delegate a local lookup miss to upstream debuginfods.
> +  if (disable_query_server)
> +    throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \
> +and will not query the upstream servers");

One part I don't understand is why you added the code to check for XFF
length into handler_cb(), and then passed the disable_query_server
result flag to this function.  Was there some reason not to perform
the XFF comma-counting right here?


- FChE
  
Di Chen Aug. 20, 2021, 12:44 p.m. UTC | #2
Hey Frank,

1) Moved the XFF check to handle_buildid.
2) Replaced "livelock" with "deadlock" in the commit message.

- dichen


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-debuginfod-PR27917-protect-against-federation-loops.patch
Type: text/x-patch
Size: 7151 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/elfutils-devel/attachments/20210820/c31e889f/attachment.bin>
  

Patch

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 4ddd9255..bd103d9f 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -375,6 +375,8 @@  static const struct argp_option options[] =
 #define ARGP_KEY_FDCACHE_PREFETCH_FDS 0x1006
    { "fdcache-prefetch-fds", ARGP_KEY_FDCACHE_PREFETCH_FDS, "NUM", 0,"Number of files allocated to the \
       prefetch cache.", 0},
+#define ARGP_KEY_FORWARDED_TTL_LIMIT 0x1007
+   {"forwarded-ttl-limit", ARGP_KEY_FORWARDED_TTL_LIMIT, "NUM", 0, "Limit of X-Forwarded-For hops, default 8.", 0},
    { NULL, 0, NULL, 0, NULL, 0 },
   };

@@ -422,6 +424,7 @@  static long fdcache_prefetch;
 static long fdcache_mintmp;
 static long fdcache_prefetch_mbs;
 static long fdcache_prefetch_fds;
+static unsigned forwarded_ttl_limit = 8;
 static string tmpdir;

 static void set_metric(const string& key, double value);
@@ -554,6 +557,9 @@  parse_opt (int key, char *arg,
       if( fdcache_mintmp > 100 || fdcache_mintmp < 0 )
         argp_failure(state, 1, EINVAL, "fdcache mintmp percent");
       break;
+    case ARGP_KEY_FORWARDED_TTL_LIMIT:
+      forwarded_ttl_limit = (unsigned) atoi(arg);
+      break;
     case ARGP_KEY_ARG:
       source_paths.insert(string(arg));
       break;
@@ -1769,7 +1775,8 @@  handle_buildid (MHD_Connection* conn,
                 const string& buildid /* unsafe */,
                 const string& artifacttype /* unsafe */,
                 const string& suffix /* unsafe */,
-                int *result_fd)
+                int *result_fd,
+                bool disable_query_server = false)
 {
   // validate artifacttype
   string atype_code;
@@ -1862,6 +1869,12 @@  handle_buildid (MHD_Connection* conn,
   // We couldn't find it in the database.  Last ditch effort
   // is to defer to other debuginfo servers.

+  // if X-Forwarded-For: exceeds N hops,
+  // do not delegate a local lookup miss to upstream debuginfods.
+  if (disable_query_server)
+    throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \
+and will not query the upstream servers");
+
   int fd = -1;
   debuginfod_client *client = debuginfod_pool_begin ();
   if (client != NULL)
@@ -2119,6 +2132,7 @@  handler_cb (void * /*cls*/,
   struct timespec ts_start, ts_end;
   clock_gettime (CLOCK_MONOTONIC, &ts_start);
   double afteryou = 0.0;
+  bool disable_query_server = false;

   try
     {
@@ -2131,6 +2145,17 @@  handler_cb (void * /*cls*/,

       if (slash1 != string::npos && url1 == "/buildid")
         {
+          // check if X-Forwarded-For exceeds the limit number of hops.
+          string xff = MHD_lookup_connection_value (connection, MHD_HEADER_KIND, "X-Forwarded-For") ?: "";
+
+          unsigned int xff_count = 0;
+          for (auto&& i : xff){
+            if (i == ',') xff_count++;
+          }
+
+          if (xff_count >= forwarded_ttl_limit)
+            disable_query_server = true;
+
           // PR27863: block this thread awhile if another thread is already busy
           // fetching the exact same thing.  This is better for Everyone.
           // The latecomer says "... after you!" and waits.
@@ -2171,7 +2196,7 @@  handler_cb (void * /*cls*/,

           // get the resulting fd so we can report its size
           int fd;
-          r = handle_buildid(connection, buildid, artifacttype, suffix, &fd);
+          r = handle_buildid(connection, buildid, artifacttype, suffix, &fd, disable_query_server);
           if (r)
             {
               struct stat fs;
@@ -3719,6 +3744,7 @@  main (int argc, char *argv[])
   obatched(clog) << "groom time " << groom_s << endl;
   obatched(clog) << "prefetch fds " << fdcache_prefetch_fds << endl;
   obatched(clog) << "prefetch mbs " << fdcache_prefetch_mbs << endl;
+  obatched(clog) << "forwarded ttl limit " << forwarded_ttl_limit << endl;

   if (scan_archives.size()>0)
     {
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
index f70af625..4f532cb8 100644
--- a/doc/debuginfod.8
+++ b/doc/debuginfod.8
@@ -236,6 +236,12 @@  are intended to give an operator notice about storage scarcity - which
 can translate to RAM scarcity if the disk happens to be on a RAM
 virtual disk.  The default threshold is 25%.

+.TP
+.B "\-\-forwarded\-ttl\-limit=NUM"
+Configure limits of X-Forwarded-For hops. if X-Forwarded-For
+exceeds N hops, it will not delegate a local lookup miss to
+upstream debuginfods. The default limit is 8.
+
 .TP
 .B "\-v"
 Increase verbosity of logging to the standard error file descriptor.
diff --git a/tests/run-debuginfod-find.sh b/tests/run-debuginfod-find.sh
index 991d1dc5..dbf20975 100755
--- a/tests/run-debuginfod-find.sh
+++ b/tests/run-debuginfod-find.sh
@@ -37,6 +37,8 @@  PID1=0
 PID2=0
 PID3=0
 PID4=0
+PID5=0
+PID6=0

 cleanup()
 {
@@ -44,6 +46,8 @@  cleanup()
   if [ $PID2 -ne 0 ]; then kill $PID2; wait $PID2; fi
   if [ $PID3 -ne 0 ]; then kill $PID3; wait $PID3; fi
   if [ $PID4 -ne 0 ]; then kill $PID4; wait $PID4; fi
+  if [ $PID5 -ne 0 ]; then kill $PID5; wait $PID5; fi
+  if [ $PID6 -ne 0 ]; then kill $PID6; wait $PID6; fi
   rm -rf F R D L Z ${PWD}/foobar ${PWD}/mocktree ${PWD}/.client_cache* ${PWD}/tmp*
   exit_cleanup