PR 30991: srcfiles tarball feature

Message ID 20240119164716.261781-1-halamour@redhat.com
State Superseded
Delegated to: Aaron Merey
Headers
Series PR 30991: srcfiles tarball feature |

Commit Message

Housam Alamour Jan. 19, 2024, 4:47 p.m. UTC
  * srcfiles.cxx: Introduce new --zip option that places all the
    source files associated with a specified dwarf/elf file
    into a zip file and sends it to stdout. Files may be
    fetched from debuginfod (if applicable) or locally as
    a backup.
    Added -b option to disable the backup of checking
    for files locally in -z mode.

* run-srcfiles-self.sh: Added test-case for the new zip
    feature that archives the source files of the srcfiles
    tool and checks archive integrity. An additional test
    ensures that if debuginfod is enabled, the files are
    fetched and archived properly while maintaing integrity.

* debuginfod-subr.sh: On very slow/remote storage, it can
    take O(minute) to finish indexing the entire elfutils
    build tree, so a wait_ready4 shell function is one
    way to let a longer debuginfod wait operation work.

* srcfiles.1, NEWS: Added documentation for the new zip feature.

* configure.ac: Simplify check for libarchive for srcfiles.cxx
    by integrating it into the same check for debuginfod.

* Makefile.am: build with local copy of debuginfod-client.

Example:
% ./src/srcfiles -z -e /bin/ls > output.zip

https://sourceware.org/bugzilla/show_bug.cgi?id=30991

Signed-off-by: Housam Alamour <halamour@redhat.com>
---
 NEWS                       |   8 +
 configure.ac               |   5 +-
 debuginfod/debuginfod.cxx  |   2 -
 doc/srcfiles.1             |  26 +++-
 src/Makefile.am            |   9 +-
 src/srcfiles.cxx           | 307 ++++++++++++++++++++++++++++++++-----
 tests/debuginfod-subr.sh   |  14 +-
 tests/run-srcfiles-self.sh |  77 +++++++++-
 8 files changed, 394 insertions(+), 54 deletions(-)
  

Comments

Aaron Merey Jan. 23, 2024, 3:02 a.m. UTC | #1
Hi Housam,

This is a very cool feature.  Thanks again for working on this!

On Fri, Jan 19, 2024 at 11:47 AM Housam Alamour <halamour@redhat.com> wrote:
>
> * srcfiles.cxx: Introduce new --zip option that places all the
>     source files associated with a specified dwarf/elf file
>     into a zip file and sends it to stdout. Files may be
>     fetched from debuginfod (if applicable) or locally as
>     a backup.
>     Added -b option to disable the backup of checking
>     for files locally in -z mode.
>
> * run-srcfiles-self.sh: Added test-case for the new zip
>     feature that archives the source files of the srcfiles
>     tool and checks archive integrity. An additional test
>     ensures that if debuginfod is enabled, the files are
>     fetched and archived properly while maintaing integrity.

Should be 'maintaining'.

>
> * debuginfod-subr.sh: On very slow/remote storage, it can
>     take O(minute) to finish indexing the entire elfutils
>     build tree, so a wait_ready4 shell function is one
>     way to let a longer debuginfod wait operation work.
>
> * srcfiles.1, NEWS: Added documentation for the new zip feature.
>
> * configure.ac: Simplify check for libarchive for srcfiles.cxx
>     by integrating it into the same check for debuginfod.
>
> * Makefile.am: build with local copy of debuginfod-client.
>
> Example:
> % ./src/srcfiles -z -e /bin/ls > output.zip
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=30991
>
> Signed-off-by: Housam Alamour <halamour@redhat.com>
> ---
>  NEWS                       |   8 +
>  configure.ac               |   5 +-
>  debuginfod/debuginfod.cxx  |   2 -
>  doc/srcfiles.1             |  26 +++-
>  src/Makefile.am            |   9 +-
>  src/srcfiles.cxx           | 307 ++++++++++++++++++++++++++++++++-----
>  tests/debuginfod-subr.sh   |  14 +-
>  tests/run-srcfiles-self.sh |  77 +++++++++-
>  8 files changed, 394 insertions(+), 54 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 0420d3b8..3391d6a1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,8 @@
> +Version 0.191 (after 0.189)
> +
> +srcfiles: Can now fetch the source files of a DWARF/ELF file and
> +          place them into a zip.
> +
>  Version 0.190 "Woke!"
>
>  CONTRIBUTING: Switch from real name policy to known identity policy.
> @@ -9,6 +14,9 @@ libelf: Add RELR support.
>
>  libdw: Recognize .debug_[ct]u_index sections
>
> +srcfiles: added srcfiles tool that lists all the source files of a given
> +          DWARF/ELF file.
> +
>  readelf: Support readelf -Ds, --use-dynamic --symbol.
>           Support .gdb_index version 9
>
> diff --git a/configure.ac b/configure.ac
> index af5b6bf7..ddb79b83 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -841,7 +841,7 @@ AM_CONDITIONAL([LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xyes" || test "
>  AM_CONDITIONAL([DUMMY_LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xdummy"])
>  AC_CHECK_HEADERS([execinfo.h])
>
> -# Look for libmicrohttpd, libarchive, sqlite for debuginfo server
> +# Look for libmicrohttpd, libarchive, sqlite for debuginfo server and srcfiles tool
>  # minimum versions as per rhel7.
>  AC_ARG_ENABLE([debuginfod],AS_HELP_STRING([--enable-debuginfod], [Build debuginfod server]))
>  AS_IF([test "x$enable_debuginfod" != "xno"], [
> @@ -853,11 +853,12 @@ AS_IF([test "x$enable_debuginfod" != "xno"], [
>        AC_MSG_ERROR([need libdebuginfod (or dummy), use --disable-debuginfod to disable.])
>      fi
>      enable_debuginfod=yes # presume success
> +    AC_DEFINE([HAVE_LIBARCHIVE], [1], [Define to 1 if libarchive is available]) # presume success
>      PKG_PROG_PKG_CONFIG
>      PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
>      PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
>      PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
> -    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
> +    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
>      if test "x$enable_debuginfod" = "xno"; then
>        AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
>      fi
> diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> index 524be948..6b21f46f 100644
> --- a/debuginfod/debuginfod.cxx
> +++ b/debuginfod/debuginfod.cxx
> @@ -2996,8 +2996,6 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
>
>        if (comp_dir[0] == '\0' && cuname[0] != '/')
>          {
> -          // This is a common symptom for dwz-compressed debug files,
> -          // where the altdebug file cannot be resolved.
>            if (verbose > 3)
>              obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
>            continue;
> diff --git a/doc/srcfiles.1 b/doc/srcfiles.1
> index 6149c21b..a7cde664 100644
> --- a/doc/srcfiles.1
> +++ b/doc/srcfiles.1
> @@ -21,15 +21,18 @@
>  eu-srcfiles \- Lists the source files of a DWARF/ELF file.
>
>  .SH "SYNOPSIS"
> -eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] INPUT
> +eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] [\fB\-z\fR|\fB\-\-zip\fR] INPUT
>
>  .SH "DESCRIPTION"
> -\fBeu-srcfiles\fR lists the source files of a given \s-DWARF/ELF\s0
> +\fBeu-srcfiles\fR lists all the source files of a given DWARF/ELF
>  file.  This list is based on a search of the DWARF debuginfo, which
>  may be automatically fetched by debuginfod if applicable.  The target
>  file may be an executable, a coredump, a process, or even the running
> -kernel.  The default is the file 'a.out'.  The source file names are
> -made unique and printed to standard output.
> +kernel.  The default input is the file 'a.out'.  The source file names are
> +made unique by prepending the full path name and then printed to standard output. The source files can be
> +placed in a zip file that is sent to stdout.
> +
> +Note that all occurrences of '/./' and '/../' in the path name are canonicalized.
>
>  .SH "INPUT OPTIONS"
>  The long and short forms of options, shown here as alternatives, are
> @@ -82,12 +85,20 @@ Separate items by a null instead of a newline.
>
>  .TP
>  \fB\-c, \-\-cu\-only\fR
> -Only list the CU names.
> +Only list the (compilation unit) CU names.

I think this should be either "compilation unit (CU)" or CU (compilation unit)".

>  .TP
>  \fB\-v, \-\-verbose\fR
>  Increase verbosity of logging messages.
>
> +.TP
> +\fB\-z, \-\-zip\fR
> +Zip all the source files and send to stdout.
> +Files may be automatically fetched by
> +debuginfod (if applicable) or locally as a
> +backup. Any source files that were not found
> +will not be archived.
> +
>
>  .SH EXAMPLES
>
> @@ -119,6 +130,11 @@ List the source files of a kernel image.
>  eu-srcfiles -e /boot/vmlinuz-`uname -r`
>  .ESAMPLE
>
> +Zip all the source files for a binary.
> +.SAMPLE
> +eu-srcfiles -z -e /bin/ls > ls.zip
> +.ESAMPLE
> +
>
>  .SH "AUTHOR"
>  Written by Housam Alamour.
> diff --git a/src/Makefile.am b/src/Makefile.am
> index d3d9d408..5fa8df3d 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -21,7 +21,7 @@ DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
>         -DSRCDIR=\"$(shell cd $(srcdir);pwd)\" -DOBJDIR=\"$(shell pwd)\"
>  AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
>             -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
> -           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
> +           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm -I../debuginfod
>
>  AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
>
> @@ -50,6 +50,11 @@ libelf = ../libelf/libelf.so
>  endif
>  libebl = ../libebl/libebl.a ../backends/libebl_backends.a ../libcpu/libcpu.a
>  libeu = ../lib/libeu.a
> +if LIBDEBUGINFOD
> +libdebuginfod = ../debuginfod/libdebuginfod.so
> +else
> +libdebuginfod =
> +endif
>
>  if DEMANGLE
>  demanglelib = -lstdc++
> @@ -85,7 +90,7 @@ stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) $(demanglelib)
>  elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  srcfiles_SOURCES = srcfiles.cxx
> -srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD)
> +srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD) $(libarchive_LIBS) $(libdebuginfod)
>
>  installcheck-binPROGRAMS: $(bin_PROGRAMS)
>         bad=0; pid=$$$$; list="$(bin_PROGRAMS)"; for p in $$list; do \
> diff --git a/src/srcfiles.cxx b/src/srcfiles.cxx
> index 3c7afdc4..0e4f756f 100644
> --- a/src/srcfiles.cxx
> +++ b/src/srcfiles.cxx
> @@ -15,6 +15,21 @@
>
>     You should have received a copy of the GNU General Public License
>     along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +
> +/* In case we have a bad fts we include this before config.h because it
> +   can't handle _FILE_OFFSET_BITS.
> +   Everything we need here is fine if its declarations just come first.
> +   Also, include sys/types.h before fts. On some systems fts.h is not self
> +   contained. */
> +#ifdef BAD_FTS
> +  #include <sys/types.h>
> +  #include <fts.h>
> +#endif
> +
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif
>
>  #include "printversion.h"
>  #include <dwarf.h>
> @@ -23,23 +38,50 @@
>  #include <set>
>  #include <string>
>  #include <cassert>
> -#include <config.h>
> +#include <gelf.h>
> +#include <memory>
> +
> +#ifdef ENABLE_LIBDEBUGINFOD
> +  #include "debuginfod.h"
> +#endif
>
>  #include <libdwfl.h>
>  #include <fcntl.h>
>  #include <iostream>
>  #include <libdw.h>
> +#include <sstream>
> +#include <vector>
> +
> +/* Libraries for use by the --zip option */
> +#ifdef HAVE_LIBARCHIVE
> +  #include <archive.h>
> +  #include <archive_entry.h>
> +#endif
> +
> +/* If fts.h is included before config.h, its indirect inclusions may not
> +   give us the right LFS aliases of these functions, so map them manually. */
> +#ifdef BAD_FTS
> +  #ifdef _FILE_OFFSET_BITS
> +    #define open open64
> +    #define fopen fopen64
> +  #endif
> +#else
> +  #include <sys/types.h>
> +  #include <fts.h>
> +#endif
>
>  using namespace std;
>
> -/* Name and version of program.  */
> +/* Name and version of program. */

FYI it's common to leave two spaces after a period.  But if you
prefer just one space that's fine too.

>  ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>
> -/* Bug report address.  */
> +/* Bug report address. */
>  ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>
> -/* Definitions of arguments for argp functions.  */
> -static const struct argp_option options[] =
> +constexpr size_t BUFFER_SIZE = 8192;
> +
> +/* Definitions of arguments for argp functions. */
> +static const struct argp_option options[] =
>  {
>    { NULL, 0, NULL, OPTION_DOC, N_("Output options:"), 1 },
>    { "null", '0', NULL, 0,
> @@ -47,21 +89,30 @@ static const struct argp_option options[] =
>    { "verbose", 'v', NULL, 0,
>      N_ ("Increase verbosity of logging messages."), 0 },
>    { "cu-only", 'c', NULL, 0, N_ ("Only list the CU names."), 0 },
> -  { NULL, 0, NULL, 0, NULL, 0 }
> +  #ifdef HAVE_LIBARCHIVE
> +  { "zip", 'z', NULL, 0, N_ ("Zip all the source files and send to stdout."
> +    "Cannot be used with the null option"), 0 },
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +    { "no-backup", 'b', NULL, 0, N_ ("Disables local source file search when "
> +      "debuginfod fails to fetch files. This option is only applicable"

There should be a space after "send to stdout." and "is only
applicable".

I would emphasize in the -z help text and man page entry that
debuginfod queries, if enabled, take priority over local source
files by default.

I would also keep -z and -b options even when srcfiles isn't
built with libarchive/libdebuginfod.  If -z or -b is given
in these cases, we can just emit a warning that the option
isn't supported and will be ignored.  This helps prevent
immediate error exits due to unrecognized options, which
can be inconvenient when srcfiles is used in a script.

Also it looks like the man page is missing an entry for -b.

> +      "when fetching and zipping files."), 0 },
> +    #endif
> +  #endif
> +  { NULL, 0, NULL, 0, NULL, 0 }
>  };
>
> -/* Short description of program.  */
> +/* Short description of program. */
>  static const char doc[] = N_("Lists the source files of a DWARF/ELF file. The default input is the file 'a.out'.");
>
> -/* Strings for arguments in help texts.  */
> +/* Strings for arguments in help texts. */
>  static const char args_doc[] = N_("INPUT");
>
> -/* Prototype for option handler.  */
> +/* Prototype for option handler. */
>  static error_t parse_opt (int key, char *arg, struct argp_state *state);
>
> -static struct argp_child argp_children[2]; /* [0] is set in main.  */
> +static struct argp_child argp_children[2]; /* [0] is set in main. */
>
> -/* Data structure to communicate with argp functions.  */
> +/* Data structure to communicate with argp functions. */
>  static const struct argp argp =
>  {
>    options, parse_opt, args_doc, doc, argp_children, NULL, NULL
> @@ -73,8 +124,19 @@ static bool verbose;
>  static bool null_arg;
>  /* Only print compilation unit names. */
>  static bool CU_only;
> -
> -/* Handle program arguments. */
> +#ifdef HAVE_LIBARCHIVE
> +  /* Zip all the source files and send to stdout. */
> +  static bool zip;
> +
> +  #ifdef ENABLE_LIBDEBUGINFOD
> +    /* Disables local source file search when debuginfod fails to fetch them.
> +       This option is only applicable when fetching and zipping files.*/
> +    static bool no_backup;
> +  #endif
> +#endif
> +
> +/* Handle program arguments. Note null arg and zip
> +    cannot be combined due to warnings raised when unzipping. */
>  static error_t
>  parse_opt (int key, char *arg, struct argp_state *state)
>  {
> @@ -97,6 +159,18 @@ parse_opt (int key, char *arg, struct argp_state *state)
>      case 'c':
>        CU_only = true;
>        break;
> +
> +    #ifdef HAVE_LIBARCHIVE
> +      case 'z':
> +      zip = true;
> +      break;
> +
> +      #ifdef ENABLE_LIBDEBUGINFOD
> +        case 'b':
> +        no_backup = true;
> +        break;
> +      #endif
> +    #endif
>
>      default:
>        return ARGP_ERR_UNKNOWN;
> @@ -104,11 +178,40 @@ parse_opt (int key, char *arg, struct argp_state *state)
>    return 0;
>  }
>
> +/* Remove the "/./" , "../" and the preceding directory
> +    that some paths include which raise errors during unzip. */
> +string canonicalize_path(string path)
> +{
> +    stringstream ss(path);
> +    string token;
> +    vector<string> tokens;
> +    /* Extract each directory of the path and place into a vector. */
> +    while (getline(ss, token, '/')) {
> +      /* Ignore any empty //, or /./ dirs. */
> +        if (token == "" || token == ".")
> +            continue;
> +      /* When /.. is encountered, remove the most recent directory from the vector. */
> +        else if (token == "..") {
> +            if (!tokens.empty())
> +                tokens.pop_back();
> +        } else
> +            tokens.push_back(token);
> +    }
> +    stringstream result;
> +    if (tokens.empty())
> +        return "/";
> +    /* Reconstruct the path from the extracted directories. */
> +    for (const string &t : tokens) {
> +        result << '/' << t;
> +    }
> +    return result.str();
> +}
>
> -/* Global list of collected source files.  Normally, it'll contain
> -   the sources of just one named binary, but the '-K' option can cause
> -   multiple dwfl modules to be loaded, thus listed.   */
> -   set<string> debug_sourcefiles;
> +/* Global list of collected source files and their respective module.
> +   Normally, it'll contain the sources of just one named binary, but
> +   the '-K' option can cause multiple dwfl modules to be loaded, thus
> +   listed. */
> +set<pair<string, Dwfl_Module*>> debug_sourcefiles;
>
>  static int
>  collect_sourcefiles (Dwfl_Module *dwflmod,
> @@ -118,15 +221,14 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>                       void *arg __attribute__ ((unused)))
>  {
>    Dwarf *dbg;
> -  Dwarf_Addr bias; /* ignored - for addressing purposes only  */
> -
> +  Dwarf_Addr bias; /* ignored - for addressing purposes only. */
> +
>    dbg = dwfl_module_getdwarf (dwflmod, &bias);
>
>    Dwarf_Off offset = 0;
>    Dwarf_Off old_offset;
>    size_t hsize;
> -
> -  /* Traverse all CUs of this module.  */
> +  /* Traverse all CUs of this module. */
>    while (dwarf_nextcu (dbg, old_offset = offset, &offset, &hsize, NULL, NULL, NULL) == 0)
>      {
>        Dwarf_Die cudie_mem;
> @@ -141,7 +243,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>        if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
>          continue;
>
> -      /* extract DW_AT_comp_dir to resolve relative file names  */
> +      /* extract DW_AT_comp_dir to resolve relative file names. */
>        const char *comp_dir = "";
>        const char *const *dirs;
>        size_t ndirs;
> @@ -152,11 +254,19 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>          comp_dir = "";
>
>        if (verbose)
> -        std::clog << "searching for sources for cu=" << cuname
> +        clog << "searching for sources for cu=" << cuname
>                    << " comp_dir=" << comp_dir << " #files=" << nfiles
>                    << " #dirs=" << ndirs << endl;
> -
> -      for (size_t f = 1; f < nfiles; f++)
> +
> +      if (comp_dir[0] == '\0' && cuname[0] != '/')
> +        {
> +          /* This is a common symptom for dwz-compressed debug files,
> +             where the altdebug file cannot be resolved. */
> +          if (verbose)
> +            clog << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
> +          continue;
> +        }
> +      for (size_t f = 1; f < nfiles; ++f)
>          {
>            const char *hat;
>            if (CU_only)
> @@ -172,7 +282,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              continue;
>
>            if (string(hat).find("<built-in>")
> -              != std::string::npos) /* gcc intrinsics, don't bother record  */
> +              != string::npos) /* gcc intrinsics, don't bother recording */
>              continue;
>
>            string waldo;
> @@ -180,23 +290,137 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              waldo = (string (hat));
>            else if (comp_dir[0] != '\0') /* comp_dir relative */
>              waldo = (string (comp_dir) + string ("/") + string (hat));
> -          debug_sourcefiles.insert (waldo);
> +          else
> +           {
> +             if (verbose)
> +              clog << "skipping file=" << hat << " due to empty comp_dir" << endl;
> +             continue;
> +           }
> +          waldo = canonicalize_path (waldo);
> +          debug_sourcefiles.insert (make_pair(waldo, dwflmod));
>          }
>      }
> -
>    return DWARF_CB_OK;
>  }
>
> +#ifdef HAVE_LIBARCHIVE
> +  void zip_files()

You don't have to indent within preprocessor conditionals. Especially
since the lines of code in this function are already quite long.

> +  {
> +    struct archive *a = archive_write_new();
> +    struct stat st;
> +    char buff[BUFFER_SIZE];
> +    int len;
> +    int fd;
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +      /* Initialize a debuginfod client. */
> +      static unique_ptr <debuginfod_client, void (*)(debuginfod_client*)>
> +        client (debuginfod_begin(), &debuginfod_end);
> +    #endif
> +
> +    archive_write_set_format_zip(a);
> +    archive_write_open_fd(a, STDOUT_FILENO);
> +
> +    int missing_files = 0;
> +    for (const auto &pair : debug_sourcefiles)
> +    {
> +      fd = -1;
> +      const std::string &file_path = pair.first;
> +
> +      /* Attempt to query debuginfod client to fetch source files. */
> +      #ifdef ENABLE_LIBDEBUGINFOD
> +        Dwfl_Module* dwflmod = pair.second;
> +        /* Obtain source file's build ID. */
> +        const unsigned char *bits;
> +        GElf_Addr vaddr;
> +        int bits_length = dwfl_module_build_id(dwflmod, &bits, &vaddr);
> +        /* Ensure successful client and build ID acquisition. */
> +        if (client.get() != NULL && bits_length > 0)
> +        {
> +          fd = debuginfod_find_source(client.get(),
> +                                        bits, bits_length,
> +                                        file_path.c_str(), NULL);
> +        }
> +        else
> +        {
> +            if (client.get() == NULL)
> +                cerr << "Error: Failed to initialize debuginfod client." << endl;
> +            else
> +                cerr << "Error: Invalid build ID length (" << bits_length << ")." << endl;
> +        }
> +
> +      #endif
> +      if (!no_backup)
> +        /* Files could not be located using debuginfod, search locally */
> +        if (fd < 0)
> +          fd = open(file_path.c_str(), O_RDONLY);
> +      if (fd < 0)
> +      {
> +        missing_files++;
> +        continue;
> +      }
> +
> +      /* Create an entry for each file including file information to be placed in the zip. */
> +      if (fstat(fd, &st) == -1)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: Failed to get file status for " << file_path << ": " << strerror(errno) << endl;
> +        continue;
> +      }
> +      struct archive_entry *entry = archive_entry_new();
> +      /* Removing first "/"" to make the path "relative" before zipping, otherwise warnings are raised when unzipping. */
> +      string entry_name = file_path.substr(file_path.find_first_of('/') + 1);
> +      archive_entry_set_pathname(entry, entry_name.c_str());
> +      archive_entry_copy_stat(entry, &st);
> +      if (archive_write_header(a, entry) != ARCHIVE_OK)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: failed to write header for " << file_path << ": " << archive_error_string(a) << endl;
> +        continue;
> +      }
> +
> +      /* Write the file to the zip. */
> +      len = read(fd, buff, sizeof(buff));
> +      if (len == -1)
> +      {
> +        missing_files++;
> +        if (verbose)
> +          cerr << "Error: Failed to open file: " << file_path << ": " << strerror(errno) <<endl;
> +        continue;
> +      }
> +      if (verbose && len > 0)
> +        clog << "Writing to zip: " << file_path << endl;
> +      while (len > 0)
> +      {
> +        if (archive_write_data(a, buff, len) < ARCHIVE_OK)
> +        {
> +          if (verbose)
> +            cerr << "Error: Failed to read from the file: " << file_path << ": " << strerror(errno) << endl;
> +          break;
> +        }
> +        len = read(fd, buff, sizeof(buff));
> +      }
> +      close(fd);
> +      archive_entry_free(entry);
> +    }
> +    if (verbose && missing_files > 0 )
> +      cerr << "WARNING: " << missing_files << " files could not be found." << endl;

I think we should also print the names of the missing source files.
This information could help the user decide if the missing files are a
problem for them.

Also just a small detail, but I'd remove "WARNING" and just say:

    N files could not be found:
    file1
    file2
    ...

A file might not be found for a variety of mundane reasons but the
all-caps warning suggests to me that something is critically wrong.

> +
> +    archive_write_close(a);
> +    archive_write_free(a);
> +  }
> +#endif
>
>  int
>  main (int argc, char *argv[])
>  {
>    int remaining;
> -
> -  /* Parse and process arguments.  This includes opening the modules.  */
> +
> +  /* Parse and process arguments.  This includes opening the modules. */
>    argp_children[0].argp = dwfl_standard_argp ();
>    argp_children[0].group = 1;
> -
> +
>    Dwfl *dwfl = NULL;
>    (void) argp_parse (&argp, argc, argv, 0, &remaining, &dwfl);
>    assert (dwfl != NULL);
> @@ -204,14 +428,23 @@ main (int argc, char *argv[])
>    (void) dwfl_getmodules (dwfl, &collect_sourcefiles, NULL, 0);
>
>    if (!debug_sourcefiles.empty ())
> -    for (const string &element : debug_sourcefiles)
> +  {
> +    #ifdef HAVE_LIBARCHIVE
> +      if (zip)
> +        zip_files();
> +      else
> +    #endif
>        {
> -        std::cout << element;
> -        if (null_arg)
> -          std::cout << '\0';
> -        else
> -          std::cout << '\n';
> +        for (const auto &pair : debug_sourcefiles)
> +          {
> +            cout << pair.first;
> +            if (null_arg)
> +              cout << '\0';
> +            else
> +              cout << '\n';
> +          }
>        }
> +  }
>
>    dwfl_end (dwfl);
>    return 0;
> diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
> index 108dff74..c3b0603d 100755
> --- a/tests/debuginfod-subr.sh
> +++ b/tests/debuginfod-subr.sh
> @@ -79,12 +79,12 @@ errfiles() {
>  # So we gather the LD_LIBRARY_PATH with this cunning trick:
>  ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
>
> -wait_ready()
> +wait_ready4()
>  {
>    port=$1;
>    what=$2;
>    value=$3;
> -  timeout=20;
> +  timeout=$4;
>
>    echo "Wait $timeout seconds on $port for metric $what to change to $value"
>    while [ $timeout -gt 0 ]; do
> @@ -105,6 +105,16 @@ wait_ready()
>    fi
>  }
>
> +wait_ready()
> +{
> +  port=$1;
> +  what=$2;
> +  value=$3;
> +  timeout=20;
> +  wait_ready4 "$port" "$what" "$value" "$timeout"
> +}
> +
> +
>  archive_test() {
>      __BUILDID=$1
>      __SOURCEPATH=$2
> diff --git a/tests/run-srcfiles-self.sh b/tests/run-srcfiles-self.sh
> index 0e64dd2b..1ee460f6 100755
> --- a/tests/run-srcfiles-self.sh
> +++ b/tests/run-srcfiles-self.sh
> @@ -1,4 +1,4 @@
> -#! /bin/sh
> +#! /usr/bin/env bash
>  # Copyright (C) 2023 Red Hat, Inc.
>  # This file is part of elfutils.
>  #
> @@ -15,7 +15,16 @@
>  # You should have received a copy of the GNU General Public License
>  # along with this program.  If not, see <http://www.gnu.org/licenses/>.
>
> -. $srcdir/test-subr.sh
> +. $srcdir/debuginfod-subr.sh
> +
> +# prereqs
> +type unzip 2>/dev/null || exit 77

We could check for unzip when it's actually needed. This way
we can at least run the tests that don't rely on unzip.

> +
> +# for test case debugging, uncomment:
> +# set -x
> +set -e
> +base=14000
> +get_ports
>
>  # Test different command line combinations on the srcfiles binary itself.
>  ET_EXEC="${abs_top_builddir}/src/srcfiles"
> @@ -26,11 +35,16 @@ SRC_NAME="srcfiles.cxx"
>  # Ensure the output contains the expected source file srcfiles.cxx
>  testrun $ET_EXEC -e $ET_EXEC | grep $SRC_NAME > /dev/null
>
> +# Check if zip option is available (only available if libarchive is available.
> +#   Debuginfod optional to fetch source files from debuginfod federation.)
> +$ET_EXEC --help | grep -q zip && zip=true || zip=false

If -z is changed to be always a valid option even when support
is missing (as suggested above), then this check for -z support
will have to change.

> +
>  for null_arg in --null ""; do
>    for verbose_arg in --verbose ""; do
> +    echo "Test with options $null_arg $verbose_arg"
>      testrun $ET_EXEC $null_arg $verbose_arg -p $ET_PID > /dev/null
>
> -    # Ensure that the output contains srclines.cxx
> +    # Ensure that the output contains srcfiles.cxx
>      cu_only=$(testrun $ET_EXEC $null_arg $verbose_arg -c -e $ET_EXEC)
>      default=$(testrun $ET_EXEC $null_arg $verbose_arg -e $ET_EXEC)
>      result1=$(echo "$cu_only" | grep "$SRC_NAME")
> @@ -40,9 +54,64 @@ for null_arg in --null ""; do
>        exit 1
>      fi
>
> -    # Ensure that the output with the cu-only option contains less source files
> +    # Ensure that the output with the cu-only option contains fewer source files
>      if [ $(echo "$cu_only" | wc -m) -gt $(echo "$default" | wc -m) ]; then
>        exit 1
>      fi
> +
> +    if $zip; then
> +      # Zip option tests
> +        testrun $ET_EXEC $verbose_arg -z -e $ET_EXEC > test.zip
> +        tempfiles test.zip
> +
> +        unzip -v test.zip
> +        unzip -t test.zip
> +
> +        # Ensure unzipped srcfiles.cxx and its contents are the same as the original source file
> +        unzip -j test.zip "*/$SRC_NAME"
> +        diff "$SRC_NAME" $abs_srcdir/../src/$SRC_NAME
> +        rm -f test.zip $SRC_NAME
> +    fi
>    done
>  done
> +
> +# Debuginfod source file downloading test.
> +# Start debuginfod server on the elfutils build directory.
> +if [ -x ${abs_builddir}/../debuginfod/debuginfod ] && $zip; then
> +  LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -vvvv -d debuginfod.sqlite3 -F -p $PORT1 ${abs_top_builddir}/src > debuginfod.log 2>&1 &
> +  PID1=$!
> +  tempfiles debuginfod.sqlite3 debuginfod.log
> +  wait_ready $PORT1 'ready' 1
> +  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> +  wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +  wait_ready4 $PORT1 'thread_busy{role="scan"}' 0 300 # lots of source files may be slow to index with $db on NFS
> +
> +  export DEBUGINFOD_URLS="http://localhost:${PORT1}/"
> +  export DEBUGINFOD_VERBOSE=1
> +  testrun $ET_EXEC -z -b -e $ET_EXEC > test.zip
> +  tempfiles test.zip
> +
> +  unzip -v test.zip
> +  unzip -t test.zip
> +
> +  # Extract the zip.
> +  mkdir extracted
> +  unzip test.zip -d extracted
> +
> +  # Ensure that source files for this tool have been archived.
> +  source_files="srcfiles.cxx libdwfl.h gelf.h"
> +  extracted_files=$(find extracted -type f)
> +  for file in $source_files; do
> +      echo "$extracted_files" | grep -q "$file" > /dev/null
> +  done
> +
> +  # Compare between the extracted file and the actual source file srcfiles.cxx.
> +  extracted_file=$(find extracted -name $SRC_NAME)
> +  diff "$extracted_file" $abs_srcdir/../src/$SRC_NAME
> +
> +  rm -rf extracted
> +
> +  kill $PID1
> +  wait
> +  PID1=0
> +fi
> --
> 2.43.0
>

Nice, the tests pass for me even if libarchive and/or debuginfod is missing.

Aaron
  

Patch

diff --git a/NEWS b/NEWS
index 0420d3b8..3391d6a1 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,8 @@ 
+Version 0.191 (after 0.189)
+
+srcfiles: Can now fetch the source files of a DWARF/ELF file and
+          place them into a zip.
+
 Version 0.190 "Woke!"
 
 CONTRIBUTING: Switch from real name policy to known identity policy.
@@ -9,6 +14,9 @@  libelf: Add RELR support.
 
 libdw: Recognize .debug_[ct]u_index sections
 
+srcfiles: added srcfiles tool that lists all the source files of a given 
+          DWARF/ELF file.
+
 readelf: Support readelf -Ds, --use-dynamic --symbol.
          Support .gdb_index version 9
 
diff --git a/configure.ac b/configure.ac
index af5b6bf7..ddb79b83 100644
--- a/configure.ac
+++ b/configure.ac
@@ -841,7 +841,7 @@  AM_CONDITIONAL([LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xyes" || test "
 AM_CONDITIONAL([DUMMY_LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xdummy"])
 AC_CHECK_HEADERS([execinfo.h])
 
-# Look for libmicrohttpd, libarchive, sqlite for debuginfo server
+# Look for libmicrohttpd, libarchive, sqlite for debuginfo server and srcfiles tool
 # minimum versions as per rhel7.
 AC_ARG_ENABLE([debuginfod],AS_HELP_STRING([--enable-debuginfod], [Build debuginfod server]))
 AS_IF([test "x$enable_debuginfod" != "xno"], [
@@ -853,11 +853,12 @@  AS_IF([test "x$enable_debuginfod" != "xno"], [
       AC_MSG_ERROR([need libdebuginfod (or dummy), use --disable-debuginfod to disable.])
     fi
     enable_debuginfod=yes # presume success
+    AC_DEFINE([HAVE_LIBARCHIVE], [1], [Define to 1 if libarchive is available]) # presume success
     PKG_PROG_PKG_CONFIG
     PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
     PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
     PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
-    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
+    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
     if test "x$enable_debuginfod" = "xno"; then
       AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
     fi
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 524be948..6b21f46f 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2996,8 +2996,6 @@  dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
 
       if (comp_dir[0] == '\0' && cuname[0] != '/')
         {
-          // This is a common symptom for dwz-compressed debug files,
-          // where the altdebug file cannot be resolved.
           if (verbose > 3)
             obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
           continue;
diff --git a/doc/srcfiles.1 b/doc/srcfiles.1
index 6149c21b..a7cde664 100644
--- a/doc/srcfiles.1
+++ b/doc/srcfiles.1
@@ -21,15 +21,18 @@ 
 eu-srcfiles \- Lists the source files of a DWARF/ELF file.
 
 .SH "SYNOPSIS"
-eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] INPUT
+eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] [\fB\-z\fR|\fB\-\-zip\fR] INPUT
 
 .SH "DESCRIPTION"
-\fBeu-srcfiles\fR lists the source files of a given \s-DWARF/ELF\s0
+\fBeu-srcfiles\fR lists all the source files of a given DWARF/ELF
 file.  This list is based on a search of the DWARF debuginfo, which
 may be automatically fetched by debuginfod if applicable.  The target
 file may be an executable, a coredump, a process, or even the running
-kernel.  The default is the file 'a.out'.  The source file names are
-made unique and printed to standard output.
+kernel.  The default input is the file 'a.out'.  The source file names are
+made unique by prepending the full path name and then printed to standard output. The source files can be
+placed in a zip file that is sent to stdout.
+
+Note that all occurrences of '/./' and '/../' in the path name are canonicalized.
 
 .SH "INPUT OPTIONS"
 The long and short forms of options, shown here as alternatives, are
@@ -82,12 +85,20 @@  Separate items by a null instead of a newline.
 
 .TP
 \fB\-c, \-\-cu\-only\fR
-Only list the CU names.
+Only list the (compilation unit) CU names.
 
 .TP
 \fB\-v, \-\-verbose\fR
 Increase verbosity of logging messages.
 
+.TP
+\fB\-z, \-\-zip\fR
+Zip all the source files and send to stdout. 
+Files may be automatically fetched by 
+debuginfod (if applicable) or locally as a 
+backup. Any source files that were not found 
+will not be archived.
+
 
 .SH EXAMPLES
 
@@ -119,6 +130,11 @@  List the source files of a kernel image.
 eu-srcfiles -e /boot/vmlinuz-`uname -r`
 .ESAMPLE
 
+Zip all the source files for a binary.
+.SAMPLE
+eu-srcfiles -z -e /bin/ls > ls.zip
+.ESAMPLE
+
 
 .SH "AUTHOR"
 Written by Housam Alamour.
diff --git a/src/Makefile.am b/src/Makefile.am
index d3d9d408..5fa8df3d 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -21,7 +21,7 @@  DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
 	-DSRCDIR=\"$(shell cd $(srcdir);pwd)\" -DOBJDIR=\"$(shell pwd)\"
 AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
 	    -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
-	    -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
+	    -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm -I../debuginfod
 
 AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
 
@@ -50,6 +50,11 @@  libelf = ../libelf/libelf.so
 endif
 libebl = ../libebl/libebl.a ../backends/libebl_backends.a ../libcpu/libcpu.a
 libeu = ../lib/libeu.a
+if LIBDEBUGINFOD
+libdebuginfod = ../debuginfod/libdebuginfod.so
+else
+libdebuginfod = 
+endif
 
 if DEMANGLE
 demanglelib = -lstdc++
@@ -85,7 +90,7 @@  stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) $(demanglelib)
 elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 srcfiles_SOURCES = srcfiles.cxx
-srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD)
+srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD) $(libarchive_LIBS) $(libdebuginfod)
 
 installcheck-binPROGRAMS: $(bin_PROGRAMS)
 	bad=0; pid=$$$$; list="$(bin_PROGRAMS)"; for p in $$list; do \
diff --git a/src/srcfiles.cxx b/src/srcfiles.cxx
index 3c7afdc4..0e4f756f 100644
--- a/src/srcfiles.cxx
+++ b/src/srcfiles.cxx
@@ -15,6 +15,21 @@ 
 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+  
+
+/* In case we have a bad fts we include this before config.h because it
+   can't handle _FILE_OFFSET_BITS.
+   Everything we need here is fine if its declarations just come first.
+   Also, include sys/types.h before fts. On some systems fts.h is not self
+   contained. */
+#ifdef BAD_FTS
+  #include <sys/types.h>
+  #include <fts.h>
+#endif
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
 
 #include "printversion.h"
 #include <dwarf.h>
@@ -23,23 +38,50 @@ 
 #include <set>
 #include <string>
 #include <cassert>
-#include <config.h>
+#include <gelf.h>
+#include <memory>
+
+#ifdef ENABLE_LIBDEBUGINFOD
+  #include "debuginfod.h"
+#endif
 
 #include <libdwfl.h>
 #include <fcntl.h>
 #include <iostream>
 #include <libdw.h>
+#include <sstream>
+#include <vector>
+
+/* Libraries for use by the --zip option */
+#ifdef HAVE_LIBARCHIVE
+  #include <archive.h>
+  #include <archive_entry.h>
+#endif
+
+/* If fts.h is included before config.h, its indirect inclusions may not
+   give us the right LFS aliases of these functions, so map them manually. */
+#ifdef BAD_FTS
+  #ifdef _FILE_OFFSET_BITS
+    #define open open64
+    #define fopen fopen64
+  #endif
+#else
+  #include <sys/types.h>
+  #include <fts.h>
+#endif
 
 using namespace std;
 
-/* Name and version of program.  */
+/* Name and version of program. */
 ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
 
-/* Bug report address.  */
+/* Bug report address. */
 ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
 
-/* Definitions of arguments for argp functions.  */
-static const struct argp_option options[] =
+constexpr size_t BUFFER_SIZE = 8192;
+
+/* Definitions of arguments for argp functions. */
+static const struct argp_option options[] = 
 {
   { NULL, 0, NULL, OPTION_DOC, N_("Output options:"), 1 },
   { "null", '0', NULL, 0,
@@ -47,21 +89,30 @@  static const struct argp_option options[] =
   { "verbose", 'v', NULL, 0,
     N_ ("Increase verbosity of logging messages."), 0 },
   { "cu-only", 'c', NULL, 0, N_ ("Only list the CU names."), 0 },
-  { NULL, 0, NULL, 0, NULL, 0 }
+  #ifdef HAVE_LIBARCHIVE
+  { "zip", 'z', NULL, 0, N_ ("Zip all the source files and send to stdout."
+    "Cannot be used with the null option"), 0 },
+    #ifdef ENABLE_LIBDEBUGINFOD
+    { "no-backup", 'b', NULL, 0, N_ ("Disables local source file search when " 
+      "debuginfod fails to fetch files. This option is only applicable"
+      "when fetching and zipping files."), 0 },
+    #endif
+  #endif
+  { NULL, 0, NULL, 0, NULL, 0 } 
 };
 
-/* Short description of program.  */
+/* Short description of program. */
 static const char doc[] = N_("Lists the source files of a DWARF/ELF file. The default input is the file 'a.out'.");
 
-/* Strings for arguments in help texts.  */
+/* Strings for arguments in help texts. */
 static const char args_doc[] = N_("INPUT");
 
-/* Prototype for option handler.  */
+/* Prototype for option handler. */
 static error_t parse_opt (int key, char *arg, struct argp_state *state);
 
-static struct argp_child argp_children[2]; /* [0] is set in main.  */
+static struct argp_child argp_children[2]; /* [0] is set in main. */
 
-/* Data structure to communicate with argp functions.  */
+/* Data structure to communicate with argp functions. */
 static const struct argp argp =
 {
   options, parse_opt, args_doc, doc, argp_children, NULL, NULL
@@ -73,8 +124,19 @@  static bool verbose;
 static bool null_arg;
 /* Only print compilation unit names. */
 static bool CU_only;
-
-/* Handle program arguments. */
+#ifdef HAVE_LIBARCHIVE
+  /* Zip all the source files and send to stdout. */
+  static bool zip;
+
+  #ifdef ENABLE_LIBDEBUGINFOD
+    /* Disables local source file search when debuginfod fails to fetch them. 
+       This option is only applicable when fetching and zipping files.*/
+    static bool no_backup;
+  #endif
+#endif
+
+/* Handle program arguments. Note null arg and zip 
+    cannot be combined due to warnings raised when unzipping. */
 static error_t
 parse_opt (int key, char *arg, struct argp_state *state)
 {
@@ -97,6 +159,18 @@  parse_opt (int key, char *arg, struct argp_state *state)
     case 'c':
       CU_only = true;
       break;
+    
+    #ifdef HAVE_LIBARCHIVE
+      case 'z':
+      zip = true;
+      break;
+
+      #ifdef ENABLE_LIBDEBUGINFOD
+        case 'b':
+        no_backup = true;
+        break;
+      #endif
+    #endif
 
     default:
       return ARGP_ERR_UNKNOWN;
@@ -104,11 +178,40 @@  parse_opt (int key, char *arg, struct argp_state *state)
   return 0;
 }
 
+/* Remove the "/./" , "../" and the preceding directory 
+    that some paths include which raise errors during unzip. */
+string canonicalize_path(string path) 
+{
+    stringstream ss(path);
+    string token;
+    vector<string> tokens;
+    /* Extract each directory of the path and place into a vector. */
+    while (getline(ss, token, '/')) {
+      /* Ignore any empty //, or /./ dirs. */
+        if (token == "" || token == ".")
+            continue;
+      /* When /.. is encountered, remove the most recent directory from the vector. */
+        else if (token == "..") {
+            if (!tokens.empty())
+                tokens.pop_back();
+        } else
+            tokens.push_back(token);
+    }
+    stringstream result;
+    if (tokens.empty())
+        return "/";
+    /* Reconstruct the path from the extracted directories. */
+    for (const string &t : tokens) {
+        result << '/' << t;
+    }
+    return result.str();
+}
 
-/* Global list of collected source files.  Normally, it'll contain
-   the sources of just one named binary, but the '-K' option can cause
-   multiple dwfl modules to be loaded, thus listed.   */
-   set<string> debug_sourcefiles;
+/* Global list of collected source files and their respective module.  
+   Normally, it'll contain the sources of just one named binary, but 
+   the '-K' option can cause multiple dwfl modules to be loaded, thus 
+   listed. */
+set<pair<string, Dwfl_Module*>> debug_sourcefiles;
 
 static int
 collect_sourcefiles (Dwfl_Module *dwflmod,
@@ -118,15 +221,14 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
                      void *arg __attribute__ ((unused)))
 {
   Dwarf *dbg;
-  Dwarf_Addr bias; /* ignored - for addressing purposes only  */
-
+  Dwarf_Addr bias; /* ignored - for addressing purposes only. */
+  
   dbg = dwfl_module_getdwarf (dwflmod, &bias);
 
   Dwarf_Off offset = 0;
   Dwarf_Off old_offset;
   size_t hsize;
-
-  /* Traverse all CUs of this module.  */
+  /* Traverse all CUs of this module. */
   while (dwarf_nextcu (dbg, old_offset = offset, &offset, &hsize, NULL, NULL, NULL) == 0)
     {
       Dwarf_Die cudie_mem;
@@ -141,7 +243,7 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
       if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
         continue;
 
-      /* extract DW_AT_comp_dir to resolve relative file names  */
+      /* extract DW_AT_comp_dir to resolve relative file names. */
       const char *comp_dir = "";
       const char *const *dirs;
       size_t ndirs;
@@ -152,11 +254,19 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
         comp_dir = "";
 
       if (verbose)
-        std::clog << "searching for sources for cu=" << cuname
+        clog << "searching for sources for cu=" << cuname
                   << " comp_dir=" << comp_dir << " #files=" << nfiles
                   << " #dirs=" << ndirs << endl;
-
-      for (size_t f = 1; f < nfiles; f++)
+      
+      if (comp_dir[0] == '\0' && cuname[0] != '/')
+        {
+          /* This is a common symptom for dwz-compressed debug files,
+             where the altdebug file cannot be resolved. */
+          if (verbose)
+            clog << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
+          continue;
+        }
+      for (size_t f = 1; f < nfiles; ++f)
         {
           const char *hat;
           if (CU_only)
@@ -172,7 +282,7 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
             continue;
 
           if (string(hat).find("<built-in>")
-              != std::string::npos) /* gcc intrinsics, don't bother record  */
+              != string::npos) /* gcc intrinsics, don't bother recording */
             continue;
 
           string waldo;
@@ -180,23 +290,137 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
             waldo = (string (hat));
           else if (comp_dir[0] != '\0') /* comp_dir relative */
             waldo = (string (comp_dir) + string ("/") + string (hat));
-          debug_sourcefiles.insert (waldo);
+          else
+           {
+             if (verbose)
+              clog << "skipping file=" << hat << " due to empty comp_dir" << endl;
+             continue;
+           }
+          waldo = canonicalize_path (waldo);
+          debug_sourcefiles.insert (make_pair(waldo, dwflmod));
         }
     }
-
   return DWARF_CB_OK;
 }
 
+#ifdef HAVE_LIBARCHIVE
+  void zip_files()
+  {
+    struct archive *a = archive_write_new();
+    struct stat st;
+    char buff[BUFFER_SIZE];
+    int len;
+    int fd;
+    #ifdef ENABLE_LIBDEBUGINFOD
+      /* Initialize a debuginfod client. */
+      static unique_ptr <debuginfod_client, void (*)(debuginfod_client*)>
+        client (debuginfod_begin(), &debuginfod_end);
+    #endif
+
+    archive_write_set_format_zip(a);
+    archive_write_open_fd(a, STDOUT_FILENO);
+
+    int missing_files = 0;
+    for (const auto &pair : debug_sourcefiles)
+    {
+      fd = -1;
+      const std::string &file_path = pair.first;
+
+      /* Attempt to query debuginfod client to fetch source files. */
+      #ifdef ENABLE_LIBDEBUGINFOD
+        Dwfl_Module* dwflmod = pair.second;
+        /* Obtain source file's build ID. */
+        const unsigned char *bits;
+        GElf_Addr vaddr;
+        int bits_length = dwfl_module_build_id(dwflmod, &bits, &vaddr);
+        /* Ensure successful client and build ID acquisition. */
+        if (client.get() != NULL && bits_length > 0)
+        {
+          fd = debuginfod_find_source(client.get(),
+                                        bits, bits_length, 
+                                        file_path.c_str(), NULL);
+        }
+        else
+        {
+            if (client.get() == NULL)
+                cerr << "Error: Failed to initialize debuginfod client." << endl;
+            else
+                cerr << "Error: Invalid build ID length (" << bits_length << ")." << endl;
+        }
+      
+      #endif
+      if (!no_backup)
+        /* Files could not be located using debuginfod, search locally */
+        if (fd < 0)
+          fd = open(file_path.c_str(), O_RDONLY);
+      if (fd < 0)
+      {
+        missing_files++;
+        continue;
+      }
+
+      /* Create an entry for each file including file information to be placed in the zip. */
+      if (fstat(fd, &st) == -1)
+      {
+        missing_files++;
+        if (verbose)
+          cerr << "Error: Failed to get file status for " << file_path << ": " << strerror(errno) << endl;
+        continue;
+      }
+      struct archive_entry *entry = archive_entry_new();
+      /* Removing first "/"" to make the path "relative" before zipping, otherwise warnings are raised when unzipping. */
+      string entry_name = file_path.substr(file_path.find_first_of('/') + 1);
+      archive_entry_set_pathname(entry, entry_name.c_str());
+      archive_entry_copy_stat(entry, &st);
+      if (archive_write_header(a, entry) != ARCHIVE_OK)
+      {
+        missing_files++;
+        if (verbose)
+          cerr << "Error: failed to write header for " << file_path << ": " << archive_error_string(a) << endl;
+        continue;
+      }
+
+      /* Write the file to the zip. */
+      len = read(fd, buff, sizeof(buff));
+      if (len == -1)
+      {
+        missing_files++;
+        if (verbose)
+          cerr << "Error: Failed to open file: " << file_path << ": " << strerror(errno) <<endl;
+        continue;
+      }
+      if (verbose && len > 0)
+        clog << "Writing to zip: " << file_path << endl;
+      while (len > 0)
+      {
+        if (archive_write_data(a, buff, len) < ARCHIVE_OK)
+        {
+          if (verbose)
+            cerr << "Error: Failed to read from the file: " << file_path << ": " << strerror(errno) << endl;
+          break; 
+        }
+        len = read(fd, buff, sizeof(buff));
+      }
+      close(fd);
+      archive_entry_free(entry);
+    }
+    if (verbose && missing_files > 0 )
+      cerr << "WARNING: " << missing_files << " files could not be found." << endl;
+
+    archive_write_close(a);
+    archive_write_free(a);
+  }
+#endif
 
 int
 main (int argc, char *argv[])
 {
   int remaining;
-
-  /* Parse and process arguments.  This includes opening the modules.  */
+  
+  /* Parse and process arguments.  This includes opening the modules. */
   argp_children[0].argp = dwfl_standard_argp ();
   argp_children[0].group = 1;
-
+  
   Dwfl *dwfl = NULL;
   (void) argp_parse (&argp, argc, argv, 0, &remaining, &dwfl);
   assert (dwfl != NULL);
@@ -204,14 +428,23 @@  main (int argc, char *argv[])
   (void) dwfl_getmodules (dwfl, &collect_sourcefiles, NULL, 0);
 
   if (!debug_sourcefiles.empty ())
-    for (const string &element : debug_sourcefiles)
+  {
+    #ifdef HAVE_LIBARCHIVE
+      if (zip)
+        zip_files();
+      else
+    #endif
       {
-        std::cout << element;
-        if (null_arg)
-          std::cout << '\0';
-        else
-          std::cout << '\n';
+        for (const auto &pair : debug_sourcefiles)
+          {
+            cout << pair.first;
+            if (null_arg)
+              cout << '\0';
+            else
+              cout << '\n';
+          }
       }
+  }
 
   dwfl_end (dwfl);
   return 0;
diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
index 108dff74..c3b0603d 100755
--- a/tests/debuginfod-subr.sh
+++ b/tests/debuginfod-subr.sh
@@ -79,12 +79,12 @@  errfiles() {
 # So we gather the LD_LIBRARY_PATH with this cunning trick:
 ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
 
-wait_ready()
+wait_ready4()
 {
   port=$1;
   what=$2;
   value=$3;
-  timeout=20;
+  timeout=$4;
 
   echo "Wait $timeout seconds on $port for metric $what to change to $value"
   while [ $timeout -gt 0 ]; do
@@ -105,6 +105,16 @@  wait_ready()
   fi
 }
 
+wait_ready()
+{
+  port=$1;
+  what=$2;
+  value=$3;
+  timeout=20;
+  wait_ready4 "$port" "$what" "$value" "$timeout"
+}
+
+
 archive_test() {
     __BUILDID=$1
     __SOURCEPATH=$2
diff --git a/tests/run-srcfiles-self.sh b/tests/run-srcfiles-self.sh
index 0e64dd2b..1ee460f6 100755
--- a/tests/run-srcfiles-self.sh
+++ b/tests/run-srcfiles-self.sh
@@ -1,4 +1,4 @@ 
-#! /bin/sh
+#! /usr/bin/env bash
 # Copyright (C) 2023 Red Hat, Inc.
 # This file is part of elfutils.
 #
@@ -15,7 +15,16 @@ 
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
-. $srcdir/test-subr.sh
+. $srcdir/debuginfod-subr.sh
+
+# prereqs
+type unzip 2>/dev/null || exit 77
+
+# for test case debugging, uncomment:
+# set -x
+set -e
+base=14000
+get_ports
 
 # Test different command line combinations on the srcfiles binary itself.
 ET_EXEC="${abs_top_builddir}/src/srcfiles"
@@ -26,11 +35,16 @@  SRC_NAME="srcfiles.cxx"
 # Ensure the output contains the expected source file srcfiles.cxx
 testrun $ET_EXEC -e $ET_EXEC | grep $SRC_NAME > /dev/null
 
+# Check if zip option is available (only available if libarchive is available. 
+#   Debuginfod optional to fetch source files from debuginfod federation.)
+$ET_EXEC --help | grep -q zip && zip=true || zip=false
+
 for null_arg in --null ""; do
   for verbose_arg in --verbose ""; do
+    echo "Test with options $null_arg $verbose_arg"
     testrun $ET_EXEC $null_arg $verbose_arg -p $ET_PID > /dev/null
 
-    # Ensure that the output contains srclines.cxx
+    # Ensure that the output contains srcfiles.cxx
     cu_only=$(testrun $ET_EXEC $null_arg $verbose_arg -c -e $ET_EXEC)
     default=$(testrun $ET_EXEC $null_arg $verbose_arg -e $ET_EXEC)
     result1=$(echo "$cu_only" | grep "$SRC_NAME")
@@ -40,9 +54,64 @@  for null_arg in --null ""; do
       exit 1
     fi
 
-    # Ensure that the output with the cu-only option contains less source files
+    # Ensure that the output with the cu-only option contains fewer source files
     if [ $(echo "$cu_only" | wc -m) -gt $(echo "$default" | wc -m) ]; then
       exit 1
     fi
+
+    if $zip; then
+      # Zip option tests
+        testrun $ET_EXEC $verbose_arg -z -e $ET_EXEC > test.zip
+        tempfiles test.zip
+        
+        unzip -v test.zip
+        unzip -t test.zip
+        
+        # Ensure unzipped srcfiles.cxx and its contents are the same as the original source file
+        unzip -j test.zip "*/$SRC_NAME"
+        diff "$SRC_NAME" $abs_srcdir/../src/$SRC_NAME
+        rm -f test.zip $SRC_NAME
+    fi
   done
 done
+
+# Debuginfod source file downloading test.
+# Start debuginfod server on the elfutils build directory.
+if [ -x ${abs_builddir}/../debuginfod/debuginfod ] && $zip; then
+  LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -vvvv -d debuginfod.sqlite3 -F -p $PORT1 ${abs_top_builddir}/src > debuginfod.log 2>&1 &
+  PID1=$!
+  tempfiles debuginfod.sqlite3 debuginfod.log
+  wait_ready $PORT1 'ready' 1
+  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+  wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+  wait_ready4 $PORT1 'thread_busy{role="scan"}' 0 300 # lots of source files may be slow to index with $db on NFS 
+  
+  export DEBUGINFOD_URLS="http://localhost:${PORT1}/"
+  export DEBUGINFOD_VERBOSE=1
+  testrun $ET_EXEC -z -b -e $ET_EXEC > test.zip
+  tempfiles test.zip
+  
+  unzip -v test.zip
+  unzip -t test.zip
+  
+  # Extract the zip.
+  mkdir extracted
+  unzip test.zip -d extracted
+  
+  # Ensure that source files for this tool have been archived.
+  source_files="srcfiles.cxx libdwfl.h gelf.h"
+  extracted_files=$(find extracted -type f)
+  for file in $source_files; do
+      echo "$extracted_files" | grep -q "$file" > /dev/null
+  done
+
+  # Compare between the extracted file and the actual source file srcfiles.cxx.
+  extracted_file=$(find extracted -name $SRC_NAME)
+  diff "$extracted_file" $abs_srcdir/../src/$SRC_NAME
+
+  rm -rf extracted
+  
+  kill $PID1
+  wait
+  PID1=0
+fi