PR 30991: srcfiles tarball feature

Message ID 20240202154953.587686-1-halamour@redhat.com
State Committed
Delegated to: Aaron Merey
Headers
Series PR 30991: srcfiles tarball feature |

Commit Message

Housam Alamour Feb. 2, 2024, 3:49 p.m. UTC
  * srcfiles.cxx: Introduce new --zip option that places all the
    source files associated with a specified dwarf/elf file
    into a zip file and sends it to stdout. Files may be
    fetched from debuginfod (if applicable) or locally as
    a backup.
    Added -b option to disable the backup of checking
    for files locally in -z mode.

* run-srcfiles-self.sh: Added test-case for the new zip
    feature that archives the source files of the srcfiles
    tool and checks archive integrity. An additional test
    ensures that if debuginfod is enabled, the files are
    fetched and archived properly while maintaing integrity.

* debuginfod-subr.sh: On very slow/remote storage, it can
    take O(minute) to finish indexing the entire elfutils
    build tree, so a wait_ready4 shell function is one
    way to let a longer debuginfod wait operation work.

* srcfiles.1, NEWS: Added documentation for the new zip feature.

* configure.ac: Simplify check for libarchive for srcfiles.cxx
    by integrating it into the same check for debuginfod.

* Makefile.am: build with local copy of debuginfod-client.

Example:
% ./src/srcfiles -z -e /bin/ls > output.zip

https://sourceware.org/bugzilla/show_bug.cgi?id=30991

Signed-off-by: Housam Alamour <halamour@redhat.com>
---
 NEWS                       |   8 +
 configure.ac               |   5 +-
 debuginfod/debuginfod.cxx  |   2 -
 doc/srcfiles.1             |  33 +++-
 src/Makefile.am            |   9 +-
 src/srcfiles.cxx           | 305 +++++++++++++++++++++++++++++++++----
 tests/debuginfod-subr.sh   |  14 +-
 tests/run-srcfiles-self.sh |  74 ++++++++-
 8 files changed, 400 insertions(+), 50 deletions(-)
  

Comments

Aaron Merey Feb. 6, 2024, 12:24 a.m. UTC | #1
Thanks Housam, pushed as commit 75fb8c0152.

Aaron

On Fri, Feb 2, 2024 at 10:51 AM Housam Alamour <halamour@redhat.com> wrote:
>
> * srcfiles.cxx: Introduce new --zip option that places all the
>     source files associated with a specified dwarf/elf file
>     into a zip file and sends it to stdout. Files may be
>     fetched from debuginfod (if applicable) or locally as
>     a backup.
>     Added -b option to disable the backup of checking
>     for files locally in -z mode.
>
> * run-srcfiles-self.sh: Added test-case for the new zip
>     feature that archives the source files of the srcfiles
>     tool and checks archive integrity. An additional test
>     ensures that if debuginfod is enabled, the files are
>     fetched and archived properly while maintaing integrity.
>
> * debuginfod-subr.sh: On very slow/remote storage, it can
>     take O(minute) to finish indexing the entire elfutils
>     build tree, so a wait_ready4 shell function is one
>     way to let a longer debuginfod wait operation work.
>
> * srcfiles.1, NEWS: Added documentation for the new zip feature.
>
> * configure.ac: Simplify check for libarchive for srcfiles.cxx
>     by integrating it into the same check for debuginfod.
>
> * Makefile.am: build with local copy of debuginfod-client.
>
> Example:
> % ./src/srcfiles -z -e /bin/ls > output.zip
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=30991
>
> Signed-off-by: Housam Alamour <halamour@redhat.com>
> ---
>  NEWS                       |   8 +
>  configure.ac               |   5 +-
>  debuginfod/debuginfod.cxx  |   2 -
>  doc/srcfiles.1             |  33 +++-
>  src/Makefile.am            |   9 +-
>  src/srcfiles.cxx           | 305 +++++++++++++++++++++++++++++++++----
>  tests/debuginfod-subr.sh   |  14 +-
>  tests/run-srcfiles-self.sh |  74 ++++++++-
>  8 files changed, 400 insertions(+), 50 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 0420d3b8..3391d6a1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,8 @@
> +Version 0.191 (after 0.189)
> +
> +srcfiles: Can now fetch the source files of a DWARF/ELF file and
> +          place them into a zip.
> +
>  Version 0.190 "Woke!"
>
>  CONTRIBUTING: Switch from real name policy to known identity policy.
> @@ -9,6 +14,9 @@ libelf: Add RELR support.
>
>  libdw: Recognize .debug_[ct]u_index sections
>
> +srcfiles: added srcfiles tool that lists all the source files of a given
> +          DWARF/ELF file.
> +
>  readelf: Support readelf -Ds, --use-dynamic --symbol.
>           Support .gdb_index version 9
>
> diff --git a/configure.ac b/configure.ac
> index af5b6bf7..ddb79b83 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -841,7 +841,7 @@ AM_CONDITIONAL([LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xyes" || test "
>  AM_CONDITIONAL([DUMMY_LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xdummy"])
>  AC_CHECK_HEADERS([execinfo.h])
>
> -# Look for libmicrohttpd, libarchive, sqlite for debuginfo server
> +# Look for libmicrohttpd, libarchive, sqlite for debuginfo server and srcfiles tool
>  # minimum versions as per rhel7.
>  AC_ARG_ENABLE([debuginfod],AS_HELP_STRING([--enable-debuginfod], [Build debuginfod server]))
>  AS_IF([test "x$enable_debuginfod" != "xno"], [
> @@ -853,11 +853,12 @@ AS_IF([test "x$enable_debuginfod" != "xno"], [
>        AC_MSG_ERROR([need libdebuginfod (or dummy), use --disable-debuginfod to disable.])
>      fi
>      enable_debuginfod=yes # presume success
> +    AC_DEFINE([HAVE_LIBARCHIVE], [1], [Define to 1 if libarchive is available]) # presume success
>      PKG_PROG_PKG_CONFIG
>      PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
>      PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
>      PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
> -    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
> +    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
>      if test "x$enable_debuginfod" = "xno"; then
>        AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
>      fi
> diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> index 524be948..6b21f46f 100644
> --- a/debuginfod/debuginfod.cxx
> +++ b/debuginfod/debuginfod.cxx
> @@ -2996,8 +2996,6 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
>
>        if (comp_dir[0] == '\0' && cuname[0] != '/')
>          {
> -          // This is a common symptom for dwz-compressed debug files,
> -          // where the altdebug file cannot be resolved.
>            if (verbose > 3)
>              obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
>            continue;
> diff --git a/doc/srcfiles.1 b/doc/srcfiles.1
> index 6149c21b..c6338315 100644
> --- a/doc/srcfiles.1
> +++ b/doc/srcfiles.1
> @@ -21,15 +21,18 @@
>  eu-srcfiles \- Lists the source files of a DWARF/ELF file.
>
>  .SH "SYNOPSIS"
> -eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] INPUT
> +eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] [\fB\-z\fR|\fB\-\-zip\fR] INPUT
>
>  .SH "DESCRIPTION"
> -\fBeu-srcfiles\fR lists the source files of a given \s-DWARF/ELF\s0
> +\fBeu-srcfiles\fR lists all the source files of a given DWARF/ELF
>  file.  This list is based on a search of the DWARF debuginfo, which
>  may be automatically fetched by debuginfod if applicable.  The target
>  file may be an executable, a coredump, a process, or even the running
> -kernel.  The default is the file 'a.out'.  The source file names are
> -made unique and printed to standard output.
> +kernel.  The default input is the file 'a.out'.  The source file names are
> +made unique by prepending the full path name and then printed to standard output. The source files can be
> +placed in a zip file that is sent to stdout.
> +
> +Note that all occurrences of '/./' and '/../' in the path name are canonicalized.
>
>  .SH "INPUT OPTIONS"
>  The long and short forms of options, shown here as alternatives, are
> @@ -80,14 +83,29 @@ Print program version.
>  \fB\-0, \-\-null\fR
>  Separate items by a null instead of a newline.
>
> +.TP
> +\fB\-b, \-\-no-backup\fR
> +Disables local source file search when
> +debuginfod fails to fetch files.  This
> +option is only applicable when fetching and
> +zipping files.
> +
>  .TP
>  \fB\-c, \-\-cu\-only\fR
> -Only list the CU names.
> +Only list the CU (compilation unit) names.
>
>  .TP
>  \fB\-v, \-\-verbose\fR
>  Increase verbosity of logging messages.
>
> +.TP
> +\fB\-z, \-\-zip\fR
> +Zip all the source files and send to stdout.
> +By default, files will be automatically fetched by
> +debuginfod (if applicable) or locally as a
> +backup. Any source files that were not found
> +will not be archived.
> +
>
>  .SH EXAMPLES
>
> @@ -119,6 +137,11 @@ List the source files of a kernel image.
>  eu-srcfiles -e /boot/vmlinuz-`uname -r`
>  .ESAMPLE
>
> +Zip all the source files for a binary.
> +.SAMPLE
> +eu-srcfiles -z -e /bin/ls > ls.zip
> +.ESAMPLE
> +
>
>  .SH "AUTHOR"
>  Written by Housam Alamour.
> diff --git a/src/Makefile.am b/src/Makefile.am
> index d3d9d408..5fa8df3d 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -21,7 +21,7 @@ DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
>         -DSRCDIR=\"$(shell cd $(srcdir);pwd)\" -DOBJDIR=\"$(shell pwd)\"
>  AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
>             -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
> -           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
> +           -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm -I../debuginfod
>
>  AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
>
> @@ -50,6 +50,11 @@ libelf = ../libelf/libelf.so
>  endif
>  libebl = ../libebl/libebl.a ../backends/libebl_backends.a ../libcpu/libcpu.a
>  libeu = ../lib/libeu.a
> +if LIBDEBUGINFOD
> +libdebuginfod = ../debuginfod/libdebuginfod.so
> +else
> +libdebuginfod =
> +endif
>
>  if DEMANGLE
>  demanglelib = -lstdc++
> @@ -85,7 +90,7 @@ stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) $(demanglelib)
>  elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
>  srcfiles_SOURCES = srcfiles.cxx
> -srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD)
> +srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD) $(libarchive_LIBS) $(libdebuginfod)
>
>  installcheck-binPROGRAMS: $(bin_PROGRAMS)
>         bad=0; pid=$$$$; list="$(bin_PROGRAMS)"; for p in $$list; do \
> diff --git a/src/srcfiles.cxx b/src/srcfiles.cxx
> index 3c7afdc4..df602719 100644
> --- a/src/srcfiles.cxx
> +++ b/src/srcfiles.cxx
> @@ -15,6 +15,21 @@
>
>     You should have received a copy of the GNU General Public License
>     along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +
> +/* In case we have a bad fts we include this before config.h because it
> +   can't handle _FILE_OFFSET_BITS.
> +   Everything we need here is fine if its declarations just come first.
> +   Also, include sys/types.h before fts.  On some systems fts.h is not self
> +   contained.  */
> +#ifdef BAD_FTS
> +#include <sys/types.h>
> +#include <fts.h>
> +#endif
> +
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif
>
>  #include "printversion.h"
>  #include <dwarf.h>
> @@ -23,12 +38,37 @@
>  #include <set>
>  #include <string>
>  #include <cassert>
> -#include <config.h>
> +#include <gelf.h>
> +#include <memory>
> +
> +#ifdef ENABLE_LIBDEBUGINFOD
> +#include "debuginfod.h"
> +#endif
>
>  #include <libdwfl.h>
>  #include <fcntl.h>
>  #include <iostream>
>  #include <libdw.h>
> +#include <sstream>
> +#include <vector>
> +
> +/* Libraries for use by the --zip option */
> +#ifdef HAVE_LIBARCHIVE
> +#include <archive.h>
> +#include <archive_entry.h>
> +#endif
> +
> +/* If fts.h is included before config.h, its indirect inclusions may not
> +   give us the right LFS aliases of these functions, so map them manually.  */
> +#ifdef BAD_FTS
> +#ifdef _FILE_OFFSET_BITS
> +#define open open64
> +#define fopen fopen64
> +#endif
> +#else
> +  #include <sys/types.h>
> +  #include <fts.h>
> +#endif
>
>  using namespace std;
>
> @@ -38,8 +78,10 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>  /* Bug report address.  */
>  ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>
> +constexpr size_t BUFFER_SIZE = 8192;
> +
>  /* Definitions of arguments for argp functions.  */
> -static const struct argp_option options[] =
> +static const struct argp_option options[] =
>  {
>    { NULL, 0, NULL, OPTION_DOC, N_("Output options:"), 1 },
>    { "null", '0', NULL, 0,
> @@ -47,11 +89,20 @@ static const struct argp_option options[] =
>    { "verbose", 'v', NULL, 0,
>      N_ ("Increase verbosity of logging messages."), 0 },
>    { "cu-only", 'c', NULL, 0, N_ ("Only list the CU names."), 0 },
> -  { NULL, 0, NULL, 0, NULL, 0 }
> +  #ifdef HAVE_LIBARCHIVE
> +  { "zip", 'z', NULL, 0, N_ ("Zip all the source files and send to stdout."
> +    "Cannot be used with the null option"), 0 },
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +    { "no-backup", 'b', NULL, 0, N_ ("Disables local source file search when "
> +      "debuginfod fails to fetch files. This option is only applicable"
> +      "when fetching and zipping files."), 0 },
> +    #endif
> +  #endif
> +  { NULL, 0, NULL, 0, NULL, 0 }
>  };
>
>  /* Short description of program.  */
> -static const char doc[] = N_("Lists the source files of a DWARF/ELF file. The default input is the file 'a.out'.");
> +static const char doc[] = N_("Lists the source files of a DWARF/ELF file.  The default input is the file 'a.out'.");
>
>  /* Strings for arguments in help texts.  */
>  static const char args_doc[] = N_("INPUT");
> @@ -67,18 +118,29 @@ static const struct argp argp =
>    options, parse_opt, args_doc, doc, argp_children, NULL, NULL
>  };
>
> -/* Verbose message printing. */
> +/* Verbose message printing.  */
>  static bool verbose;
> -/* Delimit the output with nulls. */
> +/* Delimit the output with nulls.  */
>  static bool null_arg;
> -/* Only print compilation unit names. */
> +/* Only print compilation unit names.  */
>  static bool CU_only;
> -
> -/* Handle program arguments. */
> +#ifdef HAVE_LIBARCHIVE
> +  /* Zip all the source files and send to stdout. */
> +  static bool zip;
> +
> +  #ifdef ENABLE_LIBDEBUGINFOD
> +    /* Disables local source file search when debuginfod fails to fetch them.
> +       This option is only applicable when fetching and zipping files.*/
> +    static bool no_backup;
> +  #endif
> +#endif
> +
> +/* Handle program arguments.  Note null arg and zip
> +    cannot be combined due to warnings raised when unzipping.  */
>  static error_t
>  parse_opt (int key, char *arg, struct argp_state *state)
>  {
> -  /* Suppress "unused parameter" warning. */
> +  /* Suppress "unused parameter" warning.  */
>    (void)arg;
>    switch (key)
>      {
> @@ -97,6 +159,18 @@ parse_opt (int key, char *arg, struct argp_state *state)
>      case 'c':
>        CU_only = true;
>        break;
> +
> +    #ifdef HAVE_LIBARCHIVE
> +      case 'z':
> +      zip = true;
> +      break;
> +
> +      #ifdef ENABLE_LIBDEBUGINFOD
> +        case 'b':
> +        no_backup = true;
> +        break;
> +      #endif
> +    #endif
>
>      default:
>        return ARGP_ERR_UNKNOWN;
> @@ -104,11 +178,40 @@ parse_opt (int key, char *arg, struct argp_state *state)
>    return 0;
>  }
>
> +/* Remove the "/./" , "../" and the preceding directory
> +    that some paths include which raise errors during unzip.  */
> +string canonicalize_path(string path)
> +{
> +    stringstream ss(path);
> +    string token;
> +    vector<string> tokens;
> +    /* Extract each directory of the path and place into a vector.  */
> +    while (getline(ss, token, '/')) {
> +      /* Ignore any empty //, or /./ dirs.  */
> +        if (token == "" || token == ".")
> +            continue;
> +      /* When /..  is encountered, remove the most recent directory from the vector.  */
> +        else if (token == "..") {
> +            if (!tokens.empty())
> +                tokens.pop_back();
> +        } else
> +            tokens.push_back(token);
> +    }
> +    stringstream result;
> +    if (tokens.empty())
> +        return "/";
> +    /* Reconstruct the path from the extracted directories.  */
> +    for (const string &t : tokens) {
> +        result << '/' << t;
> +    }
> +    return result.str();
> +}
>
> -/* Global list of collected source files.  Normally, it'll contain
> -   the sources of just one named binary, but the '-K' option can cause
> -   multiple dwfl modules to be loaded, thus listed.   */
> -   set<string> debug_sourcefiles;
> +/* Global list of collected source files and their respective module.
> +   Normally, it'll contain the sources of just one named binary, but
> +   the '-K' option can cause multiple dwfl modules to be loaded, thus
> +   listed.  */
> +set<pair<string, Dwfl_Module*>> debug_sourcefiles;
>
>  static int
>  collect_sourcefiles (Dwfl_Module *dwflmod,
> @@ -118,14 +221,13 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>                       void *arg __attribute__ ((unused)))
>  {
>    Dwarf *dbg;
> -  Dwarf_Addr bias; /* ignored - for addressing purposes only  */
> -
> +  Dwarf_Addr bias; /* ignored - for addressing purposes only.  */
> +
>    dbg = dwfl_module_getdwarf (dwflmod, &bias);
>
>    Dwarf_Off offset = 0;
>    Dwarf_Off old_offset;
>    size_t hsize;
> -
>    /* Traverse all CUs of this module.  */
>    while (dwarf_nextcu (dbg, old_offset = offset, &offset, &hsize, NULL, NULL, NULL) == 0)
>      {
> @@ -141,7 +243,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>        if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
>          continue;
>
> -      /* extract DW_AT_comp_dir to resolve relative file names  */
> +      /* extract DW_AT_comp_dir to resolve relative file names.  */
>        const char *comp_dir = "";
>        const char *const *dirs;
>        size_t ndirs;
> @@ -152,11 +254,19 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>          comp_dir = "";
>
>        if (verbose)
> -        std::clog << "searching for sources for cu=" << cuname
> +        clog << "searching for sources for cu=" << cuname
>                    << " comp_dir=" << comp_dir << " #files=" << nfiles
>                    << " #dirs=" << ndirs << endl;
> -
> -      for (size_t f = 1; f < nfiles; f++)
> +
> +      if (comp_dir[0] == '\0' && cuname[0] != '/')
> +        {
> +          /* This is a common symptom for dwz-compressed debug files,
> +             where the altdebug file cannot be resolved.  */
> +          if (verbose)
> +            clog << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
> +          continue;
> +        }
> +      for (size_t f = 1; f < nfiles; ++f)
>          {
>            const char *hat;
>            if (CU_only)
> @@ -172,7 +282,7 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              continue;
>
>            if (string(hat).find("<built-in>")
> -              != std::string::npos) /* gcc intrinsics, don't bother record  */
> +              != string::npos) /* gcc intrinsics, don't bother recording */
>              continue;
>
>            string waldo;
> @@ -180,38 +290,167 @@ collect_sourcefiles (Dwfl_Module *dwflmod,
>              waldo = (string (hat));
>            else if (comp_dir[0] != '\0') /* comp_dir relative */
>              waldo = (string (comp_dir) + string ("/") + string (hat));
> -          debug_sourcefiles.insert (waldo);
> +          else
> +           {
> +             if (verbose)
> +              clog << "skipping file=" << hat << " due to empty comp_dir" << endl;
> +             continue;
> +           }
> +          waldo = canonicalize_path (waldo);
> +          debug_sourcefiles.insert (make_pair(waldo, dwflmod));
>          }
>      }
> -
>    return DWARF_CB_OK;
>  }
>
> +#ifdef HAVE_LIBARCHIVE
> +void zip_files()
> +{
> +  struct archive *a = archive_write_new();
> +  struct stat st;
> +  char buff[BUFFER_SIZE];
> +  int len;
> +  int fd;
> +  #ifdef ENABLE_LIBDEBUGINFOD
> +  /* Initialize a debuginfod client.  */
> +  static unique_ptr <debuginfod_client, void (*)(debuginfod_client*)>
> +    client (debuginfod_begin(), &debuginfod_end);
> +  #endif
> +
> +  archive_write_set_format_zip(a);
> +  archive_write_open_fd(a, STDOUT_FILENO);
> +
> +  int missing_files = 0;
> +  for (const auto &pair : debug_sourcefiles)
> +  {
> +    fd = -1;
> +    const std::string &file_path = pair.first;
> +
> +    /* Attempt to query debuginfod client to fetch source files.  */
> +    #ifdef ENABLE_LIBDEBUGINFOD
> +    Dwfl_Module* dwflmod = pair.second;
> +    /* Obtain source file's build ID.  */
> +    const unsigned char *bits;
> +    GElf_Addr vaddr;
> +    int bits_length = dwfl_module_build_id(dwflmod, &bits, &vaddr);
> +    /* Ensure successful client and build ID acquisition.  */
> +    if (client.get() != NULL && bits_length > 0)
> +    {
> +      fd = debuginfod_find_source(client.get(),
> +                                    bits, bits_length,
> +                                    file_path.c_str(), NULL);
> +    }
> +    else
> +    {
> +        if (client.get() == NULL)
> +            cerr << "Error: Failed to initialize debuginfod client." << endl;
> +        else
> +            cerr << "Error: Invalid build ID length (" << bits_length << ")." << endl;
> +    }
> +    #endif
> +
> +    if (!no_backup)
> +      /* Files could not be located using debuginfod, search locally */
> +      if (fd < 0)
> +        fd = open(file_path.c_str(), O_RDONLY);
> +    if (fd < 0)
> +    {
> +      if (verbose)
> +        cerr << file_path << endl;
> +      missing_files++;
> +      continue;
> +    }
> +
> +    /* Create an entry for each file including file information to be placed in the zip.  */
> +    if (fstat(fd, &st) == -1)
> +    {
> +      if (verbose)
> +        cerr << file_path << endl;
> +      missing_files++;
> +      if (verbose)
> +        cerr << "Error: Failed to get file status for " << file_path << ": " << strerror(errno) << endl;
> +      continue;
> +    }
> +    struct archive_entry *entry = archive_entry_new();
> +    /* Removing first "/"" to make the path "relative" before zipping, otherwise warnings are raised when unzipping.  */
> +    string entry_name = file_path.substr(file_path.find_first_of('/') + 1);
> +    archive_entry_set_pathname(entry, entry_name.c_str());
> +    archive_entry_copy_stat(entry, &st);
> +    if (archive_write_header(a, entry) != ARCHIVE_OK)
> +    {
> +      if (verbose)
> +        cerr << file_path << endl;
> +      missing_files++;
> +      if (verbose)
> +        cerr << "Error: failed to write header for " << file_path << ": " << archive_error_string(a) << endl;
> +      continue;
> +    }
> +
> +    /* Write the file to the zip.  */
> +    len = read(fd, buff, sizeof(buff));
> +    if (len == -1)
> +    {
> +      if (verbose)
> +        cerr << file_path << endl;
> +      missing_files++;
> +      if (verbose)
> +        cerr << "Error: Failed to open file: " << file_path << ": " << strerror(errno) <<endl;
> +      continue;
> +    }
> +    while (len > 0)
> +    {
> +      if (archive_write_data(a, buff, len) < ARCHIVE_OK)
> +      {
> +        if (verbose)
> +          cerr << "Error: Failed to read from the file: " << file_path << ": " << strerror(errno) << endl;
> +        break;
> +      }
> +      len = read(fd, buff, sizeof(buff));
> +    }
> +    close(fd);
> +    archive_entry_free(entry);
> +  }
> +  if (verbose && missing_files > 0 )
> +    cerr << missing_files << " file(s) listed above could not be found.  " << endl;
> +
> +  archive_write_close(a);
> +  archive_write_free(a);
> +}
> +#endif
>
>  int
>  main (int argc, char *argv[])
>  {
>    int remaining;
> -
> +
>    /* Parse and process arguments.  This includes opening the modules.  */
>    argp_children[0].argp = dwfl_standard_argp ();
>    argp_children[0].group = 1;
> -
> +
>    Dwfl *dwfl = NULL;
>    (void) argp_parse (&argp, argc, argv, 0, &remaining, &dwfl);
>    assert (dwfl != NULL);
> -  /* Process all loaded modules - probably just one, except if -K or -p is used. */
> +  /* Process all loaded modules - probably just one, except if -K or -p is used.  */
>    (void) dwfl_getmodules (dwfl, &collect_sourcefiles, NULL, 0);
>
>    if (!debug_sourcefiles.empty ())
> -    for (const string &element : debug_sourcefiles)
> +  {
> +    #ifdef HAVE_LIBARCHIVE
> +      if (zip)
> +        zip_files();
> +      else
> +    #endif
>        {
> -        std::cout << element;
> -        if (null_arg)
> -          std::cout << '\0';
> -        else
> -          std::cout << '\n';
> +        for (const auto &pair : debug_sourcefiles)
> +          {
> +            cout << pair.first;
> +            if (null_arg)
> +              cout << '\0';
> +            else
> +              cout << '\n';
> +          }
>        }
> +  }
>
>    dwfl_end (dwfl);
>    return 0;
> diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
> index 108dff74..c3b0603d 100755
> --- a/tests/debuginfod-subr.sh
> +++ b/tests/debuginfod-subr.sh
> @@ -79,12 +79,12 @@ errfiles() {
>  # So we gather the LD_LIBRARY_PATH with this cunning trick:
>  ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
>
> -wait_ready()
> +wait_ready4()
>  {
>    port=$1;
>    what=$2;
>    value=$3;
> -  timeout=20;
> +  timeout=$4;
>
>    echo "Wait $timeout seconds on $port for metric $what to change to $value"
>    while [ $timeout -gt 0 ]; do
> @@ -105,6 +105,16 @@ wait_ready()
>    fi
>  }
>
> +wait_ready()
> +{
> +  port=$1;
> +  what=$2;
> +  value=$3;
> +  timeout=20;
> +  wait_ready4 "$port" "$what" "$value" "$timeout"
> +}
> +
> +
>  archive_test() {
>      __BUILDID=$1
>      __SOURCEPATH=$2
> diff --git a/tests/run-srcfiles-self.sh b/tests/run-srcfiles-self.sh
> index 0e64dd2b..afbb1e77 100755
> --- a/tests/run-srcfiles-self.sh
> +++ b/tests/run-srcfiles-self.sh
> @@ -1,4 +1,4 @@
> -#! /bin/sh
> +#! /usr/bin/env bash
>  # Copyright (C) 2023 Red Hat, Inc.
>  # This file is part of elfutils.
>  #
> @@ -15,7 +15,13 @@
>  # You should have received a copy of the GNU General Public License
>  # along with this program.  If not, see <http://www.gnu.org/licenses/>.
>
> -. $srcdir/test-subr.sh
> +. $srcdir/debuginfod-subr.sh
> +
> +# for test case debugging, uncomment:
> +# set -x
> +set -e
> +base=14000
> +get_ports
>
>  # Test different command line combinations on the srcfiles binary itself.
>  ET_EXEC="${abs_top_builddir}/src/srcfiles"
> @@ -26,11 +32,16 @@ SRC_NAME="srcfiles.cxx"
>  # Ensure the output contains the expected source file srcfiles.cxx
>  testrun $ET_EXEC -e $ET_EXEC | grep $SRC_NAME > /dev/null
>
> +# Check if zip option is available (only available if libarchive is available.
> +#   Debuginfod optional to fetch source files from debuginfod federation.)
> +$ET_EXEC --help | grep -q zip && command -v unzip >/dev/null 2>&1 && zip=true || zip=false
> +
>  for null_arg in --null ""; do
>    for verbose_arg in --verbose ""; do
> +    echo "Test with options $null_arg $verbose_arg"
>      testrun $ET_EXEC $null_arg $verbose_arg -p $ET_PID > /dev/null
>
> -    # Ensure that the output contains srclines.cxx
> +    # Ensure that the output contains srcfiles.cxx
>      cu_only=$(testrun $ET_EXEC $null_arg $verbose_arg -c -e $ET_EXEC)
>      default=$(testrun $ET_EXEC $null_arg $verbose_arg -e $ET_EXEC)
>      result1=$(echo "$cu_only" | grep "$SRC_NAME")
> @@ -40,9 +51,64 @@ for null_arg in --null ""; do
>        exit 1
>      fi
>
> -    # Ensure that the output with the cu-only option contains less source files
> +    # Ensure that the output with the cu-only option contains fewer source files
>      if [ $(echo "$cu_only" | wc -m) -gt $(echo "$default" | wc -m) ]; then
>        exit 1
>      fi
> +
> +    if $zip; then
> +      # Zip option tests
> +        testrun $ET_EXEC $verbose_arg -z -e $ET_EXEC > test.zip
> +        tempfiles test.zip
> +
> +        unzip -v test.zip
> +        unzip -t test.zip
> +
> +        # Ensure unzipped srcfiles.cxx and its contents are the same as the original source file
> +        unzip -j test.zip "*/$SRC_NAME"
> +        diff "$SRC_NAME" $abs_srcdir/../src/$SRC_NAME
> +        rm -f test.zip $SRC_NAME
> +    fi
>    done
>  done
> +
> +# Debuginfod source file downloading test.
> +# Start debuginfod server on the elfutils build directory.
> +if [ -x ${abs_builddir}/../debuginfod/debuginfod ] && $zip; then
> +  LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -vvvv -d debuginfod.sqlite3 -F -p $PORT1 ${abs_top_builddir}/src > debuginfod.log 2>&1 &
> +  PID1=$!
> +  tempfiles debuginfod.sqlite3 debuginfod.log
> +  wait_ready $PORT1 'ready' 1
> +  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> +  wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +  wait_ready4 $PORT1 'thread_busy{role="scan"}' 0 300 # lots of source files may be slow to index with $db on NFS
> +
> +  export DEBUGINFOD_URLS="http://localhost:${PORT1}/"
> +  export DEBUGINFOD_VERBOSE=1
> +  testrun $ET_EXEC -z -b -e $ET_EXEC > test.zip
> +  tempfiles test.zip
> +
> +  unzip -v test.zip
> +  unzip -t test.zip
> +
> +  # Extract the zip.
> +  mkdir extracted
> +  unzip test.zip -d extracted
> +
> +  # Ensure that source files for this tool have been archived.
> +  source_files="srcfiles.cxx libdwfl.h gelf.h"
> +  extracted_files=$(find extracted -type f)
> +  for file in $source_files; do
> +      echo "$extracted_files" | grep -q "$file" > /dev/null
> +  done
> +
> +  # Compare between the extracted file and the actual source file srcfiles.cxx.
> +  extracted_file=$(find extracted -name $SRC_NAME)
> +  diff "$extracted_file" $abs_srcdir/../src/$SRC_NAME
> +
> +  rm -rf extracted
> +
> +  kill $PID1
> +  wait
> +  PID1=0
> +fi
> --
> 2.43.0
>
  
Mark Wielaard Feb. 6, 2024, 12:03 p.m. UTC | #2
Hi,

On Mon, Feb 05, 2024 at 07:24:33PM -0500, Aaron Merey wrote:
> > diff --git a/NEWS b/NEWS
> > index 0420d3b8..3391d6a1 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -1,3 +1,8 @@
> > +Version 0.191 (after 0.189)
> > +
> > +srcfiles: Can now fetch the source files of a DWARF/ELF file and
> > +          place them into a zip.
> > +
> >  Version 0.190 "Woke!"
> >
> >  CONTRIBUTING: Switch from real name policy to known identity policy.
> > @@ -9,6 +14,9 @@ libelf: Add RELR support.
> >
> >  libdw: Recognize .debug_[ct]u_index sections
> >
> > +srcfiles: added srcfiles tool that lists all the source files of a given
> > +          DWARF/ELF file.
> > +
> >  readelf: Support readelf -Ds, --use-dynamic --symbol.
> >           Support .gdb_index version 9
> >

This looks like a merge issue after the 0.190 release. I updated the
version at the top and removed the duplicate entry under 0.190.

> > diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> > index 524be948..6b21f46f 100644
> > --- a/debuginfod/debuginfod.cxx
> > +++ b/debuginfod/debuginfod.cxx
> > @@ -2996,8 +2996,6 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
> >
> >        if (comp_dir[0] == '\0' && cuname[0] != '/')
> >          {
> > -          // This is a common symptom for dwz-compressed debug files,
> > -          // where the altdebug file cannot be resolved.
> >            if (verbose > 3)
> >              obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
> >            continue;

Was it intended to remove this comment?

Cheers,

Mark
From 71794afb54fd9668a02a361134f15180ce85f3ad Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Tue, 6 Feb 2024 12:53:54 +0100
Subject: [PATCH] NEWS: Update version number and remove duplicate 0.190 entry

---
 NEWS | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index e2393e77..98dc78f5 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,4 @@
-Version 0.191 (after 0.189)
+Version 0.191 (after 0.190)
 
 srcfiles: Can now fetch the source files of a DWARF/ELF file and
           place them into a zip.
@@ -14,9 +14,6 @@ libelf: Add RELR support.
 
 libdw: Recognize .debug_[ct]u_index sections
 
-srcfiles: added srcfiles tool that lists all the source files of a given
-          DWARF/ELF file.
-
 readelf: Support readelf -Ds, --use-dynamic --symbol.
          Support .gdb_index version 9
  
Frank Ch. Eigler Feb. 6, 2024, 2:47 p.m. UTC | #3
Hi -

> > > -          // This is a common symptom for dwz-compressed debug files,
> > > -          // where the altdebug file cannot be resolved.
> 
> Was it intended to remove this comment?

Yes, it was copy/pasted from debuginfod by mistake.

- FChE
  

Patch

diff --git a/NEWS b/NEWS
index 0420d3b8..3391d6a1 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,8 @@ 
+Version 0.191 (after 0.189)
+
+srcfiles: Can now fetch the source files of a DWARF/ELF file and
+          place them into a zip.
+
 Version 0.190 "Woke!"
 
 CONTRIBUTING: Switch from real name policy to known identity policy.
@@ -9,6 +14,9 @@  libelf: Add RELR support.
 
 libdw: Recognize .debug_[ct]u_index sections
 
+srcfiles: added srcfiles tool that lists all the source files of a given 
+          DWARF/ELF file.
+
 readelf: Support readelf -Ds, --use-dynamic --symbol.
          Support .gdb_index version 9
 
diff --git a/configure.ac b/configure.ac
index af5b6bf7..ddb79b83 100644
--- a/configure.ac
+++ b/configure.ac
@@ -841,7 +841,7 @@  AM_CONDITIONAL([LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xyes" || test "
 AM_CONDITIONAL([DUMMY_LIBDEBUGINFOD],[test "x$enable_libdebuginfod" = "xdummy"])
 AC_CHECK_HEADERS([execinfo.h])
 
-# Look for libmicrohttpd, libarchive, sqlite for debuginfo server
+# Look for libmicrohttpd, libarchive, sqlite for debuginfo server and srcfiles tool
 # minimum versions as per rhel7.
 AC_ARG_ENABLE([debuginfod],AS_HELP_STRING([--enable-debuginfod], [Build debuginfod server]))
 AS_IF([test "x$enable_debuginfod" != "xno"], [
@@ -853,11 +853,12 @@  AS_IF([test "x$enable_debuginfod" != "xno"], [
       AC_MSG_ERROR([need libdebuginfod (or dummy), use --disable-debuginfod to disable.])
     fi
     enable_debuginfod=yes # presume success
+    AC_DEFINE([HAVE_LIBARCHIVE], [1], [Define to 1 if libarchive is available]) # presume success
     PKG_PROG_PKG_CONFIG
     PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
     PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
     PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
-    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
+    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
     if test "x$enable_debuginfod" = "xno"; then
       AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
     fi
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 524be948..6b21f46f 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2996,8 +2996,6 @@  dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
 
       if (comp_dir[0] == '\0' && cuname[0] != '/')
         {
-          // This is a common symptom for dwz-compressed debug files,
-          // where the altdebug file cannot be resolved.
           if (verbose > 3)
             obatched(clog) << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
           continue;
diff --git a/doc/srcfiles.1 b/doc/srcfiles.1
index 6149c21b..c6338315 100644
--- a/doc/srcfiles.1
+++ b/doc/srcfiles.1
@@ -21,15 +21,18 @@ 
 eu-srcfiles \- Lists the source files of a DWARF/ELF file.
 
 .SH "SYNOPSIS"
-eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] INPUT
+eu-srcfiles [\fB\-0\fR|\fB\-\-null\fR] [\fB\-c\fR|\fB\-\-cu\-only\fR] [\fB\-v\fR|\fB\-\-verbose\fR] [\fB\-z\fR|\fB\-\-zip\fR] INPUT
 
 .SH "DESCRIPTION"
-\fBeu-srcfiles\fR lists the source files of a given \s-DWARF/ELF\s0
+\fBeu-srcfiles\fR lists all the source files of a given DWARF/ELF
 file.  This list is based on a search of the DWARF debuginfo, which
 may be automatically fetched by debuginfod if applicable.  The target
 file may be an executable, a coredump, a process, or even the running
-kernel.  The default is the file 'a.out'.  The source file names are
-made unique and printed to standard output.
+kernel.  The default input is the file 'a.out'.  The source file names are
+made unique by prepending the full path name and then printed to standard output. The source files can be
+placed in a zip file that is sent to stdout.
+
+Note that all occurrences of '/./' and '/../' in the path name are canonicalized.
 
 .SH "INPUT OPTIONS"
 The long and short forms of options, shown here as alternatives, are
@@ -80,14 +83,29 @@  Print program version.
 \fB\-0, \-\-null\fR
 Separate items by a null instead of a newline.
 
+.TP
+\fB\-b, \-\-no-backup\fR
+Disables local source file search when 
+debuginfod fails to fetch files.  This 
+option is only applicable when fetching and 
+zipping files.
+
 .TP
 \fB\-c, \-\-cu\-only\fR
-Only list the CU names.
+Only list the CU (compilation unit) names.
 
 .TP
 \fB\-v, \-\-verbose\fR
 Increase verbosity of logging messages.
 
+.TP
+\fB\-z, \-\-zip\fR
+Zip all the source files and send to stdout. 
+By default, files will be automatically fetched by 
+debuginfod (if applicable) or locally as a 
+backup. Any source files that were not found 
+will not be archived.
+
 
 .SH EXAMPLES
 
@@ -119,6 +137,11 @@  List the source files of a kernel image.
 eu-srcfiles -e /boot/vmlinuz-`uname -r`
 .ESAMPLE
 
+Zip all the source files for a binary.
+.SAMPLE
+eu-srcfiles -z -e /bin/ls > ls.zip
+.ESAMPLE
+
 
 .SH "AUTHOR"
 Written by Housam Alamour.
diff --git a/src/Makefile.am b/src/Makefile.am
index d3d9d408..5fa8df3d 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -21,7 +21,7 @@  DEFS += $(YYDEBUG) -DDEBUGPRED=@DEBUGPRED@ \
 	-DSRCDIR=\"$(shell cd $(srcdir);pwd)\" -DOBJDIR=\"$(shell pwd)\"
 AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
 	    -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
-	    -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm
+	    -I$(srcdir)/../libdwfl -I$(srcdir)/../libasm -I../debuginfod
 
 AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw $(STACK_USAGE_NO_ERROR)
 
@@ -50,6 +50,11 @@  libelf = ../libelf/libelf.so
 endif
 libebl = ../libebl/libebl.a ../backends/libebl_backends.a ../libcpu/libcpu.a
 libeu = ../lib/libeu.a
+if LIBDEBUGINFOD
+libdebuginfod = ../debuginfod/libdebuginfod.so
+else
+libdebuginfod = 
+endif
 
 if DEMANGLE
 demanglelib = -lstdc++
@@ -85,7 +90,7 @@  stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) $(demanglelib)
 elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 srcfiles_SOURCES = srcfiles.cxx
-srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD)
+srcfiles_LDADD = $(libdw) $(libelf) $(libeu)  $(argp_LDADD) $(libarchive_LIBS) $(libdebuginfod)
 
 installcheck-binPROGRAMS: $(bin_PROGRAMS)
 	bad=0; pid=$$$$; list="$(bin_PROGRAMS)"; for p in $$list; do \
diff --git a/src/srcfiles.cxx b/src/srcfiles.cxx
index 3c7afdc4..df602719 100644
--- a/src/srcfiles.cxx
+++ b/src/srcfiles.cxx
@@ -15,6 +15,21 @@ 
 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+  
+
+/* In case we have a bad fts we include this before config.h because it
+   can't handle _FILE_OFFSET_BITS.
+   Everything we need here is fine if its declarations just come first.
+   Also, include sys/types.h before fts.  On some systems fts.h is not self
+   contained.  */
+#ifdef BAD_FTS
+#include <sys/types.h>
+#include <fts.h>
+#endif
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
 
 #include "printversion.h"
 #include <dwarf.h>
@@ -23,12 +38,37 @@ 
 #include <set>
 #include <string>
 #include <cassert>
-#include <config.h>
+#include <gelf.h>
+#include <memory>
+
+#ifdef ENABLE_LIBDEBUGINFOD
+#include "debuginfod.h"
+#endif
 
 #include <libdwfl.h>
 #include <fcntl.h>
 #include <iostream>
 #include <libdw.h>
+#include <sstream>
+#include <vector>
+
+/* Libraries for use by the --zip option */
+#ifdef HAVE_LIBARCHIVE
+#include <archive.h>
+#include <archive_entry.h>
+#endif
+
+/* If fts.h is included before config.h, its indirect inclusions may not
+   give us the right LFS aliases of these functions, so map them manually.  */
+#ifdef BAD_FTS
+#ifdef _FILE_OFFSET_BITS
+#define open open64
+#define fopen fopen64
+#endif
+#else
+  #include <sys/types.h>
+  #include <fts.h>
+#endif
 
 using namespace std;
 
@@ -38,8 +78,10 @@  ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
 /* Bug report address.  */
 ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
 
+constexpr size_t BUFFER_SIZE = 8192;
+
 /* Definitions of arguments for argp functions.  */
-static const struct argp_option options[] =
+static const struct argp_option options[] = 
 {
   { NULL, 0, NULL, OPTION_DOC, N_("Output options:"), 1 },
   { "null", '0', NULL, 0,
@@ -47,11 +89,20 @@  static const struct argp_option options[] =
   { "verbose", 'v', NULL, 0,
     N_ ("Increase verbosity of logging messages."), 0 },
   { "cu-only", 'c', NULL, 0, N_ ("Only list the CU names."), 0 },
-  { NULL, 0, NULL, 0, NULL, 0 }
+  #ifdef HAVE_LIBARCHIVE
+  { "zip", 'z', NULL, 0, N_ ("Zip all the source files and send to stdout."
+    "Cannot be used with the null option"), 0 },
+    #ifdef ENABLE_LIBDEBUGINFOD
+    { "no-backup", 'b', NULL, 0, N_ ("Disables local source file search when " 
+      "debuginfod fails to fetch files. This option is only applicable"
+      "when fetching and zipping files."), 0 },
+    #endif
+  #endif
+  { NULL, 0, NULL, 0, NULL, 0 } 
 };
 
 /* Short description of program.  */
-static const char doc[] = N_("Lists the source files of a DWARF/ELF file. The default input is the file 'a.out'.");
+static const char doc[] = N_("Lists the source files of a DWARF/ELF file.  The default input is the file 'a.out'.");
 
 /* Strings for arguments in help texts.  */
 static const char args_doc[] = N_("INPUT");
@@ -67,18 +118,29 @@  static const struct argp argp =
   options, parse_opt, args_doc, doc, argp_children, NULL, NULL
 };
 
-/* Verbose message printing. */
+/* Verbose message printing.  */
 static bool verbose;
-/* Delimit the output with nulls. */
+/* Delimit the output with nulls.  */
 static bool null_arg;
-/* Only print compilation unit names. */
+/* Only print compilation unit names.  */
 static bool CU_only;
-
-/* Handle program arguments. */
+#ifdef HAVE_LIBARCHIVE
+  /* Zip all the source files and send to stdout. */
+  static bool zip;
+
+  #ifdef ENABLE_LIBDEBUGINFOD
+    /* Disables local source file search when debuginfod fails to fetch them. 
+       This option is only applicable when fetching and zipping files.*/
+    static bool no_backup;
+  #endif
+#endif
+
+/* Handle program arguments.  Note null arg and zip 
+    cannot be combined due to warnings raised when unzipping.  */
 static error_t
 parse_opt (int key, char *arg, struct argp_state *state)
 {
-  /* Suppress "unused parameter" warning. */
+  /* Suppress "unused parameter" warning.  */
   (void)arg;
   switch (key)
     {
@@ -97,6 +159,18 @@  parse_opt (int key, char *arg, struct argp_state *state)
     case 'c':
       CU_only = true;
       break;
+    
+    #ifdef HAVE_LIBARCHIVE
+      case 'z':
+      zip = true;
+      break;
+
+      #ifdef ENABLE_LIBDEBUGINFOD
+        case 'b':
+        no_backup = true;
+        break;
+      #endif
+    #endif
 
     default:
       return ARGP_ERR_UNKNOWN;
@@ -104,11 +178,40 @@  parse_opt (int key, char *arg, struct argp_state *state)
   return 0;
 }
 
+/* Remove the "/./" , "../" and the preceding directory 
+    that some paths include which raise errors during unzip.  */
+string canonicalize_path(string path) 
+{
+    stringstream ss(path);
+    string token;
+    vector<string> tokens;
+    /* Extract each directory of the path and place into a vector.  */
+    while (getline(ss, token, '/')) {
+      /* Ignore any empty //, or /./ dirs.  */
+        if (token == "" || token == ".")
+            continue;
+      /* When /..  is encountered, remove the most recent directory from the vector.  */
+        else if (token == "..") {
+            if (!tokens.empty())
+                tokens.pop_back();
+        } else
+            tokens.push_back(token);
+    }
+    stringstream result;
+    if (tokens.empty())
+        return "/";
+    /* Reconstruct the path from the extracted directories.  */
+    for (const string &t : tokens) {
+        result << '/' << t;
+    }
+    return result.str();
+}
 
-/* Global list of collected source files.  Normally, it'll contain
-   the sources of just one named binary, but the '-K' option can cause
-   multiple dwfl modules to be loaded, thus listed.   */
-   set<string> debug_sourcefiles;
+/* Global list of collected source files and their respective module.
+   Normally, it'll contain the sources of just one named binary, but 
+   the '-K' option can cause multiple dwfl modules to be loaded, thus 
+   listed.  */
+set<pair<string, Dwfl_Module*>> debug_sourcefiles;
 
 static int
 collect_sourcefiles (Dwfl_Module *dwflmod,
@@ -118,14 +221,13 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
                      void *arg __attribute__ ((unused)))
 {
   Dwarf *dbg;
-  Dwarf_Addr bias; /* ignored - for addressing purposes only  */
-
+  Dwarf_Addr bias; /* ignored - for addressing purposes only.  */
+  
   dbg = dwfl_module_getdwarf (dwflmod, &bias);
 
   Dwarf_Off offset = 0;
   Dwarf_Off old_offset;
   size_t hsize;
-
   /* Traverse all CUs of this module.  */
   while (dwarf_nextcu (dbg, old_offset = offset, &offset, &hsize, NULL, NULL, NULL) == 0)
     {
@@ -141,7 +243,7 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
       if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
         continue;
 
-      /* extract DW_AT_comp_dir to resolve relative file names  */
+      /* extract DW_AT_comp_dir to resolve relative file names.  */
       const char *comp_dir = "";
       const char *const *dirs;
       size_t ndirs;
@@ -152,11 +254,19 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
         comp_dir = "";
 
       if (verbose)
-        std::clog << "searching for sources for cu=" << cuname
+        clog << "searching for sources for cu=" << cuname
                   << " comp_dir=" << comp_dir << " #files=" << nfiles
                   << " #dirs=" << ndirs << endl;
-
-      for (size_t f = 1; f < nfiles; f++)
+      
+      if (comp_dir[0] == '\0' && cuname[0] != '/')
+        {
+          /* This is a common symptom for dwz-compressed debug files,
+             where the altdebug file cannot be resolved.  */
+          if (verbose)
+            clog << "skipping cu=" << cuname << " due to empty comp_dir" << endl;
+          continue;
+        }
+      for (size_t f = 1; f < nfiles; ++f)
         {
           const char *hat;
           if (CU_only)
@@ -172,7 +282,7 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
             continue;
 
           if (string(hat).find("<built-in>")
-              != std::string::npos) /* gcc intrinsics, don't bother record  */
+              != string::npos) /* gcc intrinsics, don't bother recording */
             continue;
 
           string waldo;
@@ -180,38 +290,167 @@  collect_sourcefiles (Dwfl_Module *dwflmod,
             waldo = (string (hat));
           else if (comp_dir[0] != '\0') /* comp_dir relative */
             waldo = (string (comp_dir) + string ("/") + string (hat));
-          debug_sourcefiles.insert (waldo);
+          else
+           {
+             if (verbose)
+              clog << "skipping file=" << hat << " due to empty comp_dir" << endl;
+             continue;
+           }
+          waldo = canonicalize_path (waldo);
+          debug_sourcefiles.insert (make_pair(waldo, dwflmod));
         }
     }
-
   return DWARF_CB_OK;
 }
 
+#ifdef HAVE_LIBARCHIVE
+void zip_files()
+{
+  struct archive *a = archive_write_new();
+  struct stat st;
+  char buff[BUFFER_SIZE];
+  int len;
+  int fd;
+  #ifdef ENABLE_LIBDEBUGINFOD
+  /* Initialize a debuginfod client.  */
+  static unique_ptr <debuginfod_client, void (*)(debuginfod_client*)>
+    client (debuginfod_begin(), &debuginfod_end);
+  #endif
+
+  archive_write_set_format_zip(a);
+  archive_write_open_fd(a, STDOUT_FILENO);
+
+  int missing_files = 0;
+  for (const auto &pair : debug_sourcefiles)
+  {
+    fd = -1;
+    const std::string &file_path = pair.first;
+
+    /* Attempt to query debuginfod client to fetch source files.  */
+    #ifdef ENABLE_LIBDEBUGINFOD
+    Dwfl_Module* dwflmod = pair.second;
+    /* Obtain source file's build ID.  */
+    const unsigned char *bits;
+    GElf_Addr vaddr;
+    int bits_length = dwfl_module_build_id(dwflmod, &bits, &vaddr);
+    /* Ensure successful client and build ID acquisition.  */
+    if (client.get() != NULL && bits_length > 0)
+    {
+      fd = debuginfod_find_source(client.get(),
+                                    bits, bits_length, 
+                                    file_path.c_str(), NULL);
+    }
+    else
+    {
+        if (client.get() == NULL)
+            cerr << "Error: Failed to initialize debuginfod client." << endl;
+        else
+            cerr << "Error: Invalid build ID length (" << bits_length << ")." << endl;
+    }
+    #endif
+
+    if (!no_backup)
+      /* Files could not be located using debuginfod, search locally */
+      if (fd < 0)
+        fd = open(file_path.c_str(), O_RDONLY);
+    if (fd < 0)
+    {
+      if (verbose)
+        cerr << file_path << endl;
+      missing_files++;
+      continue;
+    }
+
+    /* Create an entry for each file including file information to be placed in the zip.  */
+    if (fstat(fd, &st) == -1)
+    {
+      if (verbose)
+        cerr << file_path << endl;
+      missing_files++;
+      if (verbose)
+        cerr << "Error: Failed to get file status for " << file_path << ": " << strerror(errno) << endl;
+      continue;
+    }
+    struct archive_entry *entry = archive_entry_new();
+    /* Removing first "/"" to make the path "relative" before zipping, otherwise warnings are raised when unzipping.  */
+    string entry_name = file_path.substr(file_path.find_first_of('/') + 1);
+    archive_entry_set_pathname(entry, entry_name.c_str());
+    archive_entry_copy_stat(entry, &st);
+    if (archive_write_header(a, entry) != ARCHIVE_OK)
+    {
+      if (verbose)
+        cerr << file_path << endl;
+      missing_files++;
+      if (verbose)
+        cerr << "Error: failed to write header for " << file_path << ": " << archive_error_string(a) << endl;
+      continue;
+    }
+
+    /* Write the file to the zip.  */
+    len = read(fd, buff, sizeof(buff));
+    if (len == -1)
+    {
+      if (verbose)
+        cerr << file_path << endl;
+      missing_files++;
+      if (verbose)
+        cerr << "Error: Failed to open file: " << file_path << ": " << strerror(errno) <<endl;
+      continue;
+    }
+    while (len > 0)
+    {
+      if (archive_write_data(a, buff, len) < ARCHIVE_OK)
+      {
+        if (verbose)
+          cerr << "Error: Failed to read from the file: " << file_path << ": " << strerror(errno) << endl;
+        break; 
+      }
+      len = read(fd, buff, sizeof(buff));
+    }
+    close(fd);
+    archive_entry_free(entry);
+  }
+  if (verbose && missing_files > 0 )
+    cerr << missing_files << " file(s) listed above could not be found.  " << endl;
+
+  archive_write_close(a);
+  archive_write_free(a);
+}
+#endif
 
 int
 main (int argc, char *argv[])
 {
   int remaining;
-
+  
   /* Parse and process arguments.  This includes opening the modules.  */
   argp_children[0].argp = dwfl_standard_argp ();
   argp_children[0].group = 1;
-
+  
   Dwfl *dwfl = NULL;
   (void) argp_parse (&argp, argc, argv, 0, &remaining, &dwfl);
   assert (dwfl != NULL);
-  /* Process all loaded modules - probably just one, except if -K or -p is used. */
+  /* Process all loaded modules - probably just one, except if -K or -p is used.  */
   (void) dwfl_getmodules (dwfl, &collect_sourcefiles, NULL, 0);
 
   if (!debug_sourcefiles.empty ())
-    for (const string &element : debug_sourcefiles)
+  {
+    #ifdef HAVE_LIBARCHIVE
+      if (zip)
+        zip_files();
+      else
+    #endif
       {
-        std::cout << element;
-        if (null_arg)
-          std::cout << '\0';
-        else
-          std::cout << '\n';
+        for (const auto &pair : debug_sourcefiles)
+          {
+            cout << pair.first;
+            if (null_arg)
+              cout << '\0';
+            else
+              cout << '\n';
+          }
       }
+  }
 
   dwfl_end (dwfl);
   return 0;
diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
index 108dff74..c3b0603d 100755
--- a/tests/debuginfod-subr.sh
+++ b/tests/debuginfod-subr.sh
@@ -79,12 +79,12 @@  errfiles() {
 # So we gather the LD_LIBRARY_PATH with this cunning trick:
 ldpath=`testrun sh -c 'echo $LD_LIBRARY_PATH'`
 
-wait_ready()
+wait_ready4()
 {
   port=$1;
   what=$2;
   value=$3;
-  timeout=20;
+  timeout=$4;
 
   echo "Wait $timeout seconds on $port for metric $what to change to $value"
   while [ $timeout -gt 0 ]; do
@@ -105,6 +105,16 @@  wait_ready()
   fi
 }
 
+wait_ready()
+{
+  port=$1;
+  what=$2;
+  value=$3;
+  timeout=20;
+  wait_ready4 "$port" "$what" "$value" "$timeout"
+}
+
+
 archive_test() {
     __BUILDID=$1
     __SOURCEPATH=$2
diff --git a/tests/run-srcfiles-self.sh b/tests/run-srcfiles-self.sh
index 0e64dd2b..afbb1e77 100755
--- a/tests/run-srcfiles-self.sh
+++ b/tests/run-srcfiles-self.sh
@@ -1,4 +1,4 @@ 
-#! /bin/sh
+#! /usr/bin/env bash
 # Copyright (C) 2023 Red Hat, Inc.
 # This file is part of elfutils.
 #
@@ -15,7 +15,13 @@ 
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
-. $srcdir/test-subr.sh
+. $srcdir/debuginfod-subr.sh
+
+# for test case debugging, uncomment:
+# set -x
+set -e
+base=14000
+get_ports
 
 # Test different command line combinations on the srcfiles binary itself.
 ET_EXEC="${abs_top_builddir}/src/srcfiles"
@@ -26,11 +32,16 @@  SRC_NAME="srcfiles.cxx"
 # Ensure the output contains the expected source file srcfiles.cxx
 testrun $ET_EXEC -e $ET_EXEC | grep $SRC_NAME > /dev/null
 
+# Check if zip option is available (only available if libarchive is available. 
+#   Debuginfod optional to fetch source files from debuginfod federation.)
+$ET_EXEC --help | grep -q zip && command -v unzip >/dev/null 2>&1 && zip=true || zip=false
+
 for null_arg in --null ""; do
   for verbose_arg in --verbose ""; do
+    echo "Test with options $null_arg $verbose_arg"
     testrun $ET_EXEC $null_arg $verbose_arg -p $ET_PID > /dev/null
 
-    # Ensure that the output contains srclines.cxx
+    # Ensure that the output contains srcfiles.cxx
     cu_only=$(testrun $ET_EXEC $null_arg $verbose_arg -c -e $ET_EXEC)
     default=$(testrun $ET_EXEC $null_arg $verbose_arg -e $ET_EXEC)
     result1=$(echo "$cu_only" | grep "$SRC_NAME")
@@ -40,9 +51,64 @@  for null_arg in --null ""; do
       exit 1
     fi
 
-    # Ensure that the output with the cu-only option contains less source files
+    # Ensure that the output with the cu-only option contains fewer source files
     if [ $(echo "$cu_only" | wc -m) -gt $(echo "$default" | wc -m) ]; then
       exit 1
     fi
+
+    if $zip; then
+      # Zip option tests
+        testrun $ET_EXEC $verbose_arg -z -e $ET_EXEC > test.zip
+        tempfiles test.zip
+        
+        unzip -v test.zip
+        unzip -t test.zip
+        
+        # Ensure unzipped srcfiles.cxx and its contents are the same as the original source file
+        unzip -j test.zip "*/$SRC_NAME"
+        diff "$SRC_NAME" $abs_srcdir/../src/$SRC_NAME
+        rm -f test.zip $SRC_NAME
+    fi
   done
 done
+
+# Debuginfod source file downloading test.
+# Start debuginfod server on the elfutils build directory.
+if [ -x ${abs_builddir}/../debuginfod/debuginfod ] && $zip; then
+  LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod -vvvv -d debuginfod.sqlite3 -F -p $PORT1 ${abs_top_builddir}/src > debuginfod.log 2>&1 &
+  PID1=$!
+  tempfiles debuginfod.sqlite3 debuginfod.log
+  wait_ready $PORT1 'ready' 1
+  wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
+  wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
+  wait_ready4 $PORT1 'thread_busy{role="scan"}' 0 300 # lots of source files may be slow to index with $db on NFS 
+  
+  export DEBUGINFOD_URLS="http://localhost:${PORT1}/"
+  export DEBUGINFOD_VERBOSE=1
+  testrun $ET_EXEC -z -b -e $ET_EXEC > test.zip
+  tempfiles test.zip
+  
+  unzip -v test.zip
+  unzip -t test.zip
+  
+  # Extract the zip.
+  mkdir extracted
+  unzip test.zip -d extracted
+  
+  # Ensure that source files for this tool have been archived.
+  source_files="srcfiles.cxx libdwfl.h gelf.h"
+  extracted_files=$(find extracted -type f)
+  for file in $source_files; do
+      echo "$extracted_files" | grep -q "$file" > /dev/null
+  done
+
+  # Compare between the extracted file and the actual source file srcfiles.cxx.
+  extracted_file=$(find extracted -name $SRC_NAME)
+  diff "$extracted_file" $abs_srcdir/../src/$SRC_NAME
+
+  rm -rf extracted
+  
+  kill $PID1
+  wait
+  PID1=0
+fi