Support gzip compressed exec and core files in gdb

Message ID	54FF77D6.7010400@eagerm.com
State	New, archived
Headers	Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk Sender: gdb-patches-owner@sourceware.org sender: eager@eagerm.com) by homiemail-a94.g.dreamhost.com (Postfix) with ESMTPSA id 30CF638A058; Tue, 10 Mar 2015 16:01:43 -0700 (PDT) Message-ID: <54FF77D6.7010400@eagerm.com> Date: Tue, 10 Mar 2015 16:01:42 -0700 From: Michael Eager <eager@eagerm.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: "gdb-patches@sourceware.org" <gdb-patches@sourceware.org>, binutils <binutils@sourceware.org> Subject: [PATCH] Support gzip compressed exec and core files in gdb Content-Type: multipart/mixed; boundary="------------000408050600040509030202"

Message ID

54FF77D6.7010400@eagerm.com

State

New, archived

Headers

Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
Message-ID: <54FF77D6.7010400@eagerm.com>
Date: Tue, 10 Mar 2015 16:01:42 -0700
From: Michael Eager <eager@eagerm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: "gdb-patches@sourceware.org" <gdb-patches@sourceware.org>,
	binutils <binutils@sourceware.org>
Subject: [PATCH] Support gzip compressed exec and core files in gdb
Content-Type: multipart/mixed;
	boundary="------------000408050600040509030202"

Commit Message

Michael Eager March 10, 2015, 11:01 p.m. UTC

  Add support to automatically unzip compressed executable and core files.
Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
and are deleted when GDB exits.  This should be transparent to users,
except for disk space requirements.  The name of the uncompressed file is
mentioned, but all references to the file in GDB messages is to the file
which the user specified.

This operation cannot be done completely by BFD because BFD allows an opened
file to be passed to it for processing.  GDB uses this functionality.

BFD:
   * bfd-in2.h: Regenerate.
   * bfd.c (struct bfd): Add uncompressed_filename.
   * bfdio.c (bfd_get_mtime): Set bfd->mtime_set to true.
   * cache.c (bfd_open): Open previously created uncompressed file.

GDB:
   * common/filestuff.c (struct compressed_file_cache_search, eq_compressed_file,
   is_gzip, decompress_gzip, do_compressed_cleanup, gdb_uncompress): New.
   * common/filestuff.h (gdb_uncompress): Declare.
   * corelow.c (core_open): Uncompress core file.
   * exec.c (exec_file_attach): Uncompress exe file.
   * symfile.c (symfile_bfd_open): Uncompress sym (exe) file.

GDB/DOC:
   * gdb.texinfo: Mention gzipped exec and core files.

Comments

Mike Frysinger March 11, 2015, 2:37 a.m. UTC | #1

On 10 Mar 2015 16:01, Michael Eager wrote:
> --- a/gdb/common/filestuff.c
> +++ b/gdb/common/filestuff.c
>
> +#define COMPRESS_BUF_SIZE (1024*1024)
> +static int
> +decompress_gzip (const char *filename, FILE *tmp)
> +{
> +#ifdef HAVE_ZLIB_H
> +  char *buf = malloc (COMPRESS_BUF_SIZE);

xmalloc ?

> +  if (buf == NULL || compressed == NULL)
> +    {
> +      printf (_("error copying gzip file\n"));

fprintf_filtered (gdb_stderr, ...) ?

> +	  printf (_("error decompressing gzip file\n"));

here too ?

> +          free (buf);

indentation is broken.  this comes up a lot, so you should scan the whole thing.

> +  fflush (tmp);

that needed ?

> +int
> +gdb_uncompress (const char *filename, char **uncompressed_filename)
> +{
> +  FILE *handle;
> +  struct compressed_file_cache_search search, *found;
> +  struct stat st;
> +  hashval_t hash;
> +  void **slot;
> +  static unsigned char buffer[1024];
> +  size_t count;
> +  enum {NONE, GZIP, BZIP2} file_compression = NONE;
> +  int decomp_fd;
> +  FILE *decomp_file;
> +  int ret = 0;
> +  char *tmpdir, *p;
> +  char *template = xmalloc(128 + 12 + 7 + 1);

probably should be a comment as to the constants you've written here

> +  if (compressed_file_cache == NULL)

style says there should be a blank line above this if statement

> +    compressed_file_cache = htab_create_alloc (1, htab_hash_string,
> +						eq_compressed_file,
> +						NULL, xcalloc, xfree);
> +
> +  if ((stat (filename, &st) < 0))

excess set of paren

> +	  /* Return file if compressed file not changed.  */
> +	  *uncompressed_filename = found->uncompressed_filename;
> +	  return 1;
> +	}
> +      else
> +        {
> +	  /* Delete old uncompressed file.  */
> +	  unlink (found->uncompressed_filename);
> +	  xfree ((void *)found->filename);

is that cast really needed ?

> +  if ((handle = fopen (filename, "rb")) == NULL)

gdb generally doesn't like to pack assignments into if statements

use FOPEN_RB instead of "rb" ?

> +  /* Create temporary file name for uncompressed file.  */
> +  if (!(tmpdir = getenv ("TMPDIR")))
> +    tmpdir = "/tmp";
> +  strncpy (template, tmpdir, 128);
> +  strcat (template, "/");
> +  for (p = (char *)filename + strlen (filename) - 1;
> +       p >= filename && *p != '/'; p--)  /* find final slash.  */  ;
> +  strncat (template, ++p, 128);
> +  p = template + strlen (template);
> +  if (strcmp (p - 3, ".gz") == 0)
> +    *(p - 3) = '\0';
> +  strcat (template, "-XXXXXX");

that's pretty messy man.  why not use mkstemp() and fdopen() instead ?

in general, there's no need for all this strcat business.  asprintf allows yout 
easily concat paths & allocate the right amount of space.
-mike

Alan Modra March 11, 2015, 8:14 a.m. UTC | #2

On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
> This operation cannot be done completely by BFD because BFD allows an opened
> file to be passed to it for processing.  GDB uses this functionality.

I'd prefer you do this entirely outside of BFD, without adding another
field to struct bfd.  I think that can be done by simply clearing
abfd->cacheable on files you uncompress.  This prevents BFD from
closing the file, so you won't need to open it again.

Gary Benson March 11, 2015, 10:07 a.m. UTC | #3

Michael Eager wrote:
> Add support to automatically unzip compressed executable and core
> files.  Files will be uncompressed into temporary directory (/tmp or
> $TMPDIR) and are deleted when GDB exits.  This should be transparent
> to users, except for disk space requirements.  The name of the
> uncompressed file is mentioned, but all references to the file in
> GDB messages is to the file which the user specified.
...
> diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
> index 14d6324..b2c31fd 100644
> --- a/gdb/common/filestuff.c
> +++ b/gdb/common/filestuff.c
...
> +#ifndef GDBSERVER

Please do not add GDBSERVER conditionals to gdb/common, I spent half
a year removing them all.

It looks like this code is only used by GDB, not gdbserver, so the
fix is simple, just put the code somewhere GDB-specific.

Thanks,
Gary

Michael Eager March 11, 2015, 2:56 p.m. UTC | #4

On 03/11/15 01:14, Alan Modra wrote:
> On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
>> This operation cannot be done completely by BFD because BFD allows an opened
>> file to be passed to it for processing.  GDB uses this functionality.
>
> I'd prefer you do this entirely outside of BFD, without adding another
> field to struct bfd.  I think that can be done by simply clearing
> abfd->cacheable on files you uncompress.  This prevents BFD from
> closing the file, so you won't need to open it again.

GDB closes the exec file, then uses BFD to seek (I think when reading
syms).  BFD then re-opens the file, so it needs the name of the
uncompressed file.

Michael Eager March 11, 2015, 2:58 p.m. UTC | #5

On 03/11/15 03:07, Gary Benson wrote:
> Michael Eager wrote:
>> Add support to automatically unzip compressed executable and core
>> files.  Files will be uncompressed into temporary directory (/tmp or
>> $TMPDIR) and are deleted when GDB exits.  This should be transparent
>> to users, except for disk space requirements.  The name of the
>> uncompressed file is mentioned, but all references to the file in
>> GDB messages is to the file which the user specified.
> ...
>> diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
>> index 14d6324..b2c31fd 100644
>> --- a/gdb/common/filestuff.c
>> +++ b/gdb/common/filestuff.c
> ...
>> +#ifndef GDBSERVER
>
> Please do not add GDBSERVER conditionals to gdb/common, I spent half
> a year removing them all.
>
> It looks like this code is only used by GDB, not gdbserver, so the
> fix is simple, just put the code somewhere GDB-specific.

Sure.  Any suggestions?  gdb_bfd.c seems like the wrong place.

Michael Eager March 11, 2015, 3 p.m. UTC | #6

On 03/10/15 19:37, Mike Frysinger wrote:
> On 10 Mar 2015 16:01, Michael Eager wrote:
>> --- a/gdb/common/filestuff.c
>> +++ b/gdb/common/filestuff.c
>>
>> +#define COMPRESS_BUF_SIZE (1024*1024)
>> +static int
>> +decompress_gzip (const char *filename, FILE *tmp)
>> +{
>> +#ifdef HAVE_ZLIB_H
>> +  char *buf = malloc (COMPRESS_BUF_SIZE);
>
> xmalloc ?
>
>> +  if (buf == NULL || compressed == NULL)
>> +    {
>> +      printf (_("error copying gzip file\n"));
>
> fprintf_filtered (gdb_stderr, ...) ?
>
>> +	  printf (_("error decompressing gzip file\n"));
>
> here too ?
>
>> +          free (buf);
>
> indentation is broken.  this comes up a lot, so you should scan the whole thing.
>
>> +  fflush (tmp);
>
> that needed ?
>
>> +int
>> +gdb_uncompress (const char *filename, char **uncompressed_filename)
>> +{
>> +  FILE *handle;
>> +  struct compressed_file_cache_search search, *found;
>> +  struct stat st;
>> +  hashval_t hash;
>> +  void **slot;
>> +  static unsigned char buffer[1024];
>> +  size_t count;
>> +  enum {NONE, GZIP, BZIP2} file_compression = NONE;
>> +  int decomp_fd;
>> +  FILE *decomp_file;
>> +  int ret = 0;
>> +  char *tmpdir, *p;
>> +  char *template = xmalloc(128 + 12 + 7 + 1);
>
> probably should be a comment as to the constants you've written here
>
>> +  if (compressed_file_cache == NULL)
>
> style says there should be a blank line above this if statement
>
>> +    compressed_file_cache = htab_create_alloc (1, htab_hash_string,
>> +						eq_compressed_file,
>> +						NULL, xcalloc, xfree);
>> +
>> +  if ((stat (filename, &st) < 0))
>
> excess set of paren
>
>> +	  /* Return file if compressed file not changed.  */
>> +	  *uncompressed_filename = found->uncompressed_filename;
>> +	  return 1;
>> +	}
>> +      else
>> +        {
>> +	  /* Delete old uncompressed file.  */
>> +	  unlink (found->uncompressed_filename);
>> +	  xfree ((void *)found->filename);
>
> is that cast really needed ?
>
>> +  if ((handle = fopen (filename, "rb")) == NULL)
>
> gdb generally doesn't like to pack assignments into if statements
>
> use FOPEN_RB instead of "rb" ?
>
>> +  /* Create temporary file name for uncompressed file.  */
>> +  if (!(tmpdir = getenv ("TMPDIR")))
>> +    tmpdir = "/tmp";
>> +  strncpy (template, tmpdir, 128);
>> +  strcat (template, "/");
>> +  for (p = (char *)filename + strlen (filename) - 1;
>> +       p >= filename && *p != '/'; p--)  /* find final slash.  */  ;
>> +  strncat (template, ++p, 128);
>> +  p = template + strlen (template);
>> +  if (strcmp (p - 3, ".gz") == 0)
>> +    *(p - 3) = '\0';
>> +  strcat (template, "-XXXXXX");
>
> that's pretty messy man.  why not use mkstemp() and fdopen() instead ?
>
> in general, there's no need for all this strcat business.  asprintf allows yout
> easily concat paths & allocate the right amount of space.
> -mike

Thanks for the review.  I'll clean up and resubmit.

Eli Zaretskii March 11, 2015, 4:15 p.m. UTC | #7

> Date: Tue, 10 Mar 2015 16:01:42 -0700
> From: Michael Eager <eager@eagerm.com>
> 
> GDB/DOC:
>    * gdb.texinfo: Mention gzipped exec and core files.

This should mention the name of the node where you make changes; see
the other ChangeLog entries in gdb/doc/.

Do we need a NEWS entry for this?

> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -17391,6 +17391,11 @@ via @code{gdbserver} (@pxref{Server, file, Using the @code{gdbserver}
>  Program}).  In these situations the @value{GDBN} commands to specify
>  new files are useful.
>  
> +Executable and core files may be compressed using @code{gzip}.  These

'gzip' is a command, so please use @command, not @code.

> +files will be uncompressed into temporary files either in /tmp

'/tmp' is a file name, so please use @file for its markup.

>                                                                 or in 
> +@code{$TMPDIR} if this is set in the environment.

I'd prefer to say

  in the system-wide temporary directory

and mention neither /tmp nor $TMPDIR, as both are platform-dependent.

Otherwise, the documentation parts are OK.

I agree with Mike that it's better to use library functions for
creating temporary files, since that hides platform dependencies.

Thanks.

Gary Benson March 11, 2015, 5:42 p.m. UTC | #8

Michael Eager wrote:
> On 03/11/15 03:07, Gary Benson wrote:
> >Michael Eager wrote:
> > > Add support to automatically unzip compressed executable and core
> > > files.  Files will be uncompressed into temporary directory (/tmp or
> > > $TMPDIR) and are deleted when GDB exits.  This should be transparent
> > > to users, except for disk space requirements.  The name of the
> > > uncompressed file is mentioned, but all references to the file in
> > > GDB messages is to the file which the user specified.
> >...
> > > diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
> > > index 14d6324..b2c31fd 100644
> > > --- a/gdb/common/filestuff.c
> > > +++ b/gdb/common/filestuff.c
> >...
> > > +#ifndef GDBSERVER
> >
> > Please do not add GDBSERVER conditionals to gdb/common, I spent
> > half a year removing them all.
> > 
> > It looks like this code is only used by GDB, not gdbserver, so
> > the fix is simple, just put the code somewhere GDB-specific.
> 
> Sure.  Any suggestions?  gdb_bfd.c seems like the wrong place.

gdb/utils.c maybe?

Cheers,
Gary

Doug Evans March 11, 2015, 6:09 p.m. UTC | #9

On Wed, Mar 11, 2015 at 10:42 AM, Gary Benson <gbenson@redhat.com> wrote:
> Michael Eager wrote:
>> On 03/11/15 03:07, Gary Benson wrote:
>> >Michael Eager wrote:
>> > > Add support to automatically unzip compressed executable and core
>> > > files.  Files will be uncompressed into temporary directory (/tmp or
>> > > $TMPDIR) and are deleted when GDB exits.  This should be transparent
>> > > to users, except for disk space requirements.  The name of the
>> > > uncompressed file is mentioned, but all references to the file in
>> > > GDB messages is to the file which the user specified.
>> >...
>> > > diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
>> > > index 14d6324..b2c31fd 100644
>> > > --- a/gdb/common/filestuff.c
>> > > +++ b/gdb/common/filestuff.c
>> >...
>> > > +#ifndef GDBSERVER
>> >
>> > Please do not add GDBSERVER conditionals to gdb/common, I spent
>> > half a year removing them all.
>> >
>> > It looks like this code is only used by GDB, not gdbserver, so
>> > the fix is simple, just put the code somewhere GDB-specific.
>>
>> Sure.  Any suggestions?  gdb_bfd.c seems like the wrong place.
>
> gdb/utils.c maybe?

Fine by me.
Though, OTOH, there's nothing intrinsic in what's being done there
that gdbserver (or anything else that might want to use common)
couldn't use it tomorrow.
I'd be ok with common/compression.c or some such.

Gary Benson March 11, 2015, 6:21 p.m. UTC | #10

Doug Evans wrote:
> On Wed, Mar 11, 2015 at 10:42 AM, Gary Benson <gbenson@redhat.com> wrote:
> > Michael Eager wrote:
> > > On 03/11/15 03:07, Gary Benson wrote:
> > > >Michael Eager wrote:
> > > > > Add support to automatically unzip compressed executable and
> > > > > core files.  Files will be uncompressed into temporary
> > > > > directory (/tmp or $TMPDIR) and are deleted when GDB exits.
> > > > > This should be transparent to users, except for disk space
> > > > > requirements.  The name of the uncompressed file is
> > > > > mentioned, but all references to the file in GDB messages is
> > > > > to the file which the user specified.
> > > >...
> > > > > diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
> > > > > index 14d6324..b2c31fd 100644
> > > > > --- a/gdb/common/filestuff.c
> > > > > +++ b/gdb/common/filestuff.c
> > > >...
> > > > > +#ifndef GDBSERVER
> > > >
> > > > Please do not add GDBSERVER conditionals to gdb/common, I spent
> > > > half a year removing them all.
> > > >
> > > > It looks like this code is only used by GDB, not gdbserver, so
> > > > the fix is simple, just put the code somewhere GDB-specific.
> > >
> > > Sure.  Any suggestions?  gdb_bfd.c seems like the wrong place.
> >
> > gdb/utils.c maybe?
> 
> Fine by me.
> Though, OTOH, there's nothing intrinsic in what's being done there
> that gdbserver (or anything else that might want to use common)
> couldn't use it tomorrow.
> I'd be ok with common/compression.c or some such.

Do we really want to be putting things in common speculatively?

Cheers,
Gary

Cary Coutant March 11, 2015, 6:24 p.m. UTC | #11

How will this affect split DWARF?

If you uncompress a foo.gz binary, and the binary has relative paths
to .dwo files, will GDB look for the .dwo files relative to the
original binary, or to the uncompressed binary?

When it looks for a .dwp file, will it look for foo.dwp or foo.gz.dwp,
and will it look in the same directory as the original, or in /tmp?

If foo.dwp is also compressed, will it uncompress that?

Would it make sense to support .tar.gz/.tgz files containing both a
binary and its .dwp?

-cary



On Tue, Mar 10, 2015 at 4:01 PM, Michael Eager <eager@eagerm.com> wrote:
> Add support to automatically unzip compressed executable and core files.
> Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
> and are deleted when GDB exits.  This should be transparent to users,
> except for disk space requirements.  The name of the uncompressed file is
> mentioned, but all references to the file in GDB messages is to the file
> which the user specified.
>
> This operation cannot be done completely by BFD because BFD allows an opened
> file to be passed to it for processing.  GDB uses this functionality.
>
> BFD:
>   * bfd-in2.h: Regenerate.
>   * bfd.c (struct bfd): Add uncompressed_filename.
>   * bfdio.c (bfd_get_mtime): Set bfd->mtime_set to true.
>   * cache.c (bfd_open): Open previously created uncompressed file.
>
> GDB:
>   * common/filestuff.c (struct compressed_file_cache_search,
> eq_compressed_file,
>   is_gzip, decompress_gzip, do_compressed_cleanup, gdb_uncompress): New.
>   * common/filestuff.h (gdb_uncompress): Declare.
>   * corelow.c (core_open): Uncompress core file.
>   * exec.c (exec_file_attach): Uncompress exe file.
>   * symfile.c (symfile_bfd_open): Uncompress sym (exe) file.
>
> GDB/DOC:
>   * gdb.texinfo: Mention gzipped exec and core files.
>
> --
> Michael Eager    eager@eagercon.com
> 1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Doug Evans March 11, 2015, 6:56 p.m. UTC | #12

On Wed, Mar 11, 2015 at 11:21 AM, Gary Benson <gbenson@redhat.com> wrote:
> Do we really want to be putting things in common speculatively?

I don't have a strong enough opinion to spend too many emails on it.
But I would ask that such a rule be written down in an easy to find place.

Michael Eager March 11, 2015, 7:55 p.m. UTC | #13

On 03/11/15 09:15, Eli Zaretskii wrote:
>> Date: Tue, 10 Mar 2015 16:01:42 -0700
>> From: Michael Eager <eager@eagerm.com>
>>
>> GDB/DOC:
>>     * gdb.texinfo: Mention gzipped exec and core files.
>
> This should mention the name of the node where you make changes; see
> the other ChangeLog entries in gdb/doc/.
>
> Do we need a NEWS entry for this?
>
>> --- a/gdb/doc/gdb.texinfo
>> +++ b/gdb/doc/gdb.texinfo
>> @@ -17391,6 +17391,11 @@ via @code{gdbserver} (@pxref{Server, file, Using the @code{gdbserver}
>>   Program}).  In these situations the @value{GDBN} commands to specify
>>   new files are useful.
>>
>> +Executable and core files may be compressed using @code{gzip}.  These
>
> 'gzip' is a command, so please use @command, not @code.
>
>> +files will be uncompressed into temporary files either in /tmp
>
> '/tmp' is a file name, so please use @file for its markup.
>
>>                                                                  or in
>> +@code{$TMPDIR} if this is set in the environment.
>
> I'd prefer to say
>
>    in the system-wide temporary directory
>
> and mention neither /tmp nor $TMPDIR, as both are platform-dependent.

I think it might be good to tell people how to specify where the
uncompressed file is located.

>
> Otherwise, the documentation parts are OK.

Thanks.  I'll make the changes.

> I agree with Mike that it's better to use library functions for
> creating temporary files, since that hides platform dependencies.

The code uses mkstemp() to create a temporary file.  Is there a
GDB or BFD wrapper for this function?

Michael Eager March 11, 2015, 8:12 p.m. UTC | #14

On 03/11/15 11:24, Cary Coutant wrote:
> How will this affect split DWARF?
>
> If you uncompress a foo.gz binary, and the binary has relative paths
> to .dwo files, will GDB look for the .dwo files relative to the
> original binary, or to the uncompressed binary?

GDB should look for .dwo files relative to the original file.  The
uncompressed file is essentially hidden.

If you can send me a test case creating .dwo files, I'll make sure
that it works.

> When it looks for a .dwp file, will it look for foo.dwp or foo.gz.dwp,
> and will it look in the same directory as the original, or in /tmp?
>
> If foo.dwp is also compressed, will it uncompress that?

The file search for exec and core are unchanged, so gdb will find
whatever file it is asked to look for, using the current search path,
exactly as it does now.  No file names are modified.

There's specific code to uncompress exec and core files.  I might
need to add code for dwp.

I originally looked at doing decompression in BFD, so that it would
simply be transparent.  If a component opened a gzipped file, BFD
would transparently decompress it.  This didn't work.  GDB does
some file operations, like closing and reopening files, which I think
should be done in BFD.

> Would it make sense to support .tar.gz/.tgz files containing both a
> binary and its .dwp?

Perhaps, but I don't want to extend this (conceptually) simple extension
to also handling tar files.  Maybe as a follow-on.

>
> -cary
>
>
>
> On Tue, Mar 10, 2015 at 4:01 PM, Michael Eager <eager@eagerm.com> wrote:
>> Add support to automatically unzip compressed executable and core files.
>> Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
>> and are deleted when GDB exits.  This should be transparent to users,
>> except for disk space requirements.  The name of the uncompressed file is
>> mentioned, but all references to the file in GDB messages is to the file
>> which the user specified.
>>
>> This operation cannot be done completely by BFD because BFD allows an opened
>> file to be passed to it for processing.  GDB uses this functionality.
>>
>> BFD:
>>    * bfd-in2.h: Regenerate.
>>    * bfd.c (struct bfd): Add uncompressed_filename.
>>    * bfdio.c (bfd_get_mtime): Set bfd->mtime_set to true.
>>    * cache.c (bfd_open): Open previously created uncompressed file.
>>
>> GDB:
>>    * common/filestuff.c (struct compressed_file_cache_search,
>> eq_compressed_file,
>>    is_gzip, decompress_gzip, do_compressed_cleanup, gdb_uncompress): New.
>>    * common/filestuff.h (gdb_uncompress): Declare.
>>    * corelow.c (core_open): Uncompress core file.
>>    * exec.c (exec_file_attach): Uncompress exe file.
>>    * symfile.c (symfile_bfd_open): Uncompress sym (exe) file.
>>
>> GDB/DOC:
>>    * gdb.texinfo: Mention gzipped exec and core files.
>>
>> --
>> Michael Eager    eager@eagercon.com
>> 1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
>

Mike Frysinger March 11, 2015, 8:18 p.m. UTC | #15

On 11 Mar 2015 12:55, Michael Eager wrote:
> On 03/11/15 09:15, Eli Zaretskii wrote:
> > I agree with Mike that it's better to use library functions for
> > creating temporary files, since that hides platform dependencies.
> 
> The code uses mkstemp() to create a temporary file.  Is there a
> GDB or BFD wrapper for this function?

if you're worried about portability, gdb uses gnulib now, so it should be easy 
to import a gnulib module that provides that func as a fallback
-mike

Doug Evans March 11, 2015, 8:19 p.m. UTC | #16

On Wed, Mar 11, 2015 at 1:12 PM, Michael Eager <eager@eagerm.com> wrote:
>> Would it make sense to support .tar.gz/.tgz files containing both a
>> binary and its .dwp?
>
>
> Perhaps, but I don't want to extend this (conceptually) simple extension
> to also handling tar files.  Maybe as a follow-on.

Is there a fuse-based wrapper for tarballs?
A little googling suggests there is but I didn't dig for details.

How useful would it be to make that work from gdb more easily?

E.g., "file tarfuse:myexec"

and then provide a way to register a handler for "tarfuse".

Or some such.
Just thinking out loud ...

Eli Zaretskii March 11, 2015, 8:24 p.m. UTC | #17

> Date: Wed, 11 Mar 2015 12:55:19 -0700
> From: Michael Eager <eager@eagerm.com>
> CC: gdb-patches@sourceware.org, binutils@sourceware.org
> 
> > and mention neither /tmp nor $TMPDIR, as both are platform-dependent.
> 
> I think it might be good to tell people how to specify where the
> uncompressed file is located.

Saying "system-wide temporary directory" does tell that, doesn't it?
Users of each platform know where that is, I think.

We could, of course, describe all the details for each supported
platform, but then the description needs to be much longer than what
you wrote.  Is it really that important?

> The code uses mkstemp() to create a temporary file.  Is there a
> GDB or BFD wrapper for this function?

Not that I know of.  Why do you need a wrapper?

Jan Kratochvil March 11, 2015, 10:13 p.m. UTC | #18

On Wed, 11 Mar 2015 00:01:42 +0100, Michael Eager wrote:
> Add support to automatically unzip compressed executable and core files.
> Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
> and are deleted when GDB exits.

Such feature has been requested to support xz-compressed core files as
currently being stored by systemd.  But to make it more convenient one should
decompress on-demand only the blocks of file that are really accessed by GDB
- expecting by bfd_iovec.  Obviously GDB usually needs to access only small
part of the whole core file.

I did not check how it is supported by gzip but for xz one needs to use
--block-size, otherwise the file blocks cannot be decompressed independently
in random access way.

ISTM libz-gzip and liblzma-xz compatibility is mutually exclusive.


Jan

Doug Evans March 11, 2015, 11:14 p.m. UTC | #19

On Wed, Mar 11, 2015 at 3:13 PM, Jan Kratochvil
<jan.kratochvil@redhat.com> wrote:
> ISTM libz-gzip and liblzma-xz compatibility is mutually exclusive.

Can you elaborate?

Alan Modra March 12, 2015, 12:08 a.m. UTC | #20

On Wed, Mar 11, 2015 at 07:56:30AM -0700, Michael Eager wrote:
> On 03/11/15 01:14, Alan Modra wrote:
> >On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
> >>This operation cannot be done completely by BFD because BFD allows an opened
> >>file to be passed to it for processing.  GDB uses this functionality.
> >
> >I'd prefer you do this entirely outside of BFD, without adding another
> >field to struct bfd.  I think that can be done by simply clearing
> >abfd->cacheable on files you uncompress.  This prevents BFD from
> >closing the file, so you won't need to open it again.
> 
> GDB closes the exec file, then uses BFD to seek (I think when reading
> syms).  BFD then re-opens the file, so it needs the name of the
> uncompressed file.

Really?  I think it quite unclean if gdb expects BFD to reopen a file
that gdb has closed!

Michael Eager March 12, 2015, 12:40 a.m. UTC | #21

On 03/11/15 15:13, Jan Kratochvil wrote:
> On Wed, 11 Mar 2015 00:01:42 +0100, Michael Eager wrote:
>> Add support to automatically unzip compressed executable and core files.
>> Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
>> and are deleted when GDB exits.
>
> Such feature has been requested to support xz-compressed core files as
> currently being stored by systemd.  But to make it more convenient one should
> decompress on-demand only the blocks of file that are really accessed by GDB
> - expecting by bfd_iovec.  Obviously GDB usually needs to access only small
> part of the whole core file.
>
> I did not check how it is supported by gzip but for xz one needs to use
> --block-size, otherwise the file blocks cannot be decompressed independently
> in random access way.

gzip is not compressed block-by-block.  As far as I can tell, you need to
decompress starting from the beginning of the file.

> ISTM libz-gzip and liblzma-xz compatibility is mutually exclusive.

I don't know why they would be incompatible, but support for an on-demand
block-compression scheme would be significantly different.  Decompressing
an xz file by making a copy (as is done for gzip) would be a simple extension
to the current patch.

Michael Eager March 12, 2015, 12:45 a.m. UTC | #22

On 03/11/15 17:08, Alan Modra wrote:
> On Wed, Mar 11, 2015 at 07:56:30AM -0700, Michael Eager wrote:
>> On 03/11/15 01:14, Alan Modra wrote:
>>> On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
>>>> This operation cannot be done completely by BFD because BFD allows an opened
>>>> file to be passed to it for processing.  GDB uses this functionality.
>>>
>>> I'd prefer you do this entirely outside of BFD, without adding another
>>> field to struct bfd.  I think that can be done by simply clearing
>>> abfd->cacheable on files you uncompress.  This prevents BFD from
>>> closing the file, so you won't need to open it again.
>>
>> GDB closes the exec file, then uses BFD to seek (I think when reading
>> syms).  BFD then re-opens the file, so it needs the name of the
>> uncompressed file.
>
> Really?  I think it quite unclean if gdb expects BFD to reopen a file
> that gdb has closed!

Agreed.

GDB doesn't expect BFD to reopen the file, per se.  But it does a seek
on an exec file (IIRC, while reading symbols) which it previously closed
and when BFD notices that the file is closed, it opens it.  I don't think
that it is feasible to remove calls to exec_close() so this doesn't happen.

There's overlapping code between BFD and GDB.  It would be much cleaner if
GDB let BFD do everything with files, and BFD had support for opening files
with additional flags like O_CLOEXEC.

Alan Modra March 12, 2015, 2:50 a.m. UTC | #23

On Wed, Mar 11, 2015 at 05:45:28PM -0700, Michael Eager wrote:
> On 03/11/15 17:08, Alan Modra wrote:
> >On Wed, Mar 11, 2015 at 07:56:30AM -0700, Michael Eager wrote:
> >>On 03/11/15 01:14, Alan Modra wrote:
> >>>On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
> >>>>This operation cannot be done completely by BFD because BFD allows an opened
> >>>>file to be passed to it for processing.  GDB uses this functionality.
> >>>
> >>>I'd prefer you do this entirely outside of BFD, without adding another
> >>>field to struct bfd.  I think that can be done by simply clearing
> >>>abfd->cacheable on files you uncompress.  This prevents BFD from
> >>>closing the file, so you won't need to open it again.
> >>
> >>GDB closes the exec file, then uses BFD to seek (I think when reading
> >>syms).  BFD then re-opens the file, so it needs the name of the
> >>uncompressed file.
> >
> >Really?  I think it quite unclean if gdb expects BFD to reopen a file
> >that gdb has closed!
> 
> Agreed.
> 
> GDB doesn't expect BFD to reopen the file, per se.  But it does a seek
> on an exec file (IIRC, while reading symbols) which it previously closed
> and when BFD notices that the file is closed, it opens it.  I don't think
> that it is feasible to remove calls to exec_close() so this doesn't happen.

It looks to me that exec_close() calls bfd_close().  You won't be able
to do anything with the bfd after bfd_close(), so I think your
analysis is faulty and very much doubt your statement that "GDB closes
the exec file, then uses BFD..".

Pedro Alves March 12, 2015, 10:41 a.m. UTC | #24

On 03/12/2015 12:40 AM, Michael Eager wrote:
> On 03/11/15 15:13, Jan Kratochvil wrote:
>> On Wed, 11 Mar 2015 00:01:42 +0100, Michael Eager wrote:
>>> Add support to automatically unzip compressed executable and core files.
>>> Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
>>> and are deleted when GDB exits.
>>
>> Such feature has been requested to support xz-compressed core files as
>> currently being stored by systemd.  But to make it more convenient one should
>> decompress on-demand only the blocks of file that are really accessed by GDB
>> - expecting by bfd_iovec.  Obviously GDB usually needs to access only small
>> part of the whole core file.
>>
>> I did not check how it is supported by gzip but for xz one needs to use
>> --block-size, otherwise the file blocks cannot be decompressed independently
>> in random access way.
> 
> gzip is not compressed block-by-block.  As far as I can tell, you need to
> decompress starting from the beginning of the file.
> 
>> ISTM libz-gzip and liblzma-xz compatibility is mutually exclusive.
> 
> I don't know why they would be incompatible, but support for an on-demand
> block-compression scheme would be significantly different.  Decompressing
> an xz file by making a copy (as is done for gzip) would be a simple extension
> to the current patch.

I won't strongly object if others want to approve it, but IMO, having GDB
decompress the whole file to temp doesn't add that much convenience over
decompressing it outside GDB.  Let me explore and expand (below).

We need to weigh the convenience of having gdb do this, over maintaining
this inside BFD+GDB going forward.

If you loading the core just once to extract a backtrace, in an
automated fashion, you can simply decompress before loading the core
with a trivial script.  So the convenience added for this use case
is not significant.

If OTOH you're doing interactive debugging of the core dump, then it's
convenient to be able to skip manual/laborious steps.

However, if people are compressing cores, it's because they are big.

As a quick experiment, I ran 'gcore `pidof firefox`' and generated
a core dump of the firefox process that I have running.  That resulted
in a 4.5GB core dump.  I gzipped it, which shrinked it to 4.5MB.
A ~1000/1 factor.  Then I timed gunzipping it.  It took almost
2 (two) minutes.

If you're doing interactive debugging (either CLI, or GUI) you'll
likely end up starting gdb multiple times, and thus load the core multiple
times into gdb, and each of those invocations will result in a
slow decompression of the whole file.

Waiting for GDB to decompress that once is already painful.  Waiting for it
multiple times likely results in cursing and swearing at gdb's slow start
up.  Smart users will realize that and end up decompressing the file manually
outside gdb, just once, anyway, thus saving time.

We could "fix" the "multiple times" issue by adding even more smarts,
based on an already-decompressed-files cache or some such.  Though of
course, more smarts, more code to maintain.

I agree with Jan -- The real convenience would be being able to skip the
long whole-file decompression step altogether, with an on-demand
block-decompress scheme, because gdb in reality doesn't need to touch
most of the vast majority of the core dump's contents.  That would
be a solution that I'd be happy to see implemented.

If we're just decompressing to /tmp, then we also need to
compare the benefits of a built-in solution against having users
do the same with a user-provided gdb command implemented in one
of gdb's extensions languages (python, scheme), e.g., a command
that adds a "decompress-core" command that does the same:
decompresses whatever compression format, and loads the result
with "core /tmp/FILE".

IMO, whatever the solution, if built in, this is best implemented
in BFD, so that objdump can dump the same files gdb can.

Thanks,
Pedro Alves

Michael Eager March 12, 2015, 3:34 p.m. UTC | #25

On 03/12/15 03:41, Pedro Alves wrote:

> Waiting for GDB to decompress that once is already painful.  Waiting for it
> multiple times likely results in cursing and swearing at gdb's slow start
> up.  Smart users will realize that and end up decompressing the file manually
> outside gdb, just once, anyway, thus saving time.
>
> We could "fix" the "multiple times" issue by adding even more smarts,
> based on an already-decompressed-files cache or some such.  Though of
> course, more smarts, more code to maintain.

I had considered adding a command or command line option to specify
the name of the uncompress file, so that it could be reused.

> I agree with Jan -- The real convenience would be being able to skip the
> long whole-file decompression step altogether, with an on-demand
> block-decompress scheme, because gdb in reality doesn't need to touch
> most of the vast majority of the core dump's contents.  That would
> be a solution that I'd be happy to see implemented.

That's a solution to a different problem.

> If we're just decompressing to /tmp, then we also need to
> compare the benefits of a built-in solution against having users
> do the same with a user-provided gdb command implemented in one
> of gdb's extensions languages (python, scheme), e.g., a command
> that adds a "decompress-core" command that does the same:
> decompresses whatever compression format, and loads the result
> with "core /tmp/FILE".

This requires that users manually decompress files, and makes it
impossible to put the compressed file name on the command line.

It also looks to me like a wart and kludge, rather than having
GDB automatically identify the compression method and do the
operation transparently for the user.

> IMO, whatever the solution, if built in, this is best implemented
> in BFD, so that objdump can dump the same files gdb can.

I took that approach initially.  But GDB finds and opens files,
not BFD.  Moving what GDB is doing into BFD, where it should have
been in the first place (IMO), seemed more problematic.

Pedro Alves March 12, 2015, 4:12 p.m. UTC | #26

On 03/12/2015 03:34 PM, Michael Eager wrote:
> On 03/12/15 03:41, Pedro Alves wrote:
> 
>> Waiting for GDB to decompress that once is already painful.  Waiting for it
>> multiple times likely results in cursing and swearing at gdb's slow start
>> up.  Smart users will realize that and end up decompressing the file manually
>> outside gdb, just once, anyway, thus saving time.
>>
>> We could "fix" the "multiple times" issue by adding even more smarts,
>> based on an already-decompressed-files cache or some such.  Though of
>> course, more smarts, more code to maintain.
> 
> I had considered adding a command or command line option to specify
> the name of the uncompress file, so that it could be reused.

What's the point then?  If you need to do that, then you alread
lost all the convenience.  Just type "gunzip core.gz && gdb core"
instead of "gdb -tmp-core /tmp/core core.gz".

So I think my point still stands, and IMO, it's a crucial point.

> 
>> I agree with Jan -- The real convenience would be being able to skip the
>> long whole-file decompression step altogether, with an on-demand
>> block-decompress scheme, because gdb in reality doesn't need to touch
>> most of the vast majority of the core dump's contents.  That would
>> be a solution that I'd be happy to see implemented.
> 
> That's a solution to a different problem.

I don't think it is.  What's the real problem you're solving?

>> If we're just decompressing to /tmp, then we also need to
>> compare the benefits of a built-in solution against having users
>> do the same with a user-provided gdb command implemented in one
>> of gdb's extensions languages (python, scheme), e.g., a command
>> that adds a "decompress-core" command that does the same:
>> decompresses whatever compression format, and loads the result
>> with "core /tmp/FILE".
> 
> This requires that users manually decompress files, and makes it
> impossible to put the compressed file name on the command line.

No it doesn't: there's '-ex' to run commands on the command
line: gdb -ex "decompress-and-load-core foo.gz"

> 
> It also looks to me like a wart and kludge, rather than having
> GDB automatically identify the compression method and do the
> operation transparently for the user.

What I'm saying is that it seems to me that you're doing
automatically in GDB, it can be done automatically in a
script.  A gdb command implemented in python can of course
also identify the compression method and support a multiple
number of compression formats, by just calling the
appropriate decompression tool.

As I said, I won't strongly object, though I still don't
see the compelling use case that warrants doing this in GDB.
I do envision ahead the usability problems and support
requests this will lead to.

> 
>> IMO, whatever the solution, if built in, this is best implemented
>> in BFD, so that objdump can dump the same files gdb can.
> 
> I took that approach initially.  But GDB finds and opens files,
> not BFD.  Moving what GDB is doing into BFD, where it should have
> been in the first place (IMO), seemed more problematic.

If there's problems, let's fix them.  From Alan's response,
the problem you mention doesn't really exist in the form you
described.

Thanks,
Pedro Alves

Michael Eager March 12, 2015, 4:14 p.m. UTC | #27

On 03/11/15 19:50, Alan Modra wrote:
> On Wed, Mar 11, 2015 at 05:45:28PM -0700, Michael Eager wrote:
>> On 03/11/15 17:08, Alan Modra wrote:
>>> On Wed, Mar 11, 2015 at 07:56:30AM -0700, Michael Eager wrote:
>>>> On 03/11/15 01:14, Alan Modra wrote:
>>>>> On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
>>>>>> This operation cannot be done completely by BFD because BFD allows an opened
>>>>>> file to be passed to it for processing.  GDB uses this functionality.
>>>>>
>>>>> I'd prefer you do this entirely outside of BFD, without adding another
>>>>> field to struct bfd.  I think that can be done by simply clearing
>>>>> abfd->cacheable on files you uncompress.  This prevents BFD from
>>>>> closing the file, so you won't need to open it again.
>>>>
>>>> GDB closes the exec file, then uses BFD to seek (I think when reading
>>>> syms).  BFD then re-opens the file, so it needs the name of the
>>>> uncompressed file.
>>>
>>> Really?  I think it quite unclean if gdb expects BFD to reopen a file
>>> that gdb has closed!
>>
>> Agreed.
>>
>> GDB doesn't expect BFD to reopen the file, per se.  But it does a seek
>> on an exec file (IIRC, while reading symbols) which it previously closed
>> and when BFD notices that the file is closed, it opens it.  I don't think
>> that it is feasible to remove calls to exec_close() so this doesn't happen.
>
> It looks to me that exec_close() calls bfd_close().  You won't be able
> to do anything with the bfd after bfd_close(), so I think your
> analysis is faulty and very much doubt your statement that "GDB closes
> the exec file, then uses BFD..".

The file opened in exec_file_attach() is closed in bfd_cache_close_all(),
before the function returns, not in the call to exec_close().  The bfd
is not deleted.

Later, in bfd_get_section_contents(), while sniffing the OSABI, GDB calls
bfd_seek() to do a seek on the same bfd.  This notices that bfd->iostream
is zero and re-opens the file.

Michael Eager March 12, 2015, 4:58 p.m. UTC | #28

On 03/12/15 09:12, Pedro Alves wrote:
> On 03/12/2015 03:34 PM, Michael Eager wrote:
>> On 03/12/15 03:41, Pedro Alves wrote:
>>
>>> Waiting for GDB to decompress that once is already painful.  Waiting for it
>>> multiple times likely results in cursing and swearing at gdb's slow start
>>> up.  Smart users will realize that and end up decompressing the file manually
>>> outside gdb, just once, anyway, thus saving time.
>>>
>>> We could "fix" the "multiple times" issue by adding even more smarts,
>>> based on an already-decompressed-files cache or some such.  Though of
>>> course, more smarts, more code to maintain.
>>
>> I had considered adding a command or command line option to specify
>> the name of the uncompress file, so that it could be reused.
>
> What's the point then?  If you need to do that, then you alread
> lost all the convenience.  Just type "gunzip core.gz && gdb core"
> instead of "gdb -tmp-core /tmp/core core.gz".
>
> So I think my point still stands, and IMO, it's a crucial point.
>
>>
>>> I agree with Jan -- The real convenience would be being able to skip the
>>> long whole-file decompression step altogether, with an on-demand
>>> block-decompress scheme, because gdb in reality doesn't need to touch
>>> most of the vast majority of the core dump's contents.  That would
>>> be a solution that I'd be happy to see implemented.
>>
>> That's a solution to a different problem.
>
> I don't think it is.  What's the real problem you're solving?

Lots of users, most who are not very conversant with gdb, lots of
compressed exec and core files, all compressed with gzip.  The
goal is to make using compressed files simple and transparent,
without users having to enter additional commands.

A wonderful patch supporting xz compressed files with an on-demand
block-decompress scheme would be a great solution for some other
use case, one which seems hypothetical at the moment.

>>> If we're just decompressing to /tmp, then we also need to
>>> compare the benefits of a built-in solution against having users
>>> do the same with a user-provided gdb command implemented in one
>>> of gdb's extensions languages (python, scheme), e.g., a command
>>> that adds a "decompress-core" command that does the same:
>>> decompresses whatever compression format, and loads the result
>>> with "core /tmp/FILE".
>>
>> This requires that users manually decompress files, and makes it
>> impossible to put the compressed file name on the command line.
>
> No it doesn't: there's '-ex' to run commands on the command
> line: gdb -ex "decompress-and-load-core foo.gz"

As I said, this requires users to know whether an exec or core
is compressed and manually enter a command to uncompress it.

>> It also looks to me like a wart and kludge, rather than having
>> GDB automatically identify the compression method and do the
>> operation transparently for the user.
>
> What I'm saying is that it seems to me that you're doing
> automatically in GDB, it can be done automatically in a
> script.  A gdb command implemented in python can of course
> also identify the compression method and support a multiple
> number of compression formats, by just calling the
> appropriate decompression tool.

We have many wrappers around gdb, some of which handle compressed
files, some of which don't.  They are all different and
problematic to debug and maintain.  Putting support for compressed
files into gdb means that it simply works, and users don't have
to remember which wrapper accepts compressed files and which
don't.

> As I said, I won't strongly object, though I still don't
> see the compelling use case that warrants doing this in GDB.
> I do envision ahead the usability problems and support
> requests this will lead to.
>
>>
>>> IMO, whatever the solution, if built in, this is best implemented
>>> in BFD, so that objdump can dump the same files gdb can.
>>
>> I took that approach initially.  But GDB finds and opens files,
>> not BFD.  Moving what GDB is doing into BFD, where it should have
>> been in the first place (IMO), seemed more problematic.
>
> If there's problems, let's fix them.  From Alan's response,
> the problem you mention doesn't really exist in the form you
> described.

I misspoke/misremembered.  It isn't exec_close() which closes the
file, it is bfd_cache_close_all().  The bfd is not closed, only
the file.

While BFD does most of the file operations, there isn't a clear
functional boundary between GDB file operations and BFD file
operations.  Allowing an opened fd to be passed into BFD makes
doing the decompression in BFD problematic, since BFD doesn't
know where the opened file was found, and the decompress libraries
(at least gzopen()) expects a path, not an opened fd.  BFD
wouldn't be know how the fd was opened so that it could use the
same flags to open the uncompressed file.

Fixing the problem would mean eliminating functions in BFD
which accepted an open fd and replacing calls to these functions
with modified versions of bfd_open() which accepted additional
flags or did whatever other operations the caller did when opening
the files.  GDB opens files a number of places, such as in
exec_file_attach() using openp() which searches the path for
the file.  This would have to be migrated into BFD.

Jan Kratochvil March 12, 2015, 5:11 p.m. UTC | #29

On Thu, 12 Mar 2015 17:58:02 +0100, Michael Eager wrote:
> I misspoke/misremembered.  It isn't exec_close() which closes the
> file, it is bfd_cache_close_all().  The bfd is not closed, only
> the file.

This is problematic, I have already posted a pending patch for it:
	[patch] Do not close BFDs, breaking deleted inferior shlibs
	https://sourceware.org/ml/gdb-patches/2015-02/msg00367.html


> Allowing an opened fd to be passed into BFD makes
> doing the decompression in BFD problematic, since BFD doesn't
> know where the opened file was found, and the decompress libraries
> (at least gzopen()) expects a path, not an opened fd.

One can readlink(/proc/self/fd/%d) although I haven't checked now how exactly
to use it.


Jan

Michael Eager March 12, 2015, 5:37 p.m. UTC | #30

On 03/12/15 10:11, Jan Kratochvil wrote:
> On Thu, 12 Mar 2015 17:58:02 +0100, Michael Eager wrote:
>> I misspoke/misremembered.  It isn't exec_close() which closes the
>> file, it is bfd_cache_close_all().  The bfd is not closed, only
>> the file.
>
> This is problematic, I have already posted a pending patch for it:
> 	[patch] Do not close BFDs, breaking deleted inferior shlibs
> 	https://sourceware.org/ml/gdb-patches/2015-02/msg00367.html

Thanks.  I'll see if this eliminates the need for the BFD patch.

>> Allowing an opened fd to be passed into BFD makes
>> doing the decompression in BFD problematic, since BFD doesn't
>> know where the opened file was found, and the decompress libraries
>> (at least gzopen()) expects a path, not an opened fd.
>
> One can readlink(/proc/self/fd/%d) although I haven't checked now how exactly
> to use it.

This won't tell what flags were used to open the file,
so that the same flags can be used on the decompressed file.

Jan Kratochvil March 12, 2015, 5:47 p.m. UTC | #31

On Thu, 12 Mar 2015 18:37:45 +0100, Michael Eager wrote:
> On 03/12/15 10:11, Jan Kratochvil wrote:
> > One can readlink(/proc/self/fd/%d) although I haven't checked now how exactly
> > to use it.
> 
> This won't tell what flags were used to open the file,
> so that the same flags can be used on the decompressed file.

If you mean the O_RDONLY etc. flags then read(/proc/self/fdinfo/%d):
	pos:	0
	flags:	0102002
	mnt_id:	20


Jan

Sergio Durigan Junior March 12, 2015, 9:16 p.m. UTC | #32

On Wednesday, March 11 2015, Eli Zaretskii wrote:

>> I think it might be good to tell people how to specify where the
>> uncompressed file is located.
>
> Saying "system-wide temporary directory" does tell that, doesn't it?
> Users of each platform know where that is, I think.

Why not say:

  system-wide temporary directory (e.g., @file{/tmp} on some systems)

?

Alan Modra March 13, 2015, 12:13 a.m. UTC | #33

On Thu, Mar 12, 2015 at 09:14:12AM -0700, Michael Eager wrote:
> On 03/11/15 19:50, Alan Modra wrote:
> >On Wed, Mar 11, 2015 at 05:45:28PM -0700, Michael Eager wrote:
> >>On 03/11/15 17:08, Alan Modra wrote:
> >>>On Wed, Mar 11, 2015 at 07:56:30AM -0700, Michael Eager wrote:
> >>>>On 03/11/15 01:14, Alan Modra wrote:
> >>>>>On Tue, Mar 10, 2015 at 04:01:42PM -0700, Michael Eager wrote:
> >>>>>>This operation cannot be done completely by BFD because BFD allows an opened
> >>>>>>file to be passed to it for processing.  GDB uses this functionality.
> >>>>>
> >>>>>I'd prefer you do this entirely outside of BFD, without adding another
> >>>>>field to struct bfd.  I think that can be done by simply clearing
> >>>>>abfd->cacheable on files you uncompress.  This prevents BFD from
> >>>>>closing the file, so you won't need to open it again.
> >>>>
> >>>>GDB closes the exec file, then uses BFD to seek (I think when reading
> >>>>syms).  BFD then re-opens the file, so it needs the name of the
> >>>>uncompressed file.
> >>>
> >>>Really?  I think it quite unclean if gdb expects BFD to reopen a file
> >>>that gdb has closed!
> >>
> >>Agreed.
> >>
> >>GDB doesn't expect BFD to reopen the file, per se.  But it does a seek
> >>on an exec file (IIRC, while reading symbols) which it previously closed
> >>and when BFD notices that the file is closed, it opens it.  I don't think
> >>that it is feasible to remove calls to exec_close() so this doesn't happen.
> >
> >It looks to me that exec_close() calls bfd_close().  You won't be able
> >to do anything with the bfd after bfd_close(), so I think your
> >analysis is faulty and very much doubt your statement that "GDB closes
> >the exec file, then uses BFD..".
> 
> The file opened in exec_file_attach() is closed in bfd_cache_close_all(),
> before the function returns, not in the call to exec_close().  The bfd
> is not deleted.

Thanks for tracking it down.  Not a gdb problem then, but likely a BFD
bug that bfd_cache_close{,_all} closes !cacheable bfds.

Eli Zaretskii March 13, 2015, 6:04 a.m. UTC | #34

> From: Sergio Durigan Junior <sergiodj@redhat.com>
> Cc: Michael Eager <eager@eagerm.com>, gdb-patches@sourceware.org,
>         binutils@sourceware.org
> Date: Thu, 12 Mar 2015 17:16:55 -0400
> 
> On Wednesday, March 11 2015, Eli Zaretskii wrote:
> 
> >> I think it might be good to tell people how to specify where the
> >> uncompressed file is located.
> >
> > Saying "system-wide temporary directory" does tell that, doesn't it?
> > Users of each platform know where that is, I think.
> 
> Why not say:
> 
>   system-wide temporary directory (e.g., @file{/tmp} on some systems)
> 
> ?

That'd be fine with me, thanks.

Ed Maste March 13, 2015, 6:25 a.m. UTC | #35

On 12 March 2015 at 13:11, Jan Kratochvil <jan.kratochvil@redhat.com> wrote:
>
> One can readlink(/proc/self/fd/%d) although I haven't checked now how exactly
> to use it.

That isn't portable though.

diff mbox

Patch

From b675cf554012ad1067e5bb1693fe695844cc09d6 Mon Sep 17 00:00:00 2001
From: Michael Eager <meager@cisco.com>
Date: Tue, 10 Mar 2015 15:31:11 -0700
Subject: [PATCH] GDB support compressed exec and core files.

Add support to automatically unzip compressed executable and core files.
Files will be uncompressed into temporary directory (/tmp or $TMPDIR)
and are deleted when GDB exits.  This should be transparent to users,
except for disk space requirements.  The name of the uncompressed file is
mentioned, but all references to the file in GDB messages is to the file
which the user specified.

This operation cannot be done completely by BFD because BFD allows an opened
file to be passed to it for processing.  GDB uses this functionality.

BFD:
  * bfd-in2.h: Regenerate.
  * bfd.c (struct bfd): Add uncompressed_filename.
  * bfdio.c (bfd_get_mtime): Set bfd->mtime_set to true.
  * cache.c (bfd_open): Open previously created uncompressed file.

GDB:
  * common/filestuff.c (struct compressed_file_cache_search, eq_compressed_file,
  is_gzip, decompress_gzip, do_compressed_cleanup, gdb_uncompress): New.
  * common/filestuff.h (gdb_uncompress): Declare.
  * corelow.c (core_open): Uncompress core file.
  * exec.c (exec_file_attach): Uncompress exe file.
  * symfile.c (symfile_bfd_open): Uncompress sym (exe) file.

GDB/DOC:
  * gdb.texinfo: Mention gzipped exec and core files.
---
 bfd/bfd-in2.h          |   3 +
 bfd/bfd.c              |   3 +
 bfd/bfdio.c            |   1 +
 bfd/cache.c            |   6 +-
 gdb/common/filestuff.c | 204 +++++++++++++++++++++++++++++++++++++++++++++++++
 gdb/common/filestuff.h |   6 ++
 gdb/corelow.c          |  10 +++
 gdb/doc/gdb.texinfo    |   5 ++
 gdb/exec.c             |  15 +++-
 gdb/symfile.c          |  13 +++-
 10 files changed, 263 insertions(+), 3 deletions(-)

diff --git a/bfd/bfd-in2.h b/bfd/bfd-in2.h
index 2d2e7ac..dd5d6ea 100644
--- a/bfd/bfd-in2.h
+++ b/bfd/bfd-in2.h
@@ -6294,6 +6294,9 @@  struct bfd
   /* The filename the application opened the BFD with.  */
   const char *filename;
 
+  /* The file name after being uncompressed.  */
+  const char *uncompressed_filename;
+
   /* A pointer to the target jump table.  */
   const struct bfd_target *xvec;
 
diff --git a/bfd/bfd.c b/bfd/bfd.c
index 5ae5eca..4c78e7b 100644
--- a/bfd/bfd.c
+++ b/bfd/bfd.c
@@ -56,6 +56,9 @@  CODE_FRAGMENT
 .  {* The filename the application opened the BFD with.  *}
 .  const char *filename;
 .
+.  {* The file name after being uncompressed.  *}
+.  const char *uncompressed_filename;
+.
 .  {* A pointer to the target jump table.  *}
 .  const struct bfd_target *xvec;
 .
diff --git a/bfd/bfdio.c b/bfd/bfdio.c
index 89406e3..e82c888 100644
--- a/bfd/bfdio.c
+++ b/bfd/bfdio.c
@@ -382,6 +382,7 @@  bfd_get_mtime (bfd *abfd)
     return 0;
 
   abfd->mtime = buf.st_mtime;		/* Save value in case anyone wants it */
+  abfd->mtime_set = TRUE;		/* Say that it is valid.  */
   return buf.st_mtime;
 }
 
diff --git a/bfd/cache.c b/bfd/cache.c
index 94a82da..f46e9dd 100644
--- a/bfd/cache.c
+++ b/bfd/cache.c
@@ -588,7 +588,11 @@  bfd_open_file (bfd *abfd)
     {
     case read_direction:
     case no_direction:
-      abfd->iostream = real_fopen (abfd->filename, FOPEN_RB);
+      if (abfd->uncompressed_filename)
+        /* BFD was previously uncompressed.  Reopen file.  */
+        abfd->iostream = real_fopen (abfd->uncompressed_filename, FOPEN_RB);
+      else
+        abfd->iostream = real_fopen (abfd->filename, FOPEN_RB);
       break;
     case both_direction:
     case write_direction:
diff --git a/gdb/common/filestuff.c b/gdb/common/filestuff.c
index 14d6324..b2c31fd 100644
--- a/gdb/common/filestuff.c
+++ b/gdb/common/filestuff.c
@@ -19,10 +19,16 @@ 
 #include "common-defs.h"
 #include "filestuff.h"
 #include "gdb_vecs.h"
+#include "hashtab.h"
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <stdio.h>
+
+#ifdef HAVE_ZLIB_H
+#include <zlib.h>
+#endif
 
 #ifdef USE_WIN32API
 #include <winsock2.h>
@@ -47,6 +53,27 @@ 
 #define SOCK_CLOEXEC 0
 #endif
 
+#ifdef HAVE_ZLIB_H
+/* Hash table of compressed files.  */
+
+static htab_t compressed_file_cache;
+
+struct compressed_file_cache_search
+{
+  const char *filename;
+  char *uncompressed_filename;
+  time_t mtime;
+};
+
+static int
+eq_compressed_file (const void *a, const void *b)
+{
+  const struct compressed_file_cache_search *entry = a;
+  const struct compressed_file_cache_search *search = b;
+
+  return (strcmp (entry->filename, search->filename) == 0);
+}
+#endif
 
 
 #ifndef HAVE_FDWALK
@@ -404,3 +431,180 @@  gdb_pipe_cloexec (int filedes[2])
 
   return result;
 }
+
+#ifndef GDBSERVER
+/* Test if file is compressed with gzip.  */
+
+static inline int
+is_gzip (unsigned char *buf)
+{
+  return (buf[0] == 037 && buf[1] == 0213);	/* From /usr/share/magic.  */
+}
+
+#define COMPRESS_BUF_SIZE (1024*1024)
+static int
+decompress_gzip (const char *filename, FILE *tmp)
+{
+#ifdef HAVE_ZLIB_H
+  char *buf = malloc (COMPRESS_BUF_SIZE);
+  gzFile compressed = gzopen (filename, "r");
+  int count, res;
+
+  if (buf == NULL || compressed == NULL)
+    {
+      printf (_("error copying gzip file\n"));
+      free (buf);
+      return 0;
+    }
+
+  while ((count = gzread (compressed, buf, COMPRESS_BUF_SIZE)))
+    {
+      res = fwrite (buf, 1, count, tmp);
+      if (res != count)
+	{
+	  printf (_("error decompressing gzip file\n"));
+          free (buf);
+	  return 0;
+	}
+    }
+
+  fflush (tmp);
+  gzclose (compressed);
+  free (buf);
+  return 1;
+#else
+  return 0;
+#endif
+}
+
+/* Delete uncompressed temp file when terminating.  */
+static void
+do_compressed_cleanup (void *filename)
+{
+  unlink (filename);
+  xfree (filename);
+}
+
+/* If file is compressed, uncompress it into a temporary.  */
+
+int
+gdb_uncompress (const char *filename, char **uncompressed_filename)
+{
+  FILE *handle;
+  struct compressed_file_cache_search search, *found;
+  struct stat st;
+  hashval_t hash;
+  void **slot;
+  static unsigned char buffer[1024];
+  size_t count;
+  enum {NONE, GZIP, BZIP2} file_compression = NONE;
+  int decomp_fd;
+  FILE *decomp_file;
+  int ret = 0;
+  char *tmpdir, *p;
+  char *template = xmalloc(128 + 12 + 7 + 1);
+  if (compressed_file_cache == NULL)
+    compressed_file_cache = htab_create_alloc (1, htab_hash_string,
+						eq_compressed_file,
+						NULL, xcalloc, xfree);
+
+  if ((stat (filename, &st) < 0))
+    return 0;
+
+  search.filename = filename;
+  search.uncompressed_filename = NULL;
+
+  hash = htab_hash_string (filename);
+  found = htab_find_with_hash (compressed_file_cache, &search, hash);
+
+  if (found)
+    {
+      /* We previously uncompressed the file.  */
+      if (found->mtime == st.st_mtime)
+        {
+	  /* Return file if compressed file not changed.  */
+	  *uncompressed_filename = found->uncompressed_filename;
+	  return 1;
+	}
+      else
+        {
+	  /* Delete old uncompressed file.  */
+	  unlink (found->uncompressed_filename);
+	  xfree ((void *)found->filename);
+	  xfree (found->uncompressed_filename);
+        }
+    }
+
+  if ((handle = fopen (filename, "rb")) == NULL)
+    return 0;
+
+  if ((count = fread (buffer, 1, sizeof buffer, handle)) > 0)
+    {
+      if (is_gzip (buffer))
+	file_compression = GZIP;
+    }
+
+  fclose (handle);
+
+  if (file_compression == NONE)
+    return 0;
+
+  /* Create temporary file name for uncompressed file.  */
+  if (!(tmpdir = getenv ("TMPDIR")))
+    tmpdir = "/tmp";
+  strncpy (template, tmpdir, 128);
+  strcat (template, "/");
+  for (p = (char *)filename + strlen (filename) - 1;
+       p >= filename && *p != '/'; p--)  /* find final slash.  */  ;
+  strncat (template, ++p, 128);
+  p = template + strlen (template);
+  if (strcmp (p - 3, ".gz") == 0)
+    *(p - 3) = '\0';
+  strcat (template, "-XXXXXX");
+
+  if ((decomp_fd = mkstemp (template)) != -1)
+    {
+      decomp_file = fdopen (decomp_fd, "w+b");
+
+      if (file_compression == GZIP)
+        {
+	  printf (_("Decompressing %s to %s\n"), filename, template);
+	  ret = decompress_gzip (filename, decomp_file);
+	}
+      else
+        {
+          xfree (template);
+          return 0;
+        }
+      fclose (decomp_file);
+
+      if (ret)
+	{
+	  if (!found)
+	    {
+	      slot = htab_find_slot_with_hash (compressed_file_cache,
+					       &search, hash, INSERT);
+	      gdb_assert (slot && !*slot);
+	      found = xmalloc (sizeof (struct compressed_file_cache_search));
+	      *slot = found;
+	    }
+	  found->filename = strdup (filename);
+	  found->mtime = st.st_mtime;
+	  found->uncompressed_filename = template;
+	}
+    }
+  else
+    {
+      warning (_("Decompression failed\n"));
+      xfree (template);
+      return 0;
+    }
+
+  *uncompressed_filename = template;
+
+  /* Schedule delete of temp file when gdb ends.  */
+  make_final_cleanup (do_compressed_cleanup, xstrdup (template));
+
+  return 1;
+}
+#endif
diff --git a/gdb/common/filestuff.h b/gdb/common/filestuff.h
index 98522a6..7a1d12d 100644
--- a/gdb/common/filestuff.h
+++ b/gdb/common/filestuff.h
@@ -67,4 +67,10 @@  extern int gdb_socket_cloexec (int domain, int style, int protocol);
 
 extern int gdb_pipe_cloexec (int filedes[2]);
 
+/* If 'filename' is compressed, uncompress it and return name of
+   uncompressed file.  */
+
+extern int gdb_uncompress (const char *filename,
+			   char **uncompressed_filename);
+
 #endif /* FILESTUFF_H */
diff --git a/gdb/corelow.c b/gdb/corelow.c
index 9218003..13662a7 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -279,6 +279,7 @@  core_open (const char *arg, int from_tty)
   int scratch_chan;
   int flags;
   char *filename;
+  char *uncompressed_filename;
 
   target_preopen (from_tty);
   if (!arg)
@@ -316,6 +317,15 @@  core_open (const char *arg, int from_tty)
   if (temp_bfd == NULL)
     perror_with_name (filename);
 
+  if (!write_files && gdb_uncompress (filename, &uncompressed_filename))
+    {
+      close (scratch_chan);
+      scratch_chan = gdb_open_cloexec (uncompressed_filename, flags, 0);
+      temp_bfd = gdb_bfd_fopen (uncompressed_filename, gnutarget,
+				FOPEN_RB, scratch_chan);
+      temp_bfd->uncompressed_filename = uncompressed_filename;
+    }
+
   if (!bfd_check_format (temp_bfd, bfd_core)
       && !gdb_check_format (temp_bfd))
     {
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 35dbe86..3b4d389 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -17391,6 +17391,11 @@  via @code{gdbserver} (@pxref{Server, file, Using the @code{gdbserver}
 Program}).  In these situations the @value{GDBN} commands to specify
 new files are useful.
 
+Executable and core files may be compressed using @code{gzip}.  These
+files will be uncompressed into temporary files either in /tmp or in 
+@code{$TMPDIR} if this is set in the environment.  The files will have
+a unique name and will be deleted when @value{GDBN} terminates.
+
 @table @code
 @cindex executable file
 @kindex file
diff --git a/gdb/exec.c b/gdb/exec.c
index 696458d..6d3c1fc 100644
--- a/gdb/exec.c
+++ b/gdb/exec.c
@@ -35,6 +35,7 @@ 
 #include "progspace.h"
 #include "gdb_bfd.h"
 #include "gcore.h"
+#include "filestuff.h"
 
 #include <fcntl.h>
 #include "readline/readline.h"
@@ -155,6 +156,7 @@  void
 exec_file_attach (const char *filename, int from_tty)
 {
   struct cleanup *cleanups;
+  char *uncompressed_filename = NULL;
 
   /* First, acquire a reference to the current exec_bfd.  We release
      this at the end of the function; but acquiring it now lets the
@@ -209,7 +211,18 @@  exec_file_attach (const char *filename, int from_tty)
 	exec_bfd = gdb_bfd_fopen (canonical_pathname, gnutarget,
 				  FOPEN_RUB, scratch_chan);
       else
-	exec_bfd = gdb_bfd_open (canonical_pathname, gnutarget, scratch_chan);
+	{
+          if (!gdb_uncompress (canonical_pathname, &uncompressed_filename))
+	    exec_bfd = gdb_bfd_open (canonical_pathname, gnutarget, scratch_chan);
+          else
+	    {
+	      close (scratch_chan);
+	      scratch_chan = openp ("", 0, uncompressed_filename,
+				    O_RDONLY | O_BINARY, &scratch_pathname);
+
+	      exec_bfd = gdb_bfd_open (uncompressed_filename, gnutarget, scratch_chan);
+	    }
+	}
 
       if (!exec_bfd)
 	{
diff --git a/gdb/symfile.c b/gdb/symfile.c
index c2a71ec..0d173f1 100644
--- a/gdb/symfile.c
+++ b/gdb/symfile.c
@@ -56,6 +56,7 @@ 
 #include "stack.h"
 #include "gdb_bfd.h"
 #include "cli/cli-utils.h"
+#include "filestuff.h"
 
 #include <sys/types.h>
 #include <fcntl.h>
@@ -1739,6 +1740,7 @@  symfile_bfd_open (const char *cname)
   bfd *sym_bfd;
   int desc;
   char *name, *absolute_name;
+  char *uncompressed_filename;
   struct cleanup *back_to;
 
   if (remote_filename_p (cname))
@@ -1783,7 +1785,16 @@  symfile_bfd_open (const char *cname)
   name = absolute_name;
   back_to = make_cleanup (xfree, name);
 
-  sym_bfd = gdb_bfd_open (name, gnutarget, desc);
+  if (!gdb_uncompress (name, &uncompressed_filename))
+    sym_bfd = gdb_bfd_open (name, gnutarget, desc);
+  else
+    {
+      close (desc);
+      desc = openp ("", 0, uncompressed_filename, O_RDONLY | O_BINARY, &absolute_name);
+
+      sym_bfd = gdb_bfd_open (uncompressed_filename, gnutarget, desc);
+    }
+
   if (!sym_bfd)
     error (_("`%s': can't open to read symbols: %s."), name,
 	   bfd_errmsg (bfd_get_error ()));
-- 
2.2.1