[RFC] readelf: partial support of ZSTD compression

Message ID 542eb279-d15d-6f17-02c0-c53fd0f33055@suse.cz
State Changes Requested
Headers
Series [RFC] readelf: partial support of ZSTD compression |

Commit Message

Martin Liška Oct. 24, 2022, 11:09 a.m. UTC
  Support decompression of ZSTD sections and add support
for it when -SWz is used:

...
[30] .debug_abbrev        PROGBITS     0000000000000000 00001f9d 00000168  0 C      0   0  1
     [ELF ZSTD (2) 000002fc  1]
...

One TODO I see is that:
+libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd

should be conditional based on HAVE_ZSTD. But I don't know how to do that?

And the second part should cover elfcompress where it should support ZSTD compression,
leaving that for now.

Martin

libelf/ChangeLog:

	* Makefile.am: Add -lzstd.
	* elf.h (ELFCOMPRESS_ZSTD): Add new value.
	* elf_compress.c (__libelf_decompress): Dispatch based
	on the compression algorithm.
	(__libelf_decompress_zlib): New.
	(__libelf_decompress_zstd): New.
	(__libelf_decompress_elf): Pass type of compression to
	__libelf_decompress.
	* elf_compress_gnu.c (elf_compress_gnu): Use ELFCOMPRESS_ZLIB
	as .z* sections can be only compressed with ZLIB.
	* libelfP.h (__libelf_decompress): Change signature.

m4/ChangeLog:

	* zstd.m4: New file.

src/ChangeLog:

	* readelf.c (elf_ch_type_name): Use switch and support ZSTD.
---
 libelf/Makefile.am        |  2 +-
 libelf/elf.h              |  3 +++
 libelf/elf_compress.c     | 56 ++++++++++++++++++++++++++++++++++++---
 libelf/elf_compress_gnu.c |  2 +-
 libelf/libelfP.h          |  2 +-
 m4/zstd.m4                | 23 ++++++++++++++++
 src/readelf.c             | 18 ++++++++-----
 7 files changed, 93 insertions(+), 13 deletions(-)
 create mode 100644 m4/zstd.m4
  

Comments

Dmitry V. Levin Oct. 24, 2022, 11:41 a.m. UTC | #1
On Mon, Oct 24, 2022 at 01:09:59PM +0200, Martin Liška wrote:
[...]
> One TODO I see is that:
> +libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd
> 
> should be conditional based on HAVE_ZSTD. But I don't know how to do that?

I suppose you're talking about libzstd_LIBS.

[...]
> diff --git a/m4/zstd.m4 b/m4/zstd.m4
> new file mode 100644
> index 00000000..6da4db68
> --- /dev/null
> +++ b/m4/zstd.m4
> @@ -0,0 +1,23 @@
> +dnl Copyright (C) 2022 Free Software Foundation, Inc.
> +dnl This file is free software, distributed under the terms of the GNU
> +dnl General Public License.  As a special exception to the GNU General
> +dnl Public License, this file may be distributed as part of a program
> +dnl that contains a configuration script generated by Autoconf, under
> +dnl the same distribution terms as the rest of that program.
> +
> +dnl Enable features using the zstd library.
> +AC_DEFUN([AC_ZSTD], [
> +AC_ARG_WITH(zstd,
> +  [AS_HELP_STRING([--with-zstd], [support zstd compressed debug sections (default=auto)])],
> +  [], [with_zstd=auto])

Where does this code come from?
I though the "AC_" prefix is reserved for the GNU Autoconf.
Also, looks like it would be more appropriate to call it --enable-zstd
rather than --with-zstd.
  
Martin Liška Oct. 24, 2022, 12:17 p.m. UTC | #2
On 10/24/22 13:41, Dmitry V. Levin wrote:
> On Mon, Oct 24, 2022 at 01:09:59PM +0200, Martin Liška wrote:
> [...]
>> One TODO I see is that:
>> +libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd
>>
>> should be conditional based on HAVE_ZSTD. But I don't know how to do that?
> 
> I suppose you're talking about libzstd_LIBS.

Hm, can't see it after autoreconf -fi and ./configure.

> 
> [...]
>> diff --git a/m4/zstd.m4 b/m4/zstd.m4
>> new file mode 100644
>> index 00000000..6da4db68
>> --- /dev/null
>> +++ b/m4/zstd.m4
>> @@ -0,0 +1,23 @@
>> +dnl Copyright (C) 2022 Free Software Foundation, Inc.
>> +dnl This file is free software, distributed under the terms of the GNU
>> +dnl General Public License.  As a special exception to the GNU General
>> +dnl Public License, this file may be distributed as part of a program
>> +dnl that contains a configuration script generated by Autoconf, under
>> +dnl the same distribution terms as the rest of that program.
>> +
>> +dnl Enable features using the zstd library.
>> +AC_DEFUN([AC_ZSTD], [
>> +AC_ARG_WITH(zstd,
>> +  [AS_HELP_STRING([--with-zstd], [support zstd compressed debug sections (default=auto)])],
>> +  [], [with_zstd=auto])
> 
> Where does this code come from?
> I though the "AC_" prefix is reserved for the GNU Autoconf.

It comes from binutils './config/zstd.m4' file.

> Also, looks like it would be more appropriate to call it --enable-zstd
> rather than --with-zstd.
> 

Ah, I see.

Thanks,
Martin
  
Dmitry V. Levin Oct. 24, 2022, 4:48 p.m. UTC | #3
On Mon, Oct 24, 2022 at 02:17:17PM +0200, Martin Liška wrote:
> On 10/24/22 13:41, Dmitry V. Levin wrote:
> > On Mon, Oct 24, 2022 at 01:09:59PM +0200, Martin Liška wrote:
> > [...]
> >> One TODO I see is that:
> >> +libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd
> >>
> >> should be conditional based on HAVE_ZSTD. But I don't know how to do that?
> > 
> > I suppose you're talking about libzstd_LIBS.
> 
> Hm, can't see it after autoreconf -fi and ./configure.

That's because you do
PKG_CHECK_MODULES(ZSTD, [libzstd], ...)
and this defines ZSTD_CFLAGS and ZSTD_LIBS instead of libzstd_CFLAGS
and libzstd_LIBS because PKG_CHECK_MODULES() uses its first argument
as the prefix for these variables.
  
Martin Liška Oct. 24, 2022, 6:16 p.m. UTC | #4
On 10/24/22 18:48, Dmitry V. Levin wrote:
> On Mon, Oct 24, 2022 at 02:17:17PM +0200, Martin Liška wrote:
>> On 10/24/22 13:41, Dmitry V. Levin wrote:
>>> On Mon, Oct 24, 2022 at 01:09:59PM +0200, Martin Liška wrote:
>>> [...]
>>>> One TODO I see is that:
>>>> +libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd
>>>>
>>>> should be conditional based on HAVE_ZSTD. But I don't know how to do that?
>>>
>>> I suppose you're talking about libzstd_LIBS.
>>
>> Hm, can't see it after autoreconf -fi and ./configure.
> 
> That's because you do
> PKG_CHECK_MODULES(ZSTD, [libzstd], ...)
> and this defines ZSTD_CFLAGS and ZSTD_LIBS instead of libzstd_CFLAGS
> and libzstd_LIBS because PKG_CHECK_MODULES() uses its first argument
> as the prefix for these variables.
> 
> 

Thank you. Apparently, I collided with the existing:
eu_ZIPLIB(zstd,ZSTD,zstd,ZSTD_decompress,[ZSTD (zst)])

Anyway, I'm sending V2 that works fine --with-zstd and --without-zstd as expected.

Ready for master?

Thanks,
Martin
  
Mark Wielaard Oct. 28, 2022, 10:21 p.m. UTC | #5
Hi Martin,

On Mon, Oct 24, 2022 at 08:16:09PM +0200, Martin Liška wrote:
> Anyway, I'm sending V2 that works fine --with-zstd and --without-zstd as expected.
> 
> Ready for master?

The decompression part looks good. Few small nitpicks below.

Although I like to also have compression working.  Then we can also
add support to src/elfcompress, which makes for a good testcase. See
tests/run-compress.sh (which has found some subtle memory issues when
run under valgrind).

> From 4aea412783b9b0dcaf0f887947bf2e8ee6c5368b Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Mon, 24 Oct 2022 11:53:13 +0200
> Subject: [PATCH] readelf: partial support of ZSTD compression
> 
> Support decompression of ZSTD sections and add support
> for it when -SWz is used:
> 
> ...
> [30] .debug_abbrev        PROGBITS     0000000000000000 00001f9d 00000168  0 C      0   0  1
>      [ELF ZSTD (2) 000002fc  1]
> ...

And for this it would be nice to have a tests/run-readelf-z.sh testcase.

> ChangeLog:
> 
> 	* configure.ac: Add zstd_LIBS.
> 
> libelf/ChangeLog:
> 
> 	* Makefile.am: Use zstd_LIBS.
> 	* elf.h (ELFCOMPRESS_ZSTD): Add new value.
> 	* elf_compress.c (__libelf_decompress): Dispatch based
> 	on the compression algorithm.
> 	(__libelf_decompress_zlib): New.
> 	(__libelf_decompress_zstd): New.
> 	(__libelf_decompress_elf): Pass type of compression to
> 	__libelf_decompress.
> 	* elf_compress_gnu.c (elf_compress_gnu): Use ELFCOMPRESS_ZLIB
> 	as .z* sections can be only compressed with ZLIB.
> 	* libelfP.h (__libelf_decompress): Change signature.
> 
> src/ChangeLog:
> 
> 	* readelf.c (elf_ch_type_name): Use switch and support ZSTD.

> diff --git a/libelf/elf.h b/libelf/elf.h
> index 02a1b3f5..f0f0ec7d 100644
> --- a/libelf/elf.h
> +++ b/libelf/elf.h

We normally sync elf.h from glibc. I'll do that in a second.

> +#ifdef USE_ZSTD
> +void *
> +internal_function
> +__libelf_decompress_zstd (void *buf_in, size_t size_in, size_t size_out)
> +{
> +  /* Malloc might return NULL when requestion zero size.  This is highly

requesting

> +     unlikely, it would only happen when the compression was forced.
> +     But we do need a non-NULL buffer to return and set as result.
> +     Just make sure to always allocate at least 1 byte.  */

> +void *
> +internal_function
> +__libelf_decompress (int chtype, void *buf_in, size_t size_in, size_t size_out)
> +{
> +  if (chtype == ELFCOMPRESS_ZLIB)
> +    return __libelf_decompress_zlib (buf_in, size_in, size_out);
> +  else
> +    {
> +#ifdef USE_ZSTD
> +    return __libelf_decompress_zstd (buf_in, size_in, size_out);
> +#else
> +    return 0;
> +#endif

return NULL for consistency?

> +    }
> +}
> +
>  void *
>  internal_function
>  __libelf_decompress_elf (Elf_Scn *scn, size_t *size_out, size_t *addralign)
> @@ -268,7 +316,7 @@ __libelf_decompress_elf (Elf_Scn *scn, size_t *size_out, size_t *addralign)
>    if (gelf_getchdr (scn, &chdr) == NULL)
>      return NULL;
>  
> -  if (chdr.ch_type != ELFCOMPRESS_ZLIB)
> +  if (chdr.ch_type != ELFCOMPRESS_ZLIB && chdr.ch_type != ELFCOMPRESS_ZSTD)
>      {

What about the ifndef USE_ZSTD case? Should this then not recognize
ELFCOMPRESS_ZSTD?

Thanks,

Mark
  
Martin Liška Nov. 28, 2022, 1:16 p.m. UTC | #6
On 10/29/22 00:21, Mark Wielaard wrote:
> Although I like to also have compression working.  Then we can also
> add support to src/elfcompress, which makes for a good testcase. See
> tests/run-compress.sh (which has found some subtle memory issues when
> run under valgrind).

Hi.

All right, so I'm preparing a full support for ZSTD (both compression and compression)
and I noticed a refactoring would be handy for compress_section function and callers
of the function.

Note right now, there are basically 3 compression types and process_file
function handles basically all combinations of these (3 x 3 options), while adding ZSTD
support would make it even more complicated. However, ZSTD will behave very similar to ZLIB
(not zlib-gnu), except a different algorithm will be used. Plus, in order to distinguish
ZLIB from ZSTD, we need to read GElf_Chdr.

So what do you think about the refactoring as the first step?

Cheers,
Martin
  
Mark Wielaard Nov. 28, 2022, 10:29 p.m. UTC | #7
Hi Martin,

On Mon, Nov 28, 2022 at 02:16:35PM +0100, Martin Liška wrote:
> On 10/29/22 00:21, Mark Wielaard wrote:
> > Although I like to also have compression working.  Then we can also
> > add support to src/elfcompress, which makes for a good testcase. See
> > tests/run-compress.sh (which has found some subtle memory issues when
> > run under valgrind).
> 
> All right, so I'm preparing a full support for ZSTD (both compression and compression)
> and I noticed a refactoring would be handy for compress_section function and callers
> of the function.
> 
> Note right now, there are basically 3 compression types and process_file
> function handles basically all combinations of these (3 x 3 options), while adding ZSTD
> support would make it even more complicated. However, ZSTD will behave very similar to ZLIB
> (not zlib-gnu), except a different algorithm will be used. Plus, in order to distinguish
> ZLIB from ZSTD, we need to read GElf_Chdr.
> 
> So what do you think about the refactoring as the first step?

Looks good. I would just add a sanity check that chdr.ch_type is of a
known compression type (it is only checked to be not NONE atm).

Cheers,

Mark
  
Martin Liška Nov. 29, 2022, 9:34 a.m. UTC | #8
On 11/28/22 23:29, Mark Wielaard wrote:
> Hi Martin,
> 
> On Mon, Nov 28, 2022 at 02:16:35PM +0100, Martin Liška wrote:
>> On 10/29/22 00:21, Mark Wielaard wrote:
>>> Although I like to also have compression working.  Then we can also
>>> add support to src/elfcompress, which makes for a good testcase. See
>>> tests/run-compress.sh (which has found some subtle memory issues when
>>> run under valgrind).
>>
>> All right, so I'm preparing a full support for ZSTD (both compression and compression)
>> and I noticed a refactoring would be handy for compress_section function and callers
>> of the function.
>>
>> Note right now, there are basically 3 compression types and process_file
>> function handles basically all combinations of these (3 x 3 options), while adding ZSTD
>> support would make it even more complicated. However, ZSTD will behave very similar to ZLIB
>> (not zlib-gnu), except a different algorithm will be used. Plus, in order to distinguish
>> ZLIB from ZSTD, we need to read GElf_Chdr.
>>
>> So what do you think about the refactoring as the first step?
> 
> Looks good. I would just add a sanity check that chdr.ch_type is of a
> known compression type (it is only checked to be not NONE atm).

Thanks, fixes that and pushed as 6bb3e0b5. Now I can finish working on the ZSTD support.

Cheers,
Martin

> 
> Cheers,
> 
> Mark
> 
>
  

Patch

diff --git a/libelf/Makefile.am b/libelf/Makefile.am
index 560ed45f..33f3a0a1 100644
--- a/libelf/Makefile.am
+++ b/libelf/Makefile.am
@@ -106,7 +106,7 @@  libelf_pic_a_SOURCES =
 am_libelf_pic_a_OBJECTS = $(libelf_a_SOURCES:.c=.os)
 
 libelf_so_DEPS = ../lib/libeu.a
-libelf_so_LDLIBS = $(libelf_so_DEPS) -lz
+libelf_so_LDLIBS = $(libelf_so_DEPS) -lz -lzstd
 if USE_LOCKS
 libelf_so_LDLIBS += -lpthread
 endif
diff --git a/libelf/elf.h b/libelf/elf.h
index 02a1b3f5..f0f0ec7d 100644
--- a/libelf/elf.h
+++ b/libelf/elf.h
@@ -506,6 +506,9 @@  typedef struct
 
 /* Legal values for ch_type (compression algorithm).  */
 #define ELFCOMPRESS_ZLIB	1	   /* ZLIB/DEFLATE algorithm.  */
+#define ELFCOMPRESS_ZSTD	2	   /* Compressed with zstd  */
+					   /* (see http://www.zstandard.org). */
+
 #define ELFCOMPRESS_LOOS	0x60000000 /* Start of OS-specific.  */
 #define ELFCOMPRESS_HIOS	0x6fffffff /* End of OS-specific.  */
 #define ELFCOMPRESS_LOPROC	0x70000000 /* Start of processor-specific.  */
diff --git a/libelf/elf_compress.c b/libelf/elf_compress.c
index d7f53af2..62b41b20 100644
--- a/libelf/elf_compress.c
+++ b/libelf/elf_compress.c
@@ -39,6 +39,10 @@ 
 #include <string.h>
 #include <zlib.h>
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 /* Cleanup and return result.  Don't leak memory.  */
 static void *
 do_deflate_cleanup (void *result, z_stream *z, void *out_buf,
@@ -207,7 +211,7 @@  __libelf_compress (Elf_Scn *scn, size_t hsize, int ei_data,
 
 void *
 internal_function
-__libelf_decompress (void *buf_in, size_t size_in, size_t size_out)
+__libelf_decompress_zlib (void *buf_in, size_t size_in, size_t size_out)
 {
   /* Catch highly unlikely compression ratios so we don't allocate
      some giant amount of memory for nothing. The max compression
@@ -260,6 +264,50 @@  __libelf_decompress (void *buf_in, size_t size_in, size_t size_out)
   return buf_out;
 }
 
+#ifdef USE_ZSTD
+void *
+internal_function
+__libelf_decompress_zstd (void *buf_in, size_t size_in, size_t size_out)
+{
+  /* Malloc might return NULL when requestion zero size.  This is highly
+     unlikely, it would only happen when the compression was forced.
+     But we do need a non-NULL buffer to return and set as result.
+     Just make sure to always allocate at least 1 byte.  */
+  void *buf_out = malloc (size_out ?: 1);
+  if (unlikely (buf_out == NULL))
+    {
+      __libelf_seterrno (ELF_E_NOMEM);
+      return NULL;
+    }
+
+  size_t ret = ZSTD_decompress (buf_out, size_out, buf_in, size_in);
+  if (ZSTD_isError (ret))
+    {
+      free (buf_out);
+      __libelf_seterrno (ELF_E_DECOMPRESS_ERROR);
+      return NULL;
+    }
+  else
+    return buf_out;
+}
+#endif
+
+void *
+internal_function
+__libelf_decompress (int chtype, void *buf_in, size_t size_in, size_t size_out)
+{
+  if (chtype == ELFCOMPRESS_ZLIB)
+    return __libelf_decompress_zlib (buf_in, size_in, size_out);
+  else
+    {
+#ifdef USE_ZSTD
+    return __libelf_decompress_zstd (buf_in, size_in, size_out);
+#else
+    return 0;
+#endif
+    }
+}
+
 void *
 internal_function
 __libelf_decompress_elf (Elf_Scn *scn, size_t *size_out, size_t *addralign)
@@ -268,7 +316,7 @@  __libelf_decompress_elf (Elf_Scn *scn, size_t *size_out, size_t *addralign)
   if (gelf_getchdr (scn, &chdr) == NULL)
     return NULL;
 
-  if (chdr.ch_type != ELFCOMPRESS_ZLIB)
+  if (chdr.ch_type != ELFCOMPRESS_ZLIB && chdr.ch_type != ELFCOMPRESS_ZSTD)
     {
       __libelf_seterrno (ELF_E_UNKNOWN_COMPRESSION_TYPE);
       return NULL;
@@ -295,7 +343,9 @@  __libelf_decompress_elf (Elf_Scn *scn, size_t *size_out, size_t *addralign)
 		  ? sizeof (Elf32_Chdr) : sizeof (Elf64_Chdr));
   size_t size_in = data->d_size - hsize;
   void *buf_in = data->d_buf + hsize;
-  void *buf_out = __libelf_decompress (buf_in, size_in, chdr.ch_size);
+  void *buf_out
+    = __libelf_decompress (chdr.ch_type, buf_in, size_in, chdr.ch_size);
+
   *size_out = chdr.ch_size;
   *addralign = chdr.ch_addralign;
   return buf_out;
diff --git a/libelf/elf_compress_gnu.c b/libelf/elf_compress_gnu.c
index 3d2977e7..be9e990e 100644
--- a/libelf/elf_compress_gnu.c
+++ b/libelf/elf_compress_gnu.c
@@ -178,7 +178,7 @@  elf_compress_gnu (Elf_Scn *scn, int inflate, unsigned int flags)
       size_t size = gsize;
       size_t size_in = data->d_size - hsize;
       void *buf_in = data->d_buf + hsize;
-      void *buf_out = __libelf_decompress (buf_in, size_in, size);
+      void *buf_out = __libelf_decompress (ELFCOMPRESS_ZLIB, buf_in, size_in, size);
       if (buf_out == NULL)
 	return -1;
 
diff --git a/libelf/libelfP.h b/libelf/libelfP.h
index d88a613c..ab82357c 100644
--- a/libelf/libelfP.h
+++ b/libelf/libelfP.h
@@ -577,7 +577,7 @@  extern void * __libelf_compress (Elf_Scn *scn, size_t hsize, int ei_data,
 				 size_t *size, bool force)
      internal_function;
 
-extern void * __libelf_decompress (void *buf_in, size_t size_in,
+extern void * __libelf_decompress (int chtype, void *buf_in, size_t size_in,
 				   size_t size_out) internal_function;
 extern void * __libelf_decompress_elf (Elf_Scn *scn,
 				       size_t *size_out, size_t *addralign)
diff --git a/m4/zstd.m4 b/m4/zstd.m4
new file mode 100644
index 00000000..6da4db68
--- /dev/null
+++ b/m4/zstd.m4
@@ -0,0 +1,23 @@ 
+dnl Copyright (C) 2022 Free Software Foundation, Inc.
+dnl This file is free software, distributed under the terms of the GNU
+dnl General Public License.  As a special exception to the GNU General
+dnl Public License, this file may be distributed as part of a program
+dnl that contains a configuration script generated by Autoconf, under
+dnl the same distribution terms as the rest of that program.
+
+dnl Enable features using the zstd library.
+AC_DEFUN([AC_ZSTD], [
+AC_ARG_WITH(zstd,
+  [AS_HELP_STRING([--with-zstd], [support zstd compressed debug sections (default=auto)])],
+  [], [with_zstd=auto])
+
+AS_IF([test "$with_zstd" != no],
+  [PKG_CHECK_MODULES(ZSTD, [libzstd], [
+    AC_DEFINE(HAVE_ZSTD, 1, [Define to 1 if zstd is enabled.])
+  ], [
+    if test "$with_zstd" = yes; then
+      AC_MSG_ERROR([--with-zstd was given, but pkgconfig/libzstd.pc is not found])
+    fi
+  ])
+  ])
+])
diff --git a/src/readelf.c b/src/readelf.c
index a206e60e..1af20e35 100644
--- a/src/readelf.c
+++ b/src/readelf.c
@@ -1229,13 +1229,17 @@  get_visibility_type (int value)
 static const char *
 elf_ch_type_name (unsigned int code)
 {
-  if (code == 0)
-    return "NONE";
-
-  if (code == ELFCOMPRESS_ZLIB)
-    return "ZLIB";
-
-  return "UNKNOWN";
+  switch (code)
+    {
+    case 0:
+      return "NONE";
+    case ELFCOMPRESS_ZLIB:
+      return "ZLIB";
+    case ELFCOMPRESS_ZSTD:
+      return "ZSTD";
+    default:
+      return "UNKNOWN";
+    }
 }
 
 /* Print the section headers.  */