libdw: check memory access in get_(u|s)leb128

Message ID 20230125160530.949622-1-vvvvvv@google.com
State Superseded
Headers
Series libdw: check memory access in get_(u|s)leb128 |

Commit Message

lilydjwg--- via Elfutils-devel Jan. 25, 2023, 4:05 p.m. UTC
  From: Aleksei Vetrov <vvvvvv@google.com>

__libdw_get_uleb128 and __libdw_get_sleb128 should check if addrp has
already reached the end before unrolling the first step. It is done by
moving __libdw_max_len to the beginning of the function, which already
has all the checks.

Signed-off-by: Aleksei Vetrov <vvvvvv@google.com>
---
 libdw/memory-access.h | 10 ++++++----
 tests/leb128.c        | 29 ++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 5 deletions(-)
  

Comments

Mark Wielaard Jan. 26, 2023, 9:05 p.m. UTC | #1
Hi Aleksei,

On Wed, Jan 25, 2023 at 04:05:30PM +0000, Aleksei Vetrov via Elfutils-devel wrote:
> From: Aleksei Vetrov <vvvvvv@google.com>
> 
> __libdw_get_uleb128 and __libdw_get_sleb128 should check if addrp has
> already reached the end before unrolling the first step. It is done by
> moving __libdw_max_len to the beginning of the function, which already
> has all the checks.

I understand why you might want to add these extra safeguards. But
when the bounds checking and unrolling for get_uleb128/get_sleb128 was
introduced the idea was that they should only be called with addrp <
endp. Precisely so at least one byte could be read without needing to
bounds check.

See the following commits:

commit 7a053473c7bedd22e3db39c444a4cd8f97eace25
Author: Mark Wielaard <mjw@redhat.com>
Date:   Sun Dec 14 21:48:23 2014 +0100

    libdw: Add get_uleb128 and get_sleb128 bounds checking.
    
    Both get_uleb128 and get_sleb128 now take an end pointer to prevent
    reading too much data. Adjust all callers to provide the end pointer.
    
    There are still two exceptions. "Raw" dwarf_getabbrevattr and
    read_encoded_valued don't have a end pointer associated yet.
    They will have to be provided in the future.
    
    Signed-off-by: Mark Wielaard <mjw@redhat.com>

commit 54662f13d14d59d44943543c48bdb21d50dd008d
Author: Josh Stone <jistone@redhat.com>
Date:   Mon Dec 15 12:18:25 2014 -0800

    libdw: pre-compute leb128 loop limits
    
    Signed-off-by: Josh Stone <jistone@redhat.com>

commit 1b5477ddf360af0f6262c6f15a590448b4e1a65a
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Dec 16 10:53:22 2014 +0100

    libdw: Unroll the first get_sleb128 step to help the compiler optimize.
    
    The common case is a single-byte. So no extra (max len) calculation is
    necessary then.
    
    Signed-off-by: Mark Wielaard <mjw@redhat.com>

Now this was 8 years ago and might have been too optimistic. Maybe we
missed some checks? Did you actually find situations where these
functions were called with addrp >= endp?

It turns out that get_[su]leb128 dominates some operations and really
does have to be as fast as possible. So I do like to know what the
impact is of this change.

Thanks,

Mark
  
Aleksei Vetrov Feb. 7, 2023, 4:17 p.m. UTC | #2
Hi Mark,

> Did you actually find situations where these functions were called with
addrp
> >= endp?

Yes, for example libdw/libdw_form.c:91:7.

> It turns out that get_[su]leb128 dominates some operations and really does
> have to be as fast as possible. So I do like to know what the impact is of
> this change.

This patch just moves __libdw_max_len_uleb128 to the beginning of the
function
and adds only one new if. So hopefully it shouldn't affect performance at
all.
  
Mark Wielaard Feb. 7, 2023, 5:17 p.m. UTC | #3
Hi Aleksei,

On Tue, 2023-02-07 at 16:17 +0000, Aleksei Vetrov wrote:
> > Did you actually find situations where these functions were called
> > with addrp
> > > = endp?
> 
> Yes, for example libdw/libdw_form.c:91:7.
> 

Urgh. There are actually 3 places in that function that need a guard.
Then I started reviewing all the library code and came up with more
than 10 other places that had a guard missing (see attached). I'll
clean this up and also audit elflint.c and readelf.c and submit a
proper patch for that.

> > It turns out that get_[su]leb128 dominates some operations and really does
> > have to be as fast as possible. So I do like to know what the impact is of
> > this change.
> 
> This patch just moves __libdw_max_len_uleb128 to the beginning of the function
> and adds only one new if. So hopefully it shouldn't affect performance at all.

Yeah. At first I thought it might be unnecessary because we already
have this check in the calling code. But given that I quickly found 10+
places were that wasn't true I don't feel very confident about that
now...

Then I thought maybe we can just remove the calling code checks and
simply always check like you are proposing. But the checks are slightly
different. The caller guards signal bad data errors, while yours are
hardening checks that make sure no invalid data is read. It is better
to have both. Even if it might slow down the code.

I'll finish the audit of readelf.c and elflint.c and see if adding both
those checks and your hardening checks don't pessimize the code too
much. Hopefully it won't and we can just add all the extra checks.

Cheers,

Mark
  
Mark Wielaard Feb. 11, 2023, 11:42 p.m. UTC | #4
Hi Aleksei,

On Wed, Jan 25, 2023 at 04:05:30PM +0000, Aleksei Vetrov via Elfutils-devel wrote:
> From: Aleksei Vetrov <vvvvvv@google.com>
> 
> __libdw_get_uleb128 and __libdw_get_sleb128 should check if addrp has
> already reached the end before unrolling the first step. It is done by
> moving __libdw_max_len to the beginning of the function, which already
> has all the checks.

I did some performance tests and couldn't find any significant change.
Even with my other extra checks added in libdw and readelf.

One question about the sleb128 case though:

>  static inline int64_t
>  __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
>  {
> +  const size_t max = __libdw_max_len_sleb128 (*addrp, end);
>    /* Do the work in an unsigned type, but use implementation-defined
>       behavior to cast to signed on return.  This avoids some undefined
>       behavior when shifting.  */
> @@ -131,9 +133,9 @@ __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
>  
>    /* Unroll the first step to help the compiler optimize
>       for the common single-byte case.  */
> -  get_sleb128_step (acc, *addrp, 0);
> +  if (likely (max > 0))
> +    get_sleb128_step (acc, *addrp, 0);
>  
> -  const size_t max = __libdw_max_len_sleb128 (*addrp - 1, end);
>    for (size_t i = 1; i < max; ++i)
>      get_sleb128_step (acc, *addrp, i);
>    if (*addrp == end)

But what about the case where *addrp > end?
After this code we will do:

  /* There might be one extra byte.  */
  unsigned char b = **addrp;
  ++*addrp;

So I think we want to catch that too.  Easiest imho seems to move (and
invert) the max check immediately after calculating max:

diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index 1cac6af3..72348623 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -126,6 +126,9 @@ static inline int64_t
 __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
 {
   const size_t max = __libdw_max_len_sleb128 (*addrp, end);
+  if (unlikely (max == 0))
+    return INT64_MAX;
+
   /* Do the work in an unsigned type, but use implementation-defined
      behavior to cast to signed on return.  This avoids some undefined
      behavior when shifting.  */
@@ -133,8 +136,7 @@ __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
 
   /* Unroll the first step to help the compiler optimize
      for the common single-byte case.  */
-  if (likely (max > 0))
-    get_sleb128_step (acc, *addrp, 0);
+  get_sleb128_step (acc, *addrp, 0);
 
   for (size_t i = 1; i < max; ++i)
     get_sleb128_step (acc, *addrp, i);

Cheers,

Mark
  
Aleksei Vetrov Feb. 13, 2023, 8:03 p.m. UTC | #5
Hi Mark,

On Sat, Feb 11, 2023 at 11:43 PM Mark Wielaard <mark@klomp.org> wrote:
> After this code we will do:
>
>   /* There might be one extra byte.  */
>   unsigned char b = **addrp;
>   ++*addrp;
>
> So I think we want to catch that too.  Easiest imho seems to move (and
> invert) the max check immediately after calculating max.

Thanks for finding this! Sounds good and I'm going to do this also in
uleb128 for consistency. Sending updated patch in reply.
  

Patch

diff --git a/libdw/memory-access.h b/libdw/memory-access.h
index fca4129a..1cac6af3 100644
--- a/libdw/memory-access.h
+++ b/libdw/memory-access.h
@@ -72,13 +72,14 @@  __libdw_max_len_sleb128 (const unsigned char *addr, const unsigned char *end)
 static inline uint64_t
 __libdw_get_uleb128 (const unsigned char **addrp, const unsigned char *end)
 {
+  const size_t max = __libdw_max_len_uleb128 (*addrp, end);
   uint64_t acc = 0;
 
   /* Unroll the first step to help the compiler optimize
      for the common single-byte case.  */
-  get_uleb128_step (acc, *addrp, 0);
+  if (likely (max > 0))
+    get_uleb128_step (acc, *addrp, 0);
 
-  const size_t max = __libdw_max_len_uleb128 (*addrp - 1, end);
   for (size_t i = 1; i < max; ++i)
     get_uleb128_step (acc, *addrp, i);
   /* Other implementations set VALUE to UINT_MAX in this
@@ -124,6 +125,7 @@  __libdw_get_uleb128_unchecked (const unsigned char **addrp)
 static inline int64_t
 __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
 {
+  const size_t max = __libdw_max_len_sleb128 (*addrp, end);
   /* Do the work in an unsigned type, but use implementation-defined
      behavior to cast to signed on return.  This avoids some undefined
      behavior when shifting.  */
@@ -131,9 +133,9 @@  __libdw_get_sleb128 (const unsigned char **addrp, const unsigned char *end)
 
   /* Unroll the first step to help the compiler optimize
      for the common single-byte case.  */
-  get_sleb128_step (acc, *addrp, 0);
+  if (likely (max > 0))
+    get_sleb128_step (acc, *addrp, 0);
 
-  const size_t max = __libdw_max_len_sleb128 (*addrp - 1, end);
   for (size_t i = 1; i < max; ++i)
     get_sleb128_step (acc, *addrp, i);
   if (*addrp == end)
diff --git a/tests/leb128.c b/tests/leb128.c
index 47b57c0d..03090d80 100644
--- a/tests/leb128.c
+++ b/tests/leb128.c
@@ -117,6 +117,19 @@  test_sleb (void)
   return OK;
 }
 
+static int
+test_sleb_safety (void)
+{
+  const int64_t expected_error = INT64_MAX;
+  int64_t value;
+  const unsigned char *test = NULL;
+  get_sleb128 (value, test, test);
+  if (value != expected_error)
+    return FAIL;
+
+  return OK;
+}
+
 static int
 test_one_uleb (const unsigned char *data, size_t len, uint64_t expect)
 {
@@ -166,8 +179,22 @@  test_uleb (void)
   return OK;
 }
 
+static int
+test_uleb_safety (void)
+{
+  const uint64_t expected_error = UINT64_MAX;
+  uint64_t value;
+  const unsigned char *test = NULL;
+  get_uleb128 (value, test, test);
+  if (value != expected_error)
+    return FAIL;
+
+  return OK;
+}
+
 int
 main (void)
 {
-  return test_sleb () || test_uleb ();
+  return test_sleb () || test_sleb_safety () || test_uleb ()
+	 || test_uleb_safety ();
 }