malloc/malloc.c: Mitigate null-byte overflow attacks

Message ID: 82d1760e-ef0f-9f0f-57be-3848f2b8d0ad@cs.ucsb.edu
State: New, archived

Commit Message

Moritz Eckert Oct. 27, 2017, 3:17 a.m. UTC
>>> I wonder if we should add a "size_is_sane()" macro to check for
>>> unreasonable sizes before we use them to compute pointers.
>> That sounds like a good idea to me. Would you prefer a separate macro
>> for prev_size and size that only gets the current chunk as a parameter or
>> a single macro that gets a parameter what to check for?
> 
> I don't know, I was just wondering if there were some other way to
> determine that a size has been corrupted other than consistency checks.

Here is my idea for such a macro. Do you like that better?

The macro could also do the whole stepping backward in memory, as in a 
"prev_chunk_secure (p)" that moves p to the previous chunk or aborts 
otherwise.
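
A rough sketch of that variant, reusing the existing prev_size,
chunk_at_offset, chunksize and malloc_printerr helpers (the name and
exact shape are only illustrative):

    #define prev_chunk_secure(p)                                      \
      do                                                              \
        {                                                             \
          INTERNAL_SIZE_T __prevsize = prev_size (p);                 \
          /* Step back to the previous chunk...  */                   \
          (p) = chunk_at_offset ((p), -((long) __prevsize));          \
          /* ...and abort unless its recorded size matches.  */       \
          if (__builtin_expect (__prevsize != chunksize (p), 0))      \
            malloc_printerr ("corrupted size vs. prev_size");         \
        }                                                             \
      while (0)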

Also, would you prefer to keep the current check in unlink, or remove it 
entirely?
  

Comments

Florian Weimer Oct. 30, 2017, 2:28 p.m. UTC | #1
On 10/27/2017 05:17 AM, Moritz Eckert wrote:
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index d3fcadd20e..73fc98d2c3 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1426,6 +1426,12 @@ typedef struct malloc_chunk *mbinptr;
>         }									      \
>   }
>   
> +/* Check if prev_size is consistent with size.  */
> +#define prev_size_is_sane(p, prevsize) {				   \
> +    if (__builtin_expect (prevsize !=  chunksize (p), 0))		      \
> +        malloc_printerr ("corrupted size vs. prev_size");		      \
> +}
> +
>   /*
>      Indexing
>   
> @@ -4227,6 +4233,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>         prevsize = prev_size (p);
>         size += prevsize;
>         p = chunk_at_offset(p, -((long) prevsize));
> +      prev_size_is_sane(p, prevsize)
>         unlink(av, p, bck, fwd);
>       }

More context for the unpatched code:

     /* consolidate backward */
     if (!prev_inuse(p)) {
       prevsize = prev_size (p);
       size += prevsize;
       p = chunk_at_offset(p, -((long) prevsize));
       unlink(av, p, bck, fwd);
     }

I'm not really familiar with the malloc code, so I'll try to summarize 
my understanding.

p is the pointer being freed.  I assume the goal is to mitigate a 
single-byte overwrite of the lowest byte of the chunk header for p.  The 
overwrite clears the PREV_INUSE bit, and therefore triggers 
reinterpretation of the allocated object contents as heap metadata 
because free will now read the previous size (prev_size) at p.  The 
value of prev_size is very likely under the control of the exploit 
writer because it comes *before* the overwritten byte.  So I think it 
would still be feasible to install a fake chunk and conduct an 
unlink-based attack.
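
To make the single-byte mechanics concrete, here is a tiny standalone
illustration (not glibc code; the size value is made up) of what happens
on a little-endian target:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int
    main (void)
    {
      /* Chunk header word: size 0x90 with PREV_INUSE (bit 0) set.  */
      uint64_t size_field = 0x91;

      /* An off-by-one string copy deposits its terminating NUL at the
         lowest-addressed byte of the header, which on little-endian
         targets is the least significant byte of the size.  */
      unsigned char nul = 0;
      memcpy (&size_field, &nul, 1);

      /* Prints "size=0 prev_inuse=0": the bit is cleared, so free will
         read prev_size out of attacker-reachable memory.  */
      printf ("size=%#llx prev_inuse=%d\n",
              (unsigned long long) size_field, (int) (size_field & 1));
      return 0;
    }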

As far as I can tell, there are three ways to mitigate single-NUL write 
attacks reliably:

(1) Store the chunk size (with the PREV_INUSE bit) in something other 
than little-endian format.  We could use big endian, or simply a rotated 
representation, whatever has lower run-time overhead.  (This is 
something which could be done as part of the heap protector, or by 
itself.  The existing accessor macros should directly support this.)
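
A minimal sketch of (1) on 64-bit, reusing the existing set_head and
chunksize_nomask accessor names; the byte-swapped encoding itself is
only an assumption here, and 32-bit would use __builtin_bswap32:

    /* Store the header byte-swapped: a NUL written into the first
       header byte then clobbers the *most* significant byte of the
       decoded size, yielding an absurd value instead of a silently
       shrunken one.  */
    #define encode_size(sz)      ((INTERNAL_SIZE_T) __builtin_bswap64 (sz))
    #define set_head(p, s)       ((p)->mchunk_size = encode_size (s))
    #define chunksize_nomask(p)  \
      ((INTERNAL_SIZE_T) __builtin_bswap64 ((p)->mchunk_size))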

(2) Flip the meaning of the PREV_INUSE bit, so that the attacker would 
have to write a non-NUL byte.
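
A sketch of (2), with an invented bit name (the flag stays in bit 0;
only its sense flips):

    /* PREV_FREE replaces PREV_INUSE and is set only when the previous
       chunk is free.  A NUL overwrite clears it, making the previous
       chunk look allocated, so backward consolidation (and the unlink)
       is skipped rather than triggered.  */
    #define PREV_FREE 0x1
    #define prev_inuse(p) (((p)->mchunk_size & PREV_FREE) == 0)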

(3) Always allocate at least one byte more than the application needs, 
so that single-byte overflows never touch metadata.  The heap arena 
layout does not need a full word of metadata for allocated chunks:

   * 32-bit: 1 MiB per heap, minimum allocation 8, 17 bit counter
   * 64-bit: 64 MiB per heap, minimum allocation 16, 22 bit counter

So we could compress the header to 3 bytes: We can encode the IS_MMAPPED 
bit with an otherwise impossible size counter and store the actual chunk 
size in a separate field.  The PREV_INUSE and NON_MAIN_ARENA bits would 
still be needed.  And we would have to use mmap for large allocations 
instead of sbrk even on the main arena.
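
The counter widths are just the heap size divided by the minimum
allocation; as a worked check (not a patch):

    _Static_assert ((1UL << 20) / (1UL << 3) == (1UL << 17),
                    "32-bit: 1 MiB heap / 8-byte minimum = 2^17 units");
    _Static_assert ((1ULL << 26) / (1ULL << 4) == (1ULL << 22),
                    "64-bit: 64 MiB heap / 16-byte minimum = 2^22 units");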


Thanks,
Florian
  
Moritz Eckert Oct. 30, 2017, 7:29 p.m. UTC | #2
On 10/30/2017 07:28 AM, Florian Weimer wrote:
> On 10/27/2017 05:17 AM, Moritz Eckert wrote:
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index d3fcadd20e..73fc98d2c3 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -1426,6 +1426,12 @@ typedef struct malloc_chunk *mbinptr;
>>         }                                          \
>>   }
>> +/* Check if prev_size is consistent with size.  */
>> +#define prev_size_is_sane(p, prevsize) {                   \
>> +    if (__builtin_expect (prevsize !=  chunksize (p), 0))              \
>> +        malloc_printerr ("corrupted size vs. prev_size");              \
>> +}
>> +
>>   /*
>>      Indexing
>> @@ -4227,6 +4233,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>>         prevsize = prev_size (p);
>>         size += prevsize;
>>         p = chunk_at_offset(p, -((long) prevsize));
>> +      prev_size_is_sane(p, prevsize)
>>         unlink(av, p, bck, fwd);
>>       }
> 
> More context for the unpatched code:
> 
>      /* consolidate backward */
>      if (!prev_inuse(p)) {
>        prevsize = prev_size (p);
>        size += prevsize;
>        p = chunk_at_offset(p, -((long) prevsize));
>        unlink(av, p, bck, fwd);
>      }
> 
> I'm not really familiar with the malloc code, so I'll try to summarize 
> my understanding.
> 
> p is the pointer being freed.  I assume the goal is to mitigate a 
> single-byte overwrite of the lowest byte of the chunk header for p.  The 
> overwrite clears the PREV_INUSE bit, and therefore triggers 
> reinterpretation of the allocated object contents as heap metadata 
> because free will now read the previous size (prev_size) at p.  The 
> value of prev_size is very likely under the control of the exploit 
> writer because it comes *before* the overwritten byte.  So I think it 
> would still be feasible to install a fake chunk and conduct an 
> unlink-based attack.

Unfortunately, there is a misunderstanding here.
This is not a mitigation against unlink attacks in general, and the 
overflow is not into the lowest byte of p's size.  The overflow happens 
into the size field of the 'real' previous chunk of p, so that a 
prev_size update is lost and p carries a stale prev_size at the time of 
the unlink; crucially, that value is not attacker-controlled.  That's 
important because, as you pointed out, an unlink against a fully 
controlled fake chunk is not preventable by these checks.
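
As a plain-C sketch of that sequence (sizes made up, no glibc
internals):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main (void)
    {
      /* Chunks A | B | C are adjacent.  An off-by-one NUL from A clears
         the low byte of B's size field: 0x210 becomes 0x200.  */
      size_t b_size = 0x210;
      size_t c_prev_size = 0x210;     /* written when B was last free */
      b_size &= ~(size_t) 0xff;       /* the single NUL byte */

      /* Later frees of chunks carved from the shrunken B update the
         prev_size copy at B + 0x200, so the copy at B + 0x210 (inside
         C) goes stale.  When C is freed, backward consolidation trusts
         it -- unless the new check notices the mismatch.  */
      if (c_prev_size != b_size)      /* prev_size_is_sane in the patch */
        {
          fputs ("corrupted size vs. prev_size\n", stderr);
          abort ();
        }
      return 0;
    }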

> As far as I can tell, there are three ways to mitigate single-NUL write 
> attacks reliably:
> 
> (1) Store the chunk size (with the PREV_INUSE bit) in something other 
> than little-endian format.  We could use big endian, or simply a rotated 
> representation, whatever has lower run-time overhead.  (This is 
> something which could be done as part of the heap protector, or by 
> itself.  The existing accessor macros should directly support this.)
> 
> (2) Flip the meaning of the PREV_INUSE bit, so that the attacker would 
> have to write a non-NUL byte.
> 
> (3) Always allocate at least one byte more than the application needs, 
> so that single-byte overflows never touch metadata.  The heap arena 
> layout does not need a full word of metadata for allocated chunks:
> 
>    * 32-bit: 1 MiB per heap, minimum allocation 8, 17 bit counter
>    * 64-bit: 64 MiB per heap, minimum allocation 16, 22 bit counter
> 
> So we could compress the header to 3 bytes: We can encode the IS_MMAPPED 
> bit with an otherwise impossible size counter and store the actual chunk 
> size in a separate field.  The PREV_INUSE and NON_MAIN_ARENA bits would 
> still be needed.  And we would have to use mmap for large allocations 
> instead of sbrk even on the main arena.

I like those ideas; the first two seem straightforward to integrate. If 
that's something you guys want, I can write a patch for it.

Thanks,
Moritz
  
DJ Delorie Nov. 3, 2017, 5:44 p.m. UTC | #3
> I like those ideas; the first two seem straightforward to integrate. If
> that's something you guys want, I can write a patch for it.

Assuming we have a fast way to convert to big-endian, I think it would be interesting to make all the size fields big-endian, and see if that affects performance measurably.  I think it would make overflow hacks significantly more difficult.  The MSB alone isn't enough, so a simple rotate is insufficient, as the MSB tends to be zero already on 64-bit platforms.

Alternately, a simple XOR with a magic number means a set-to-zero would un-XOR to a horribly wrong new "size".  Even a fixed magic number would make attacks significantly harder, although a per-process one would be better (and more expensive to do at runtime, unfortunately).
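
For instance (the names and the constant are made up):

    /* XOR is its own inverse, so one operation encodes and decodes.
       A byte forced to zero in the stored form decodes to the matching
       magic byte, i.e. a wildly wrong size that downstream sanity
       checks will reject.  */
    #define SIZE_MAGIC 0x5bd1e995U   /* fixed; per-process is stronger */
    #define protect_size(sz)   ((sz) ^ (INTERNAL_SIZE_T) SIZE_MAGIC)
    #define unprotect_size(sz) ((sz) ^ (INTERNAL_SIZE_T) SIZE_MAGIC)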

Heck, even ~size would be interesting to ponder.  The question is, which operations will break-in attempts have access to?

This will, of course, further break dumped heaps, like emacs, but hopefully we're past that by now.
  
Florian Weimer Nov. 3, 2017, 7:38 p.m. UTC | #4
On 11/03/2017 06:44 PM, DJ Delorie wrote:
> 
> 
>> I like those ideas; the first two seem straightforward to integrate. If
>> that's something you guys want, I can write a patch for it.
> 
> Assuming we have a fast way to convert to big-endian, I think it would be interesting to make all the size fields big-endian, and see if that affects performance measurably.  I think it would make overflow hacks significantly more difficult.

__builtin_bswap64 ((uintptr_t) p) should be decent.

32-bit will need __builtin_bswap32, which is a bit of a wart.
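
So the dispatch would look something like this (macro name illustrative,
assuming __WORDSIZE is visible the way it is inside glibc):

    #if __WORDSIZE == 64
    # define bswap_size(sz) __builtin_bswap64 (sz)
    #else
    # define bswap_size(sz) __builtin_bswap32 (sz)
    #endif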

> The MSB alone isn't enough, so a simple rotate is insufficient, as the MSB tends to be zero already on 64-bit platforms.

I'd suggest rotating by more than a byte if BSWAP is not fast enough.

> Alternately, a simple XOR with a magic number means a set-to-zero would un-XOR to a horribly wrong new "size".  Even a fixed magic number would make attacks significantly harder, although a per-process one would be better (and more expensive to do at runtime, unfortunately).

See my old heap protector patches.  You could probably swap in bswap in 
place of the encryption, and it will just work.

> Heck, even ~size would be interesting to ponder.  The question is, which operations will break-in attempts have access to?

Most overflows are more than just a single NUL byte, unfortunately.

> This will, of course, further break dumped heaps, like emacs, but hopefully we're past that by now.

Actually, that's not a problem.  I think my heap protector patch simply 
rewrites the dumped chunk headers into the appropriate format.

I will likely be busy with ABI-impacting work for many months to come, 
so I won't finish the heap protector patches anytime soon.

Thanks,
Florian
  
Moritz Eckert Nov. 3, 2017, 9:56 p.m. UTC | #5
>> Alternately, a simple XOR with a magic number means a set-to-zero 
>> would un-XOR to a horribly wrong new "size".  Even a fixed magic 
>> number would make attacks significantly harder, although a 
>> per-process one would be better (and more expensive to do at runtime, 
>> unfortunately).
> 
> See my old heap protector patches.  You could probably swap in bswap in 
> place of the encryption, and it will just work.

Where do I find those patches?

> 
>> Heck, even ~size would be interesting to ponder.  The question is, 
>> which operations will break-in attempts have access to?
> 
> Most overflows are more than just a single NUL byte, unfortunately.
> 
>> This will, of course, further break dumped heaps, like emacs, but 
>> hopefully we're past that by now.
> 
> Actually, that's not a problem.  I think my heap protector patch simply 
> rewrites the dumped chunk headers into the appropriate format.
> 
> I will likely be busy with ABI-impacting work for many months to come, 
> so I won't finish the heap protector patches anytime soon.

I would be interested in taking a look at those heap protector patches; 
they seem promising!
But as of now, my proposed patch would prevent the poison-null-byte 
attack immediately and with no performance impact, which seems like a 
good stopgap until the heap protector is ready.

Thanks,
Moritz
  
Moritz Eckert Nov. 9, 2017, 11:44 p.m. UTC | #6
Hi,

This mailing list is pretty busy, so I just wanted to send a friendly 
ping about my latest response. :-)

Thanks,
Moritz

On 11/03/2017 02:56 PM, Moritz Eckert wrote:
>>> Alternately, a simple XOR with a magic number means a set-to-zero 
>>> would un-XOR to a horribly wrong new "size".  Even a fixed magic 
>>> number would make attacks significantly harder, although a 
>>> per-process one would be better (and more expensive to do at runtime, 
>>> unfortunately).
>>
>> See my old heap protector patches.  You could probably swap in bswap 
>> in place of the encryption, and it will just work.
> 
> Where do I find those patches?
> 
>>
>>> Heck, even ~size would be interesting to ponder.  The question is, 
>>> which operations will break-in attempts have access to?
>>
>> Most overflows are more than just a single NUL byte, unfortunately.
>>
>>> This will, of course, further break dumped heaps, like emacs, but 
>>> hopefully we're past that by now.
>>
>> Actually, that's not a problem.  I think my heap protector patch 
>> simply rewrites the dumped chunk headers into the appropriate format.
>>
>> I will likely be busy with ABI-impacting work for many months to come, 
>> so I won't finish the heap protector patches anytime soon.
> 
> I would be interested in taking a look at those heap protector patches; 
> they seem promising!
> But as of now, my proposed patch would prevent the poison-null-byte 
> attack immediately and with no performance impact, which seems like a 
> good stopgap until the heap protector is ready.
> 
> Thanks,
> Moritz
  
Florian Weimer Aug. 16, 2018, 3:02 p.m. UTC | #7
On 11/03/2017 10:56 PM, Moritz Eckert wrote:
>>> Alternately, a simple XOR with a magic number means a set-to-zero 
>>> would un-XOR to a horribly wrong new "size".  Even a fixed magic 
>>> number would make attacks significantly harder, although a 
>>> per-process one would be better (and more expensive to do at runtime, 
>>> unfortunately).
>>
>> See my old heap protector patches.  You could probably swap in bswap 
>> in place of the encryption, and it will just work.
> 
> Where do I find those patches?

I posted them here:

   https://sourceware.org/ml/libc-alpha/2016-10/msg00531.html

There probably has been some code drift, so the patch won't apply as-is.

Florian
  

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index d3fcadd20e..73fc98d2c3 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1426,6 +1426,12 @@  typedef struct malloc_chunk *mbinptr;
       }									      \
 }
 
+/* Check if prev_size is consistent with size.  */
+#define prev_size_is_sane(p, prevsize) {				   \
+    if (__builtin_expect (prevsize !=  chunksize (p), 0))		      \
+        malloc_printerr ("corrupted size vs. prev_size");		      \
+}
+
 /*
    Indexing
 
@@ -4227,6 +4233,7 @@  _int_free (mstate av, mchunkptr p, int have_lock)
       prevsize = prev_size (p);
       size += prevsize;
       p = chunk_at_offset(p, -((long) prevsize));
+      prev_size_is_sane(p, prevsize)
       unlink(av, p, bck, fwd);
     }
 
@@ -4392,6 +4399,7 @@  static void malloc_consolidate(mstate av)
 	    prevsize = prev_size (p);
 	    size += prevsize;
 	    p = chunk_at_offset(p, -((long) prevsize));
+	    prev_size_is_sane(p, prevsize)
 	    unlink(av, p, bck, fwd);
 	  }