malloc: Fix [BZ #22853] weak randomization on thread arenas.

Message ID f44c10e6-4300-17bf-f279-2c1f9082a7b6@disca.upv.es
State DCO or assignment missing, archived
Headers
Series malloc: Fix [BZ #22853] weak randomization on thread arenas. |

Commit Message

iripoll March 23, 2020, 5:06 p.m. UTC
  Hi!

 > Thanks for the work. However, patch discussion is done through the
 > email list [1].  Could you re-send it on the libc-alpha?
 >
 > It seems that alignment restriction was added originally to simplify
 > the heap_for_ptr macro and I think it should be added on patch
 > description (stating there is no other restriction about arena
 > alignment).

   Regarding the comments in the code, we don't know what would be the
best way to describe the patch. We don't know if the code shall be
commented with respect to the previous code or if it would be better
to describe the operation of the new heap_for_ptr macro (which can be
tricky). Please, add the comments that you think are better.

 > Also, would be possible to add a testcase? I am not sure which
 > approach would be better (a maillist discussion might give some
 > ideas). Maybe by parametrize the mmap entropy information you
 > compiled on arch-specific files (similar to the stackinfo.h), create
 > N thread that run M allocation, record the output, and compare the
 > histogram with the expected entropy with a certain margin.

   The patch (in the mail) contains both the fix and a new test
[tst-arena-aslr.c] that checks the effective randomization of the
malloc on threads and compares it with that of the mmap. That way
the test shall be platform independent.

   Although the security issue affects the entropy of the dynamic
memory requested by the threads, it is necessary to launch 1000
processes to have different memory layouts samples. Remember that
ASLR does not change on a per processes basis (fork) but it is
renewed every new binary image (exec).

 > Finally, please also add the outcome of the
 > bench-malloc-{simple,thread} with and without the patch.


   The output of the test is:

--------  If failed (three lines) ------
FAIL: malloc arena ASLR.
   Expected value: 29 random bits
   But got:        15 random bits

--------  If passed (one line) -------
PASS: malloc arena ASLR.

   The test is passed if the random bits of the thread malloc is
greater or equal to the random bits of mmap(). With a small
tolerance. This way (testing greater or equal than) the test will
also pass if additional entropy is add to the heaps in the future.

   Important: the test does not estimate the entropy accurately! because
it is not the goal of the test. A more accurate estimation would need
much more samples and a more complex algorithm, which is heavy. In fact,
the actual entropy is 28 bits and 14 in x86_64, but the test gives 29
and 15 for mmap and thread's mallocs respectively.


Regards,
Ismael and Hector.


https://sourceware.org/bugzilla/show_bug.cgi?id=22853
Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
---
  malloc/Makefile         |   3 +
  malloc/arena.c          |  25 ++++-
  malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
  3 files changed, 236 insertions(+), 5 deletions(-)
  create mode 100644 malloc/tst-arena-aslr.c
  

Comments

Adhemerval Zanella March 23, 2020, 8:14 p.m. UTC | #1
On 23/03/2020 14:06, iripoll wrote:
> Hi!
> 
>> Thanks for the work. However, patch discussion is done through the
>> email list [1].  Could you re-send it on the libc-alpha?
>>
>> It seems that alignment restriction was added originally to simplify
>> the heap_for_ptr macro and I think it should be added on patch
>> description (stating there is no other restriction about arena
>> alignment).
> 
>   Regarding the comments in the code, we don't know what would be the
> best way to describe the patch. We don't know if the code shall be
> commented with respect to the previous code or if it would be better
> to describe the operation of the new heap_for_ptr macro (which can be
> tricky). Please, add the comments that you think are better.

Thanks for follow this up, in fact I was about to send this very
patch in your behalf.  I will comment below.

> 
>> Also, would be possible to add a testcase? I am not sure which
>> approach would be better (a maillist discussion might give some
>> ideas). Maybe by parametrize the mmap entropy information you
>> compiled on arch-specific files (similar to the stackinfo.h), create
>> N thread that run M allocation, record the output, and compare the
>> histogram with the expected entropy with a certain margin.
> 
>   The patch (in the mail) contains both the fix and a new test
> [tst-arena-aslr.c] that checks the effective randomization of the
> malloc on threads and compares it with that of the mmap. That way
> the test shall be platform independent.
> 
>   Although the security issue affects the entropy of the dynamic
> memory requested by the threads, it is necessary to launch 1000
> processes to have different memory layouts samples. Remember that
> ASLR does not change on a per processes basis (fork) but it is
> renewed every new binary image (exec).

Right, we might need to handle and tune if it takes too much
time (specially on some architectures).

> 
>> Finally, please also add the outcome of the
>> bench-malloc-{simple,thread} with and without the patch.
> 
> 
>   The output of the test is:
> 
> --------  If failed (three lines) ------
> FAIL: malloc arena ASLR.
>   Expected value: 29 random bits
>   But got:        15 random bits
> 
> --------  If passed (one line) -------
> PASS: malloc arena ASLR.
> 
>   The test is passed if the random bits of the thread malloc is
> greater or equal to the random bits of mmap(). With a small
> tolerance. This way (testing greater or equal than) the test will
> also pass if additional entropy is add to the heaps in the future.
> 
>   Important: the test does not estimate the entropy accurately! because
> it is not the goal of the test. A more accurate estimation would need
> much more samples and a more complex algorithm, which is heavy. In fact,
> the actual entropy is 28 bits and 14 in x86_64, but the test gives 29
> and 15 for mmap and thread's mallocs respectively.

Alright, the main issue is not that it calculates entropy inaccurately,
but if it accuses false-positives. I tried to code something similar,
and checking on the testcase provided it seems too fragile to give
meaningful results. So I think we should dropped it for now.

> 
> 
> Regards,
> Ismael and Hector.
> 
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=22853
> Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
> Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
> ---
>  malloc/Makefile         |   3 +
>  malloc/arena.c          |  25 ++++-
>  malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 236 insertions(+), 5 deletions(-)
>  create mode 100644 malloc/tst-arena-aslr.c
> 
> diff --git a/malloc/Makefile b/malloc/Makefile
> index 984045b5b9..dc845b1c24 100644
> --- a/malloc/Makefile
> +++ b/malloc/Makefile
> @@ -39,6 +39,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>       tst-malloc-too-large \
>       tst-malloc-stats-cancellation \
>       tst-tcfree1 tst-tcfree2 tst-tcfree3 \
> +     tst-arena-aslr \
> +
> 
>  tests-static := \
>       tst-interpose-static-nothread \

No '\' on last item.

> @@ -99,6 +101,7 @@ $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
>  $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
>  $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
>  $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
> +$(objpfx)tst-arena-aslr: $(shared-thread-library)
> 
>  # Export the __malloc_initialize_hook variable to libc.so.
>  LDFLAGS-tst-mallocstate = -rdynamic

Ok.

> diff --git a/malloc/arena.c b/malloc/arena.c
> index cecdb7f4c4..416559e269 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -122,10 +122,15 @@ int __malloc_initialized = -1;
>          ptr = arena_get2 ((size), NULL);                      \
>    } while (0)
> 
> -/* find the heap and corresponding arena for a given ptr */
> -
> +/* find the heap and corresponding arena for a given ptr.  Note that
> +   heap_info is not HEAP_MAX_SIZE aligned any more. But a random
> +   offset from the expected alignment, known by the process. This way
> +   it is fast to get the head of the area and be ASLR friendly.
> +*/

Style: capitalize 'style' and two space after period.

> +static unsigned long __arena_rnd;

The variable is static, so no need to double underscore to handle
namespace issues.  Also I think we should use uintptr_t for its
type.

>  #define heap_for_ptr(ptr) \
> -  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
> +  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
> +                  |  __arena_rnd))
>  #define arena_for_chunk(ptr) \
>    (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
> 

I would prefer if we use static inline functions instead. For the
version I was intended to send, I used:

  static inline heap_info *
  heap_for_ptr (mchunkptr ptr)
  {
    return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
                          & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
  }
  
  static inline struct malloc_state *
  arena_for_chunk (mchunkptr ptr)
  {
   return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
  }


> @@ -293,6 +298,11 @@ ptmalloc_init (void)
> 
>    __malloc_initialized = 0;
> 
> +  size_t pagesize = GLRO (dl_pagesize);
> +  /* Get the entropy from the already existing ASLR. */
> +  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
> +                ~(pagesize-1);
> +
>  #ifdef SHARED
>    /* In case this libc copy is in a non-default namespace, never use brk.
>       Likewise if dlopened from statically linked program.  */

The mmap arena size would be at most HEAP_MAX_SIZE << 1, which is
134217728 for 64 bit or 2097152 for 32 bit and the heap_info size
is 32 bytes on 64 bits and 16 on 32 bits.  So it would requires 
at least 22 bits for 64 bits and 17 bits for 32 bits to fully
randomize heap_info (if I didn't make any confusion here).

And relying on ASLR entropy solely covers it only for some architectures,
with the issue that it does not help for architectures that do not
support ARCH_HAS_ELF_RANDOMIZE.

So I think it would be better to use some bits from AT_RANDOM and maybe
refactor dl-osinfo.h to handle it transparently (currently it uses only 
up to 64 bits for stack check and pointer guard).  

Another option would be to either refactor Florian suggestion to use
SHA256 with AT_RANDOM, which gives us some more bits to work.  

In a long term I think the better alternative is to have a more robust
source of entropy, such as the arc4random.

Also, some style issues (like space after & and between operators).

[1] https://sourceware.org/pipermail/libc-alpha/2020-March/112042.html

> @@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
>          {
>            p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
>                           & ~(HEAP_MAX_SIZE - 1));
> +          /* The heap_info is at a random offset from the alignment to
> +             HEAP_MAX_SIZE. */
> +          p2 = (char *) ((unsigned long) p2 | __arena_rnd);
> +          if (p1 + HEAP_MAX_SIZE <= p2)
> +            p2 -= HEAP_MAX_SIZE;
>            ul = p2 - p1;
>            if (ul)
>              __munmap (p1, ul);

Ok.

> @@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
>        else
>          {
>            /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
> -             is already aligned. */
> +             is already aligned to __arena_rnd. */
>            p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
>            if (p2 == MAP_FAILED)
>              return 0;
> 
> -          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
> +          if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
>              {
>                __munmap (p2, HEAP_MAX_SIZE);
>                return 0;

Ok.

> diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
> new file mode 100644
> index 0000000000..c285111e1c
> --- /dev/null
> +++ b/malloc/tst-arena-aslr.c
> @@ -0,0 +1,213 @@
> +/* Bug 22853: aslr on thread's malloc weakened due to arena alignment.
> +   https://sourceware.org/bugzilla/show_bug.cgi?id=22853

No need to reference the full bug report link.

> +   Copyright (C) 2020 Free Software Foundation.
> +   Copying and distribution of this file, with or without modification,
> +   are permitted in any medium without royalty provided the copyright
> +   notice and this notice are preserved. This file is offered as-is,
> +   without any warranty.  */

This Copyright disclaimer does not follow the model one used for
newer files.

> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <libc-diag.h>
> +#include <pthread.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +
> +/* Bits that are considered to be acceptable: */
> +#define TOLERANCE 1
> +/* Samples to be run but statisticaly accurate: */
> +#define NR_SAMPLES 1000
> +
> +/* Size to be allocated in the thread's arena pool. */
> +#define MALLOC_ALLOCATED_SIZE  400
> +
> +/* Data structure sent by the sampler to the tester. */
> +struct sample_t
> +{
> +  size_t obj_mmap;
> +  size_t obj_arena;
> +} sample;
> +
> +size_t pagesize;
> +
> +static void *
> +thread_main(void * dummy)
> +{
> +    sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
> +    pthread_exit(0);
> +}
> +
> +/* Creates a thread to sample malloc and mmap addresses, and writes
> +   the result to stdout. */
> +static void
> +sampler(void)
> +{
> +    int ret;
> +    pthread_t th;
> +    pthread_attr_t attrs;
> +
> +    ret = pthread_attr_init (&attrs);
> +    if (0 != ret)
> +      {
> +        perror ("pthread_attr_init");
> +        abort ();
> +      }
> +    sample.obj_mmap = (size_t)mmap( (void *)0x0, pagesize,
> +                                    PROT_READ | PROT_WRITE,
> +                                    MAP_PRIVATE | MAP_ANONYMOUS, -1 ,0);
> +    if (!sample.obj_mmap)
> +      {
> +        perror ("mmap");
> +        abort ();
> +      }
> +    if (pthread_create (&th, &attrs, thread_main, NULL))
> +      {
> +        perror ("pthread_create");
> +        abort ();
> +      }
> +    if (pthread_join (th, NULL))
> +      {
> +        perror ("pthread_join");
> +        abort ();
> +      }
> +    if (sizeof (sample) != write (1, &sample, sizeof (sample)))
> +      {
> +        perror ("write sampler");
> +        abort ();
> +      }
> +    exit(0);
> +}
> +
> +
> +struct statitics_t
> +{
> +  size_t min, max, and, or;
> +};
> +
> +static void
> +stats_init (struct statitics_t *st)
> +{
> +  st->min = st->and = ~0;
> +  st->max = st->or = 0;
> +}
> +
> +static void
> +stats_account (struct statitics_t *st, size_t value)
> +{
> +  st->min = value < st->min ? value : st->min;
> +  st->max = value > st->max ? value : st->max;
> +  st->and &= value;
> +  st->or |= value;
> +}
> +
> +/* Flipping bits, this is a "pourman" entroy estimator. There are much
> +   better and accurate entropy estimators, but they need more samples
> +   and mote time... which do not worth it. */
> +static int
> +stats_pmentropy (struct statitics_t *st)
> +{
> +
> +  return __builtin_popcountll(st->and ^ st->or);
> +}
> +
> +#if 0
> +/* Debugging the test. Not needed for production testing */
> +void
> +stats_print (struct statitics_t *st)
> +{
> +  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
> +          (st->max >> 2) + (st->min >> 2),
> +          st->and,
> +          st->or,
> +          st->and ^ st->or,
> +          stats_pmentropy(st));
> +}
> +#endif
> +
> +static int
> +do_test (int argc, char **argv)
> +{
> +  pid_t child;
> +  int tube[2];
> +  int wstatus;
> +  int i, mmap_entropy, arena_entropy;
> +  struct sample_t sample;
> +  struct statitics_t stats_mmap, stats_arena;
> +
> +  pagesize = sysconf (_SC_PAGESIZE);
> +
> +  if ( (2 == argc) && !strcmp (argv[1],"sample"))
> +    {
> +      sampler(); /* The role of this process is to sample one address. */
> +      exit(0);
> +    }
> +
> +  /* The role of this part is to launch NR_SAMPLES samplers and
> +     collect the output. */
> +  assert (pipe(tube) == 0);
> +
> +  stats_init (&stats_mmap);
> +  stats_init (&stats_arena);
> +
> +  /* Collect all the samples. */
> +  for (i = 0; i < NR_SAMPLES ; i++)
> +    {
> +      child = fork ();
> +      if (child < 0)
> +        {
> +          perror ("fork");
> +          exit (1);
> +        }
> +      if (0 == child)
> +        {
> +          assert (dup2(tube[1],1) != -1);
> +          execl (argv[0], argv[0], "sample", NULL);
> +          perror ("execl");
> +          abort ();
> +        }
> +      if (TEMP_FAILURE_RETRY (read(tube[0], &sample, sizeof (sample)))
> +          != sizeof (sample))
> +        {
> +          printf ("Error: can not read from child process\n");
> +      abort ();
> +        }
> +      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0) != child) ||
> +          (wstatus != 0))
> +        {
> +          printf ("Error: waitpid for child process\n");
> +      abort ();
> +        }
> +      stats_account (&stats_mmap, sample.obj_mmap);
> +      stats_account (&stats_arena, sample.obj_arena);
> +    }
> +
> +  /* Compute the actual entropy and analyze the results. */
> +  mmap_entropy = stats_pmentropy (&stats_mmap);
> +  arena_entropy = stats_pmentropy (&stats_arena);
> +
> +  if ((arena_entropy + TOLERANCE) < mmap_entropy)
> +    {
> +      printf ("FAIL: malloc arena ASLR.\n");
> +      printf ("  Expected value: %d random bits\n", mmap_entropy);
> +      printf ("  But got:        %d random bits\n", arena_entropy);
> +
> +      /* stats_print(&stats_mmap);
> +         stats_print(&stats_arena); */
> +      return 1;
> +    }
> +  printf ("PASS: malloc arena ASLR.\n");
> +  return 0;
> +}
> +
> +#define TEST_FUNCTON_ARGV do_test
> +#include "../test-skeleton.c"> +
> +/* Local Variables: */
> +/* c-file-style: "gnu" */
> +/* c-basic-offset: 2 */
> +/* End: */
  
iripoll March 24, 2020, 12:54 p.m. UTC | #2
El 23/3/20 a las 21:14, Adhemerval Zanella escribió:
 >
 >
 > On 23/03/2020 14:06, iripoll wrote:
 >> Hi!
 >>
 >>> Thanks for the work. However, patch discussion is done through the
 >>> email list [1].  Could you re-send it on the libc-alpha?
 >>>
 >>> It seems that alignment restriction was added originally to simplify
 >>> the heap_for_ptr macro and I think it should be added on patch
 >>> description (stating there is no other restriction about arena
 >>> alignment).
 >>
 >>    Regarding the comments in the code, we don't know what would be the
 >> best way to describe the patch. We don't know if the code shall be
 >> commented with respect to the previous code or if it would be better
 >> to describe the operation of the new heap_for_ptr macro (which can be
 >> tricky). Please, add the comments that you think are better.
 >
 > Thanks for follow this up, in fact I was about to send this very
 > patch in your behalf.  I will comment below.
 >  >>
 >>> Also, would be possible to add a testcase? I am not sure which
 >>> approach would be better (a maillist discussion might give some
 >>> ideas). Maybe by parametrize the mmap entropy information you
 >>> compiled on arch-specific files (similar to the stackinfo.h), create
 >>> N thread that run M allocation, record the output, and compare the
 >>> histogram with the expected entropy with a certain margin.
 >>
 >>    The patch (in the mail) contains both the fix and a new test
 >> [tst-arena-aslr.c] that checks the effective randomization of the
 >> malloc on threads and compares it with that of the mmap. That way
 >> the test shall be platform independent.
 >>
 >>    Although the security issue affects the entropy of the dynamic
 >> memory requested by the threads, it is necessary to launch 1000
 >> processes to have different memory layouts samples. Remember that
 >> ASLR does not change on a per processes basis (fork) but it is
 >> renewed every new binary image (exec).
 >
 > Right, we might need to handle and tune if it takes too much
 > time (specially on some architectures).
 >
 >>
 >>> Finally, please also add the outcome of the
 >>> bench-malloc-{simple,thread} with and without the patch.
 >>
 >>
 >>    The output of the test is:
 >>
 >> --------  If failed (three lines) ------
 >> FAIL: malloc arena ASLR.
 >>    Expected value: 29 random bits
 >>    But got:        15 random bits
 >>
 >> --------  If passed (one line) -------
 >> PASS: malloc arena ASLR.
 >>
 >>    The test is passed if the random bits of the thread malloc is
 >> greater or equal to the random bits of mmap(). With a small
 >> tolerance. This way (testing greater or equal than) the test will
 >> also pass if additional entropy is add to the heaps in the future.
 >>
 >>    Important: the test does not estimate the entropy accurately! because
 >> it is not the goal of the test. A more accurate estimation would need
 >> much more samples and a more complex algorithm, which is heavy. In fact,
 >> the actual entropy is 28 bits and 14 in x86_64, but the test gives 29
 >> and 15 for mmap and thread's mallocs respectively.
 >
 > Alright, the main issue is not that it calculates entropy inaccurately,
 > but if it accuses false-positives. I tried to code something similar,
 > and checking on the testcase provided it seems too fragile to give
 > meaningful results. So I think we should dropped it for now.
 >

   We think that the test shall work on all architectures. We have developed a more complex tool 
which performs a complete statistical analysis of the entropy. We guess that the tst-arena-aslr.c is 
simple but useful.

Here the link to the results of the tool:
http://cybersecurity.upv.es/tools/aslra/aslr-analyzer.html

   In any case, you have the full picture of what is needed in the glibc.

 >>
 >>
 >> Regards,
 >> Ismael and Hector.
 >>
 >>
 >> https://sourceware.org/bugzilla/show_bug.cgi?id=22853
 >> Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
 >> Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
 >> ---
 >>   malloc/Makefile         |   3 +
 >>   malloc/arena.c          |  25 ++++-
 >>   malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
 >>   3 files changed, 236 insertions(+), 5 deletions(-)
 >>   create mode 100644 malloc/tst-arena-aslr.c
 >>
 >> diff --git a/malloc/Makefile b/malloc/Makefile
 >> index 984045b5b9..dc845b1c24 100644
 >> --- a/malloc/Makefile
 >> +++ b/malloc/Makefile
 >> @@ -39,6 +39,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 >>        tst-malloc-too-large \
 >>        tst-malloc-stats-cancellation \
 >>        tst-tcfree1 tst-tcfree2 tst-tcfree3 \
 >> +     tst-arena-aslr \
 >> +
 >>
 >>   tests-static := \
 >>        tst-interpose-static-nothread \
 >
 > No '\' on last item.
 >

Agree. We added the "\" because the rest of the makefile rules ended with "\", which was odd to us.


 >> @@ -99,6 +101,7 @@ $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
 >>   $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
 >>   $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
 >>   $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
 >> +$(objpfx)tst-arena-aslr: $(shared-thread-library)
 >>
 >>   # Export the __malloc_initialize_hook variable to libc.so.
 >>   LDFLAGS-tst-mallocstate = -rdynamic
 >
 > Ok.
 >
 >> diff --git a/malloc/arena.c b/malloc/arena.c
 >> index cecdb7f4c4..416559e269 100644
 >> --- a/malloc/arena.c
 >> +++ b/malloc/arena.c
 >> @@ -122,10 +122,15 @@ int __malloc_initialized = -1;
 >>           ptr = arena_get2 ((size), NULL);                      \
 >>     } while (0)
 >>
 >> -/* find the heap and corresponding arena for a given ptr */
 >> -
 >> +/* find the heap and corresponding arena for a given ptr.  Note that
 >> +   heap_info is not HEAP_MAX_SIZE aligned any more. But a random
 >> +   offset from the expected alignment, known by the process. This way
 >> +   it is fast to get the head of the area and be ASLR friendly.
 >> +*/
 >
 > Style: capitalize 'style' and two space after period.

OK!

 >
 >> +static unsigned long __arena_rnd;
 >
 > The variable is static, so no need to double underscore to handle
 > namespace issues.  Also I think we should use uintptr_t for its
 > type.
 >
 >>   #define heap_for_ptr(ptr) \
 >> -  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
 >> +  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
 >> +                  |  __arena_rnd))
 >>   #define arena_for_chunk(ptr) \
 >>     (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
 >>
 >
 > I would prefer if we use static inline functions instead. For the
 > version I was intended to send, I used:
 >
 >    static inline heap_info *
 >    heap_for_ptr (mchunkptr ptr)
 >    {
 >      return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
 >                            & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
 >    }
 >       static inline struct malloc_state *
 >    arena_for_chunk (mchunkptr ptr)
 >    {
 >     return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
 >    }
 >
 >

Agree. The macros were used since the first versions of the code, now inlines are preferred.

But, we don't know a fix patch shall change the "style" of the code, or it would be better to move 
code improvements into another patch. We decided to preserve the code as much as possible, so that 
the changes can be easily tracked, and reviewed.

Whatever you choose will be ok.


 >> @@ -293,6 +298,11 @@ ptmalloc_init (void)
 >>
 >>     __malloc_initialized = 0;
 >>
 >> +  size_t pagesize = GLRO (dl_pagesize);
 >> +  /* Get the entropy from the already existing ASLR. */
 >> +  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
 >> +                ~(pagesize-1);
 >> +
 >>   #ifdef SHARED
 >>     /* In case this libc copy is in a non-default namespace, never use brk.
 >>        Likewise if dlopened from statically linked program.  */
 >
 > The mmap arena size would be at most HEAP_MAX_SIZE << 1, which is
 > 134217728 for 64 bit or 2097152 for 32 bit and the heap_info size
 > is 32 bytes on 64 bits and 16 on 32 bits.  So it would requires
 > at least 22 bits for 64 bits and 17 bits for 32 bits to fully
 > randomize heap_info (if I didn't make any confusion here).
 >
 > And relying on ASLR entropy solely covers it only for some architectures,
 > with the issue that it does not help for architectures that do not
 > support ARCH_HAS_ELF_RANDOMIZE.
 >
 > So I think it would be better to use some bits from AT_RANDOM and maybe
 > refactor dl-osinfo.h to handle it transparently (currently it uses only
 > up to 64 bits for stack check and pointer guard).
 >
 > Another option would be to either refactor Florian suggestion to use
 > SHA256 with AT_RANDOM, which gives us some more bits to work.
 >
 > In a long term I think the better alternative is to have a more robust
 > source of entropy, such as the arc4random.
 >
 > Also, some style issues (like space after & and between operators).
 >

This is an important issue: where the entropy comes from? and how many entropy shall have the arenas?

Here is why we decided to use the existing entropy of the mmaps.

1.- Is it very fast.
     It does not need to call a hash function.
     Nor consume entropy from the auxv vector.

2.- It is fully transparent backward compatible.
     If the user disables ASLR, then the arenas would be fixed too.

We were tempted to increase the randomization by adding "sub-page" randomization. Some upper page 
bits (from bit 4 or 5 until 12, depending on the architecture alignment) could also be randomized, 
but it will break compatibility and also may cause problems in the future (when larger malloc 
alignment will be required). So we decided to keep this solution.

Also, remember that if the randomization of the arenas are independent of the rest of the memory 
layout, then there must be a mechanism to disable it. It is necessary for debugging.

The idea of the patch is to preserve the existing randomization, by giving a generic solution 
compatible with ASLR and fast arena finding.

For a larger discussion about ASLR, see the work that we started a few years ago (but that we had no 
time to complete):
http://cybersecurity.upv.es/solutions/aslr-ng/aslr-ng.html


 > [1] https://sourceware.org/pipermail/libc-alpha/2020-March/112042.html
 >
 >> @@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
 >>           {
 >>             p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
 >>                            & ~(HEAP_MAX_SIZE - 1));
 >> +          /* The heap_info is at a random offset from the alignment to
 >> +             HEAP_MAX_SIZE. */
 >> +          p2 = (char *) ((unsigned long) p2 | __arena_rnd);
 >> +          if (p1 + HEAP_MAX_SIZE <= p2)
 >> +            p2 -= HEAP_MAX_SIZE;
 >>             ul = p2 - p1;
 >>             if (ul)
 >>               __munmap (p1, ul);
 >
 > Ok.
 >
 >> @@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
 >>         else
 >>           {
 >>             /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
 >> -             is already aligned. */
 >> +             is already aligned to __arena_rnd. */
 >>             p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
 >>             if (p2 == MAP_FAILED)
 >>               return 0;
 >>
 >> -          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
 >> +          if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
 >>               {
 >>                 __munmap (p2, HEAP_MAX_SIZE);
 >>                 return 0;
 >
 > Ok.
 >
 >> diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
 >> new file mode 100644
 >> index 0000000000..c285111e1c
 >> --- /dev/null
 >> +++ b/malloc/tst-arena-aslr.c
 >> @@ -0,0 +1,213 @@
 >> +/* Bug 22853: aslr on thread's malloc weakened due to arena alignment.
 >> +   https://sourceware.org/bugzilla/show_bug.cgi?id=22853
 >
 > No need to reference the full bug report link.

OK.

 >
 >> +   Copyright (C) 2020 Free Software Foundation.
 >> +   Copying and distribution of this file, with or without modification,
 >> +   are permitted in any medium without royalty provided the copyright
 >> +   notice and this notice are preserved. This file is offered as-is,
 >> +   without any warranty.  */
 >
 > This Copyright disclaimer does not follow the model one used for
 > newer files.
 >
 >> +
 >> +#include <assert.h>
 >> +#include <errno.h>
 >> +#include <stdio.h>
 >> +#include <stdlib.h>
 >> +#include <unistd.h>
 >> +#include <sys/types.h>
 >> +#include <sys/wait.h>
 >> +#include <libc-diag.h>
 >> +#include <pthread.h>
 >> +#include <string.h>
 >> +#include <sys/mman.h>
 >> +
 >> +/* Bits that are considered to be acceptable: */
 >> +#define TOLERANCE 1
 >> +/* Samples to be run but statisticaly accurate: */
 >> +#define NR_SAMPLES 1000
 >> +
 >> +/* Size to be allocated in the thread's arena pool. */
 >> +#define MALLOC_ALLOCATED_SIZE  400
 >> +
 >> +/* Data structure sent by the sampler to the tester. */
 >> +struct sample_t
 >> +{
 >> +  size_t obj_mmap;
 >> +  size_t obj_arena;
 >> +} sample;
 >> +
 >> +size_t pagesize;
 >> +
 >> +static void *
 >> +thread_main(void * dummy)
 >> +{
 >> +    sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
 >> +    pthread_exit(0);
 >> +}
 >> +
 >> +/* Creates a thread to sample malloc and mmap addresses, and writes
 >> +   the result to stdout. */
 >> +static void
 >> +sampler(void)
 >> +{
 >> +    int ret;
 >> +    pthread_t th;
 >> +    pthread_attr_t attrs;
 >> +
 >> +    ret = pthread_attr_init (&attrs);
 >> +    if (0 != ret)
 >> +      {
 >> +        perror ("pthread_attr_init");
 >> +        abort ();
 >> +      }
 >> +    sample.obj_mmap = (size_t)mmap( (void *)0x0, pagesize,
 >> +                                    PROT_READ | PROT_WRITE,
 >> +                                    MAP_PRIVATE | MAP_ANONYMOUS, -1 ,0);
 >> +    if (!sample.obj_mmap)
 >> +      {
 >> +        perror ("mmap");
 >> +        abort ();
 >> +      }
 >> +    if (pthread_create (&th, &attrs, thread_main, NULL))
 >> +      {
 >> +        perror ("pthread_create");
 >> +        abort ();
 >> +      }
 >> +    if (pthread_join (th, NULL))
 >> +      {
 >> +        perror ("pthread_join");
 >> +        abort ();
 >> +      }
 >> +    if (sizeof (sample) != write (1, &sample, sizeof (sample)))
 >> +      {
 >> +        perror ("write sampler");
 >> +        abort ();
 >> +      }
 >> +    exit(0);
 >> +}
 >> +
 >> +
 >> +struct statitics_t
 >> +{
 >> +  size_t min, max, and, or;
 >> +};
 >> +
 >> +static void
 >> +stats_init (struct statitics_t *st)
 >> +{
 >> +  st->min = st->and = ~0;
 >> +  st->max = st->or = 0;
 >> +}
 >> +
 >> +static void
 >> +stats_account (struct statitics_t *st, size_t value)
 >> +{
 >> +  st->min = value < st->min ? value : st->min;
 >> +  st->max = value > st->max ? value : st->max;
 >> +  st->and &= value;
 >> +  st->or |= value;
 >> +}
 >> +
 >> +/* Flipping bits, this is a "pourman" entroy estimator. There are much
 >> +   better and accurate entropy estimators, but they need more samples
 >> +   and mote time... which do not worth it. */
 >> +static int
 >> +stats_pmentropy (struct statitics_t *st)
 >> +{
 >> +
 >> +  return __builtin_popcountll(st->and ^ st->or);
 >> +}
 >> +
 >> +#if 0
 >> +/* Debugging the test. Not needed for production testing */
 >> +void
 >> +stats_print (struct statitics_t *st)
 >> +{
 >> +  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
 >> +          (st->max >> 2) + (st->min >> 2),
 >> +          st->and,
 >> +          st->or,
 >> +          st->and ^ st->or,
 >> +          stats_pmentropy(st));
 >> +}
 >> +#endif
 >> +
 >> +static int
 >> +do_test (int argc, char **argv)
 >> +{
 >> +  pid_t child;
 >> +  int tube[2];
 >> +  int wstatus;
 >> +  int i, mmap_entropy, arena_entropy;
 >> +  struct sample_t sample;
 >> +  struct statitics_t stats_mmap, stats_arena;
 >> +
 >> +  pagesize = sysconf (_SC_PAGESIZE);
 >> +
 >> +  if ( (2 == argc) && !strcmp (argv[1],"sample"))
 >> +    {
 >> +      sampler(); /* The role of this process is to sample one address. */
 >> +      exit(0);
 >> +    }
 >> +
 >> +  /* The role of this part is to launch NR_SAMPLES samplers and
 >> +     collect the output. */
 >> +  assert (pipe(tube) == 0);
 >> +
 >> +  stats_init (&stats_mmap);
 >> +  stats_init (&stats_arena);
 >> +
 >> +  /* Collect all the samples. */
 >> +  for (i = 0; i < NR_SAMPLES ; i++)
 >> +    {
 >> +      child = fork ();
 >> +      if (child < 0)
 >> +        {
 >> +          perror ("fork");
 >> +          exit (1);
 >> +        }
 >> +      if (0 == child)
 >> +        {
 >> +          assert (dup2(tube[1],1) != -1);
 >> +          execl (argv[0], argv[0], "sample", NULL);
 >> +          perror ("execl");
 >> +          abort ();
 >> +        }
 >> +      if (TEMP_FAILURE_RETRY (read(tube[0], &sample, sizeof (sample)))
 >> +          != sizeof (sample))
 >> +        {
 >> +          printf ("Error: can not read from child process\n");
 >> +      abort ();
 >> +        }
 >> +      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0) != child) ||
 >> +          (wstatus != 0))
 >> +        {
 >> +          printf ("Error: waitpid for child process\n");
 >> +      abort ();
 >> +        }
 >> +      stats_account (&stats_mmap, sample.obj_mmap);
 >> +      stats_account (&stats_arena, sample.obj_arena);
 >> +    }
 >> +
 >> +  /* Compute the actual entropy and analyze the results. */
 >> +  mmap_entropy = stats_pmentropy (&stats_mmap);
 >> +  arena_entropy = stats_pmentropy (&stats_arena);
 >> +
 >> +  if ((arena_entropy + TOLERANCE) < mmap_entropy)
 >> +    {
 >> +      printf ("FAIL: malloc arena ASLR.\n");
 >> +      printf ("  Expected value: %d random bits\n", mmap_entropy);
 >> +      printf ("  But got:        %d random bits\n", arena_entropy);
 >> +
 >> +      /* stats_print(&stats_mmap);
 >> +         stats_print(&stats_arena); */
 >> +      return 1;
 >> +    }
 >> +  printf ("PASS: malloc arena ASLR.\n");
 >> +  return 0;
 >> +}
 >> +
 >> +#define TEST_FUNCTON_ARGV do_test
 >> +#include "../test-skeleton.c"> +
 >> +/* Local Variables: */
 >> +/* c-file-style: "gnu" */
 >> +/* c-basic-offset: 2 */
 >> +/* End: */
  
Adhemerval Zanella March 24, 2020, 1:26 p.m. UTC | #3
On 24/03/2020 09:54, iripoll wrote:
> El 23/3/20 a las 21:14, Adhemerval Zanella escribió:
>>
>>
>> On 23/03/2020 14:06, iripoll wrote:
>>> Hi!
>>>
>>>> Thanks for the work. However, patch discussion is done through the
>>>> email list [1].  Could you re-send it on the libc-alpha?
>>>>
>>>> It seems that alignment restriction was added originally to simplify
>>>> the heap_for_ptr macro and I think it should be added on patch
>>>> description (stating there is no other restriction about arena
>>>> alignment).
>>>
>>>    Regarding the comments in the code, we don't know what would be the
>>> best way to describe the patch. We don't know if the code shall be
>>> commented with respect to the previous code or if it would be better
>>> to describe the operation of the new heap_for_ptr macro (which can be
>>> tricky). Please, add the comments that you think are better.
>>
>> Thanks for follow this up, in fact I was about to send this very
>> patch in your behalf.  I will comment below.
>>  >>
>>>> Also, would be possible to add a testcase? I am not sure which
>>>> approach would be better (a maillist discussion might give some
>>>> ideas). Maybe by parametrize the mmap entropy information you
>>>> compiled on arch-specific files (similar to the stackinfo.h), create
>>>> N thread that run M allocation, record the output, and compare the
>>>> histogram with the expected entropy with a certain margin.
>>>
>>>    The patch (in the mail) contains both the fix and a new test
>>> [tst-arena-aslr.c] that checks the effective randomization of the
>>> malloc on threads and compares it with that of the mmap. That way
>>> the test shall be platform independent.
>>>
>>>    Although the security issue affects the entropy of the dynamic
>>> memory requested by the threads, it is necessary to launch 1000
>>> processes to have different memory layouts samples. Remember that
>>> ASLR does not change on a per processes basis (fork) but it is
>>> renewed every new binary image (exec).
>>
>> Right, we might need to handle and tune if it takes too much
>> time (specially on some architectures).
>>
>>>
>>>> Finally, please also add the outcome of the
>>>> bench-malloc-{simple,thread} with and without the patch.
>>>
>>>
>>>    The output of the test is:
>>>
>>> --------  If failed (three lines) ------
>>> FAIL: malloc arena ASLR.
>>>    Expected value: 29 random bits
>>>    But got:        15 random bits
>>>
>>> --------  If passed (one line) -------
>>> PASS: malloc arena ASLR.
>>>
>>>    The test is passed if the random bits of the thread malloc is
>>> greater or equal to the random bits of mmap(). With a small
>>> tolerance. This way (testing greater or equal than) the test will
>>> also pass if additional entropy is add to the heaps in the future.
>>>
>>>    Important: the test does not estimate the entropy accurately! because
>>> it is not the goal of the test. A more accurate estimation would need
>>> much more samples and a more complex algorithm, which is heavy. In fact,
>>> the actual entropy is 28 bits and 14 in x86_64, but the test gives 29
>>> and 15 for mmap and thread's mallocs respectively.
>>
>> Alright, the main issue is not that it calculates entropy inaccurately,
>> but if it accuses false-positives. I tried to code something similar,
>> and checking on the testcase provided it seems too fragile to give
>> meaningful results. So I think we should dropped it for now.
>>
> 
>   We think that the test shall work on all architectures. We have developed a more complex tool which performs a complete statistical analysis of the entropy. We guess that the tst-arena-aslr.c is simple but useful.
> 
> Here the link to the results of the tool:
> http://cybersecurity.upv.es/tools/aslra/aslr-analyzer.html
> 
>   In any case, you have the full picture of what is needed in the glibc.

We usually try to validate the kernel interface, but hardly to validate
the kernel semantic itself.  The test, for instance, fails on the smoke
tests I did on my system (stock ubuntu 18 on x86_64). And you also need
to take in consideration system that either does not have ASLR enabled
or don't have support at all.  

I though at first it should be doable, but the required data and post
analysis does not really fit for a glibc testcase.  Maybe we can add the
tool link in commit message and briefly explain what it intend to do,
although I would only make sense once it is upstream (since on the link
you posted the tool is still not released).

> 
>>>
>>>
>>> Regards,
>>> Ismael and Hector.
>>>
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=22853
>>> Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
>>> Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
>>> ---
>>>   malloc/Makefile         |   3 +
>>>   malloc/arena.c          |  25 ++++-
>>>   malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
>>>   3 files changed, 236 insertions(+), 5 deletions(-)
>>>   create mode 100644 malloc/tst-arena-aslr.c
>>>
>>> diff --git a/malloc/Makefile b/malloc/Makefile
>>> index 984045b5b9..dc845b1c24 100644
>>> --- a/malloc/Makefile
>>> +++ b/malloc/Makefile
>>> @@ -39,6 +39,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>>>        tst-malloc-too-large \
>>>        tst-malloc-stats-cancellation \
>>>        tst-tcfree1 tst-tcfree2 tst-tcfree3 \
>>> +     tst-arena-aslr \
>>> +
>>>
>>>   tests-static := \
>>>        tst-interpose-static-nothread \
>>
>> No '\' on last item.
>>
> 
> Agree. We added the "\" because the rest of the makefile rules ended with "\", which was odd to us.
> 
> 
>>> @@ -99,6 +101,7 @@ $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
>>>   $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
>>>   $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
>>>   $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
>>> +$(objpfx)tst-arena-aslr: $(shared-thread-library)
>>>
>>>   # Export the __malloc_initialize_hook variable to libc.so.
>>>   LDFLAGS-tst-mallocstate = -rdynamic
>>
>> Ok.
>>
>>> diff --git a/malloc/arena.c b/malloc/arena.c
>>> index cecdb7f4c4..416559e269 100644
>>> --- a/malloc/arena.c
>>> +++ b/malloc/arena.c
>>> @@ -122,10 +122,15 @@ int __malloc_initialized = -1;
>>>           ptr = arena_get2 ((size), NULL);                      \
>>>     } while (0)
>>>
>>> -/* find the heap and corresponding arena for a given ptr */
>>> -
>>> +/* find the heap and corresponding arena for a given ptr.  Note that
>>> +   heap_info is not HEAP_MAX_SIZE aligned any more. But a random
>>> +   offset from the expected alignment, known by the process. This way
>>> +   it is fast to get the head of the area and be ASLR friendly.
>>> +*/
>>
>> Style: capitalize 'style' and two space after period.
> 
> OK!
> 
>>
>>> +static unsigned long __arena_rnd;
>>
>> The variable is static, so no need to double underscore to handle
>> namespace issues.  Also I think we should use uintptr_t for its
>> type.
>>
>>>   #define heap_for_ptr(ptr) \
>>> -  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
>>> +  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
>>> +                  |  __arena_rnd))
>>>   #define arena_for_chunk(ptr) \
>>>     (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
>>>
>>
>> I would prefer if we use static inline functions instead. For the
>> version I was intended to send, I used:
>>
>>    static inline heap_info *
>>    heap_for_ptr (mchunkptr ptr)
>>    {
>>      return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
>>                            & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
>>    }
>>       static inline struct malloc_state *
>>    arena_for_chunk (mchunkptr ptr)
>>    {
>>     return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
>>    }
>>
>>
> 
> Agree. The macros were used since the first versions of the code, now inlines are preferred.
> 
> But, we don't know a fix patch shall change the "style" of the code, or it would be better to move code improvements into another patch. We decided to preserve the code as much as possible, so that the changes can be easily tracked, and reviewed.
> 
> Whatever you choose will be ok.

I would prefer to use static inline, it provided a better semantic
for compiler analysis and possible debug.

> 
> 
>>> @@ -293,6 +298,11 @@ ptmalloc_init (void)
>>>
>>>     __malloc_initialized = 0;
>>>
>>> +  size_t pagesize = GLRO (dl_pagesize);
>>> +  /* Get the entropy from the already existing ASLR. */
>>> +  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
>>> +                ~(pagesize-1);
>>> +
>>>   #ifdef SHARED
>>>     /* In case this libc copy is in a non-default namespace, never use brk.
>>>        Likewise if dlopened from statically linked program.  */
>>
>> The mmap arena size would be at most HEAP_MAX_SIZE << 1, which is
>> 134217728 for 64 bit or 2097152 for 32 bit and the heap_info size
>> is 32 bytes on 64 bits and 16 on 32 bits.  So it would requires
>> at least 22 bits for 64 bits and 17 bits for 32 bits to fully
>> randomize heap_info (if I didn't make any confusion here).
>>
>> And relying on ASLR entropy solely covers it only for some architectures,
>> with the issue that it does not help for architectures that do not
>> support ARCH_HAS_ELF_RANDOMIZE.
>>
>> So I think it would be better to use some bits from AT_RANDOM and maybe
>> refactor dl-osinfo.h to handle it transparently (currently it uses only
>> up to 64 bits for stack check and pointer guard).
>>
>> Another option would be to either refactor Florian suggestion to use
>> SHA256 with AT_RANDOM, which gives us some more bits to work.
>>
>> In a long term I think the better alternative is to have a more robust
>> source of entropy, such as the arc4random.
>>
>> Also, some style issues (like space after & and between operators).
>>
> 
> This is an important issue: where the entropy comes from? and how many entropy shall have the arenas?
> 
> Here is why we decided to use the existing entropy of the mmaps.
> 
> 1.- Is it very fast.
>     It does not need to call a hash function.
>     Nor consume entropy from the auxv vector.
> 
> 2.- It is fully transparent backward compatible.
>     If the user disables ASLR, then the arenas would be fixed too.
> 
> We were tempted to increase the randomization by adding "sub-page" randomization. Some upper page bits (from bit 4 or 5 until 12, depending on the architecture alignment) could also be randomized, but it will break compatibility and also may cause problems in the future (when larger malloc alignment will be required). So we decided to keep this solution.


My main issue about using ASLR entropy is it provides different values
depending of the underlying architecture and kernel configuration. And
glibc still support some architecture that do not support ASLR at all.
And, as you pointed out, even for some ASLR enabled architectures the
entropy most kernels provides is low.

That's why I would prefer a platform-neutral solution.  As I pointed
out some bits from AT_RANDOM are still unused, and I don't see why
we can't use it here. 

> 
> Also, remember that if the randomization of the arenas are independent of the rest of the memory layout, then there must be a mechanism to disable it. It is necessary for debugging.

For such scenario we might disable randomization with a tunable.

> 
> The idea of the patch is to preserve the existing randomization, by giving a generic solution compatible with ASLR and fast arena finding.
> 
> For a larger discussion about ASLR, see the work that we started a few years ago (but that we had no time to complete):
> http://cybersecurity.upv.es/solutions/aslr-ng/aslr-ng.html
> 
> 
>> [1] https://sourceware.org/pipermail/libc-alpha/2020-March/112042.html
>>
>>> @@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
>>>           {
>>>             p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
>>>                            & ~(HEAP_MAX_SIZE - 1));
>>> +          /* The heap_info is at a random offset from the alignment to
>>> +             HEAP_MAX_SIZE. */
>>> +          p2 = (char *) ((unsigned long) p2 | __arena_rnd);
>>> +          if (p1 + HEAP_MAX_SIZE <= p2)
>>> +            p2 -= HEAP_MAX_SIZE;
>>>             ul = p2 - p1;
>>>             if (ul)
>>>               __munmap (p1, ul);
>>
>> Ok.
>>
>>> @@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
>>>         else
>>>           {
>>>             /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
>>> -             is already aligned. */
>>> +             is already aligned to __arena_rnd. */
>>>             p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
>>>             if (p2 == MAP_FAILED)
>>>               return 0;
>>>
>>> -          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
>>> +          if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
>>>               {
>>>                 __munmap (p2, HEAP_MAX_SIZE);
>>>                 return 0;
>>
>> Ok.
>>
>>> diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
>>> new file mode 100644
>>> index 0000000000..c285111e1c
>>> --- /dev/null
>>> +++ b/malloc/tst-arena-aslr.c
>>> @@ -0,0 +1,213 @@
>>> +/* Bug 22853: aslr on thread's malloc weakened due to arena alignment.
>>> +   https://sourceware.org/bugzilla/show_bug.cgi?id=22853
>>
>> No need to reference the full bug report link.
> 
> OK.
> 
>>
>>> +   Copyright (C) 2020 Free Software Foundation.
>>> +   Copying and distribution of this file, with or without modification,
>>> +   are permitted in any medium without royalty provided the copyright
>>> +   notice and this notice are preserved. This file is offered as-is,
>>> +   without any warranty.  */
>>
>> This Copyright disclaimer does not follow the model one used for
>> newer files.
>>
>>> +
>>> +#include <assert.h>
>>> +#include <errno.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <sys/types.h>
>>> +#include <sys/wait.h>
>>> +#include <libc-diag.h>
>>> +#include <pthread.h>
>>> +#include <string.h>
>>> +#include <sys/mman.h>
>>> +
>>> +/* Bits that are considered to be acceptable: */
>>> +#define TOLERANCE 1
>>> +/* Samples to be run but statisticaly accurate: */
>>> +#define NR_SAMPLES 1000
>>> +
>>> +/* Size to be allocated in the thread's arena pool. */
>>> +#define MALLOC_ALLOCATED_SIZE  400
>>> +
>>> +/* Data structure sent by the sampler to the tester. */
>>> +struct sample_t
>>> +{
>>> +  size_t obj_mmap;
>>> +  size_t obj_arena;
>>> +} sample;
>>> +
>>> +size_t pagesize;
>>> +
>>> +static void *
>>> +thread_main(void * dummy)
>>> +{
>>> +    sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
>>> +    pthread_exit(0);
>>> +}
>>> +
>>> +/* Creates a thread to sample malloc and mmap addresses, and writes
>>> +   the result to stdout. */
>>> +static void
>>> +sampler(void)
>>> +{
>>> +    int ret;
>>> +    pthread_t th;
>>> +    pthread_attr_t attrs;
>>> +
>>> +    ret = pthread_attr_init (&attrs);
>>> +    if (0 != ret)
>>> +      {
>>> +        perror ("pthread_attr_init");
>>> +        abort ();
>>> +      }
>>> +    sample.obj_mmap = (size_t)mmap( (void *)0x0, pagesize,
>>> +                                    PROT_READ | PROT_WRITE,
>>> +                                    MAP_PRIVATE | MAP_ANONYMOUS, -1 ,0);
>>> +    if (!sample.obj_mmap)
>>> +      {
>>> +        perror ("mmap");
>>> +        abort ();
>>> +      }
>>> +    if (pthread_create (&th, &attrs, thread_main, NULL))
>>> +      {
>>> +        perror ("pthread_create");
>>> +        abort ();
>>> +      }
>>> +    if (pthread_join (th, NULL))
>>> +      {
>>> +        perror ("pthread_join");
>>> +        abort ();
>>> +      }
>>> +    if (sizeof (sample) != write (1, &sample, sizeof (sample)))
>>> +      {
>>> +        perror ("write sampler");
>>> +        abort ();
>>> +      }
>>> +    exit(0);
>>> +}
>>> +
>>> +
>>> +struct statitics_t
>>> +{
>>> +  size_t min, max, and, or;
>>> +};
>>> +
>>> +static void
>>> +stats_init (struct statitics_t *st)
>>> +{
>>> +  st->min = st->and = ~0;
>>> +  st->max = st->or = 0;
>>> +}
>>> +
>>> +static void
>>> +stats_account (struct statitics_t *st, size_t value)
>>> +{
>>> +  st->min = value < st->min ? value : st->min;
>>> +  st->max = value > st->max ? value : st->max;
>>> +  st->and &= value;
>>> +  st->or |= value;
>>> +}
>>> +
>>> +/* Flipping bits, this is a "pourman" entroy estimator. There are much
>>> +   better and accurate entropy estimators, but they need more samples
>>> +   and mote time... which do not worth it. */
>>> +static int
>>> +stats_pmentropy (struct statitics_t *st)
>>> +{
>>> +
>>> +  return __builtin_popcountll(st->and ^ st->or);
>>> +}
>>> +
>>> +#if 0
>>> +/* Debugging the test. Not needed for production testing */
>>> +void
>>> +stats_print (struct statitics_t *st)
>>> +{
>>> +  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
>>> +          (st->max >> 2) + (st->min >> 2),
>>> +          st->and,
>>> +          st->or,
>>> +          st->and ^ st->or,
>>> +          stats_pmentropy(st));
>>> +}
>>> +#endif
>>> +
>>> +static int
>>> +do_test (int argc, char **argv)
>>> +{
>>> +  pid_t child;
>>> +  int tube[2];
>>> +  int wstatus;
>>> +  int i, mmap_entropy, arena_entropy;
>>> +  struct sample_t sample;
>>> +  struct statitics_t stats_mmap, stats_arena;
>>> +
>>> +  pagesize = sysconf (_SC_PAGESIZE);
>>> +
>>> +  if ( (2 == argc) && !strcmp (argv[1],"sample"))
>>> +    {
>>> +      sampler(); /* The role of this process is to sample one address. */
>>> +      exit(0);
>>> +    }
>>> +
>>> +  /* The role of this part is to launch NR_SAMPLES samplers and
>>> +     collect the output. */
>>> +  assert (pipe(tube) == 0);
>>> +
>>> +  stats_init (&stats_mmap);
>>> +  stats_init (&stats_arena);
>>> +
>>> +  /* Collect all the samples. */
>>> +  for (i = 0; i < NR_SAMPLES ; i++)
>>> +    {
>>> +      child = fork ();
>>> +      if (child < 0)
>>> +        {
>>> +          perror ("fork");
>>> +          exit (1);
>>> +        }
>>> +      if (0 == child)
>>> +        {
>>> +          assert (dup2(tube[1],1) != -1);
>>> +          execl (argv[0], argv[0], "sample", NULL);
>>> +          perror ("execl");
>>> +          abort ();
>>> +        }
>>> +      if (TEMP_FAILURE_RETRY (read(tube[0], &sample, sizeof (sample)))
>>> +          != sizeof (sample))
>>> +        {
>>> +          printf ("Error: can not read from child process\n");
>>> +      abort ();
>>> +        }
>>> +      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0) != child) ||
>>> +          (wstatus != 0))
>>> +        {
>>> +          printf ("Error: waitpid for child process\n");
>>> +      abort ();
>>> +        }
>>> +      stats_account (&stats_mmap, sample.obj_mmap);
>>> +      stats_account (&stats_arena, sample.obj_arena);
>>> +    }
>>> +
>>> +  /* Compute the actual entropy and analyze the results. */
>>> +  mmap_entropy = stats_pmentropy (&stats_mmap);
>>> +  arena_entropy = stats_pmentropy (&stats_arena);
>>> +
>>> +  if ((arena_entropy + TOLERANCE) < mmap_entropy)
>>> +    {
>>> +      printf ("FAIL: malloc arena ASLR.\n");
>>> +      printf ("  Expected value: %d random bits\n", mmap_entropy);
>>> +      printf ("  But got:        %d random bits\n", arena_entropy);
>>> +
>>> +      /* stats_print(&stats_mmap);
>>> +         stats_print(&stats_arena); */
>>> +      return 1;
>>> +    }
>>> +  printf ("PASS: malloc arena ASLR.\n");
>>> +  return 0;
>>> +}
>>> +
>>> +#define TEST_FUNCTON_ARGV do_test
>>> +#include "../test-skeleton.c"> +
>>> +/* Local Variables: */
>>> +/* c-file-style: "gnu" */
>>> +/* c-basic-offset: 2 */
>>> +/* End: */
  
Hector Marco March 24, 2020, 2:40 p.m. UTC | #4
Hello Adhemerval,

Good to see you are processing this. Please let me share our view about
the issue here:

>
> My main issue about using ASLR entropy is it provides different values
> depending of the underlying architecture and kernel configuration. And
> glibc still support some architecture that do not support ASLR at all.

We don't see the problem here. Can you please elaborate a little bit
more what's wrong? To our understanding (and what our patch does) if an
architecture does not support ASLR then no randomization is applied.

> And, as you pointed out, even for some ASLR enabled architectures the
> entropy most kernels provides is low.

The kernel entropy is not low. Actually the aliment is the one
destroying the entropy but the entropy provided by the kernel is a high
quality entropy one.

>
> That's why I would prefer a platform-neutral solution.  As I pointed
> out some bits from AT_RANDOM are still unused, and I don't see why
> we can't use it here.
>

We don't thing so. The kernel provides 16 random bytes via AT_RANDOM and
on 64-bit architectures those bytes are already used by the stack
smashing protector and the pointer guard glibc. Please look at the
sysdeps/generic/dl-osinfo.h file.

Therefore, we will be introducing a correlation (weakness) and attackers
could use a partial info leak to bypass the canary (among other attacks).

A different history would be to change how we are using the entropy from
AT_RANDOM. As suggested by kernel developers, we could probably use it
to feed a CSPRNG (note that we cannot claim that arc4random is an
entropy source but an entropy collector or intermediary).

We didn't go for this because the obvious overhead introduced. Right now
our solution has zero overhead and it provides good entropy (a
info leak cannot be used to leak anything else than what it is already
known, the mmap area).


>>
>> Also, remember that if the randomization of the arenas are
independent of the rest of the memory layout, then there must be a
mechanism to disable it. It is necessary for debugging.
>
> For such scenario we might disable randomization with a tunable.
>

Yes, we can but as you said, there are architectures that does not
support ASLR and it sounds odd to disable part of the ASLR on
architectures that have no ASLR at all.
  
Adhemerval Zanella March 24, 2020, 2:58 p.m. UTC | #5
On 24/03/2020 11:40, Hector Marco wrote:
> Hello Adhemerval,
> 
> Good to see you are processing this. Please let me share our view about
> the issue here:
> 
>>
>> My main issue about using ASLR entropy is it provides different values
>> depending of the underlying architecture and kernel configuration. And
>> glibc still support some architecture that do not support ASLR at all.
> 
> We don't see the problem here. Can you please elaborate a little bit
> more what's wrong? To our understanding (and what our patch does) if an
> architecture does not support ASLR then no randomization is applied.

My understanding of this idea hardening is to randomize the arenas using
ASLR entropy and my point is to provide such hardening even when kernel
does not provide such ASLR. 

> 
>> And, as you pointed out, even for some ASLR enabled architectures the
>> entropy most kernels provides is low.
> 
> The kernel entropy is not low. Actually the aliment is the one
> destroying the entropy but the entropy provided by the kernel is a high
> quality entropy one.

My point is it might be considered low depending of the architecture and 
kernel configuration, although it might not an issue once the heap 
alignment is not taking in consideration (for instance i686 will provide
8 bits, where 17 would be required to get a fully randomized placement
on the mmap area). So my question how randomized heap_info placement
do we really require here?


> 
>>
>> That's why I would prefer a platform-neutral solution.  As I pointed
>> out some bits from AT_RANDOM are still unused, and I don't see why
>> we can't use it here.
>>
> 
> We don't thing so. The kernel provides 16 random bytes via AT_RANDOM and
> on 64-bit architectures those bytes are already used by the stack
> smashing protector and the pointer guard glibc. Please look at the
> sysdeps/generic/dl-osinfo.h file.

I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is the
one used for Linux.  And my idea was to use something similar of what
Florian has suggested some time ago [1] to use the AT_RANDOM as a feeder.

[1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html

> 
> Therefore, we will be introducing a correlation (weakness) and attackers
> could use a partial info leak to bypass the canary (among other attacks).
> 
> A different history would be to change how we are using the entropy from
> AT_RANDOM. As suggested by kernel developers, we could probably use it
> to feed a CSPRNG (note that we cannot claim that arc4random is an
> entropy source but an entropy collector or intermediary).
> 
> We didn't go for this because the obvious overhead introduced. Right now
> our solution has zero overhead and it provides good entropy (a
> info leak cannot be used to leak anything else than what it is already
> known, the mmap area).

Alright, my comments are not a blocker to this patch. What I would to
discuss if we should also randomize the arenas for environments that
do not provide ASLR (or limited entropy) and which would be best way
to do it.

> 
> 
>>>
>>> Also, remember that if the randomization of the arenas are
> independent of the rest of the memory layout, then there must be a
> mechanism to disable it. It is necessary for debugging.
>>
>> For such scenario we might disable randomization with a tunable.
>>
> 
> Yes, we can but as you said, there are architectures that does not
> support ASLR and it sounds odd to disable part of the ASLR on
> architectures that have no ASLR at all.
> 
> 
>
  
iripoll March 24, 2020, 5:34 p.m. UTC | #6
Hi Adhemerval,


   Basically there are two issues to consider:

1.- How much entropy shall be added to the arenas.
2.- Where the entropy is taken.


El 24/3/20 a las 15:58, Adhemerval Zanella escribió:
 >
 >
 > On 24/03/2020 11:40, Hector Marco wrote:
 >> Hello Adhemerval,
 >>
 >> Good to see you are processing this. Please let me share our view about
 >> the issue here:
 >>
 >>>
 >>> My main issue about using ASLR entropy is it provides different values
 >>> depending of the underlying architecture and kernel configuration. And
 >>> glibc still support some architecture that do not support ASLR at all.
 >>
 >> We don't see the problem here. Can you please elaborate a little bit
 >> more what's wrong? To our understanding (and what our patch does) if an
 >> architecture does not support ASLR then no randomization is applied.
 >
 > My understanding of this idea hardening is to randomize the arenas using
 > ASLR entropy and my point is to provide such hardening even when kernel
 > does not provide such ASLR.
 >

  The problem with the ASLR is that all the memory zones have to be randomized in order to be really 
effective. Therefore, if the libraries or thread stacks are not randomized, then there is no point 
to randomize the heaps of the threads.

 >>
 >>> And, as you pointed out, even for some ASLR enabled architectures the
 >>> entropy most kernels provides is low.
 >>
 >> The kernel entropy is not low. Actually the aliment is the one
 >> destroying the entropy but the entropy provided by the kernel is a high
 >> quality entropy one.
 >
 > My point is it might be considered low depending of the architecture and
 > kernel configuration, although it might not an issue once the heap
 > alignment is not taking in consideration (for instance i686 will provide
 > 8 bits, where 17 would be required to get a fully randomized placement
 > on the mmap area). So my question how randomized heap_info placement
 > do we really require here?
 >

   ASLR is extremely tricky and subtle issue. It may affect performance (caused by alignment or 
cache row collisions) and memory space (due to extra page table entries, the pagetable is less 
compact). Even more, sometimes the ASLR was used to improve the performance. For example, a few 
random bits were introduced in the thread stacks to improve performance.

   It would be a good exercise to implement the ASLR in the library, but it is a
far more complex issue, which requires a much deeper analysis.


   In summary, we think that it shall have (at least) the same entropy
than the less randomized memory object (the mmaps). This way, an
attacker can not take advantage of it. Which is the goal of the patch.

 >
 >>
 >>>
 >>> That's why I would prefer a platform-neutral solution.  As I pointed
 >>> out some bits from AT_RANDOM are still unused, and I don't see why
 >>> we can't use it here.
 >>>
 >>
 >> We don't thing so. The kernel provides 16 random bytes via AT_RANDOM and
 >> on 64-bit architectures those bytes are already used by the stack
 >> smashing protector and the pointer guard glibc. Please look at the
 >> sysdeps/generic/dl-osinfo.h file.
 >
 > I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is the
 > one used for Linux.  And my idea was to use something similar of what
 > Florian has suggested some time ago [1] to use the AT_RANDOM as a feeder.
 >
 > [1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html
 >

This is the second problem: Where the entropy comes from.

As you suggest, the entropy can be taken from a CSPRNG. But we shall test/consider
the time overhead.

But if the entropy is not "linked" with the original ASLR, then we will need to enable another 
feature to disable it. Let's see an example:

If I want to run a process without ASLR, now we can:
$ setarch x86_64 -R ./testing

But if we use tunables, then we will need to use something like:

$ GLIBC_TUNABLES=glibc.malloc.arena_randomization=0 setarch x86_64 -R ./testing

Which is possible, but it looks cumbersome and it is hard to explain why we have to disable arenas 
randomization. It may raise the question whether threads stacks can also be disabled or not and why.

   During the development of the patch we analyzed these options, and we discarded them because 
complexity.

   If you think that adding extra randomization bits to the arenas is necessary, then the tunables 
shall be the choice. But as we stated before, we think that it is not necessary.

 >>
 >> Therefore, we will be introducing a correlation (weakness) and attackers
 >> could use a partial info leak to bypass the canary (among other attacks).
 >>
 >> A different history would be to change how we are using the entropy from
 >> AT_RANDOM. As suggested by kernel developers, we could probably use it
 >> to feed a CSPRNG (note that we cannot claim that arc4random is an
 >> entropy source but an entropy collector or intermediary).
 >>
 >> We didn't go for this because the obvious overhead introduced. Right now
 >> our solution has zero overhead and it provides good entropy (a
 >> info leak cannot be used to leak anything else than what it is already
 >> known, the mmap area).
 >
 > Alright, my comments are not a blocker to this patch. What I would to
 > discuss if we should also randomize the arenas for environments that
 > do not provide ASLR (or limited entropy) and which would be best way
 > to do it.
 >

   Discussion is necessary and clarifying. Thanks.

   Regarding the test. The way to measure the entropy is very similar to the one used in the paxtest 
tool. Which is good enough for testing the correctness of the patch. But once the patch is 
integrated, then we don't think that the test is necessary any longer.

Regards.

 >>
 >>
 >>>>
 >>>> Also, remember that if the randomization of the arenas are
 >> independent of the rest of the memory layout, then there must be a
 >> mechanism to disable it. It is necessary for debugging.
 >>>
 >>> For such scenario we might disable randomization with a tunable.
 >>>
 >>
 >> Yes, we can but as you said, there are architectures that does not
 >> support ASLR and it sounds odd to disable part of the ASLR on
 >> architectures that have no ASLR at all.
 >>
 >>
  
Adhemerval Zanella March 24, 2020, 7:40 p.m. UTC | #7
On 24/03/2020 14:34, iripoll wrote:
> 
> 
> Hi Adhemerval,
> 
> 
>   Basically there are two issues to consider:
> 
> 1.- How much entropy shall be added to the arenas.
> 2.- Where the entropy is taken.
> 
> 
> El 24/3/20 a las 15:58, Adhemerval Zanella escribió:
>>
>>
>> On 24/03/2020 11:40, Hector Marco wrote:
>>> Hello Adhemerval,
>>>
>>> Good to see you are processing this. Please let me share our view about
>>> the issue here:
>>>
>>>>
>>>> My main issue about using ASLR entropy is it provides different values
>>>> depending of the underlying architecture and kernel configuration. And
>>>> glibc still support some architecture that do not support ASLR at all.
>>>
>>> We don't see the problem here. Can you please elaborate a little bit
>>> more what's wrong? To our understanding (and what our patch does) if an
>>> architecture does not support ASLR then no randomization is applied.
>>
>> My understanding of this idea hardening is to randomize the arenas using
>> ASLR entropy and my point is to provide such hardening even when kernel
>> does not provide such ASLR.
>>
> 
>  The problem with the ASLR is that all the memory zones have to be randomized in order to be really effective. Therefore, if the libraries or thread stacks are not randomized, then there is no point to randomize the heaps of the threads.

Right, I agreethe adding some randomness arena placement in such cases
lead minimum gain security-wise.

> 
>>>
>>>> And, as you pointed out, even for some ASLR enabled architectures the
>>>> entropy most kernels provides is low.
>>>
>>> The kernel entropy is not low. Actually the aliment is the one
>>> destroying the entropy but the entropy provided by the kernel is a high
>>> quality entropy one.
>>
>> My point is it might be considered low depending of the architecture and
>> kernel configuration, although it might not an issue once the heap
>> alignment is not taking in consideration (for instance i686 will provide
>> 8 bits, where 17 would be required to get a fully randomized placement
>> on the mmap area). So my question how randomized heap_info placement
>> do we really require here?
>>
> 
>   ASLR is extremely tricky and subtle issue. It may affect performance (caused by alignment or cache row collisions) and memory space (due to extra page table entries, the pagetable is less compact). Even more, sometimes the ASLR was used to improve the performance. For example, a few random bits were introduced in the thread stacks to improve performance.
> 
>   It would be a good exercise to implement the ASLR in the library, but it is a
> far more complex issue, which requires a much deeper analysis.
> 
> 
>   In summary, we think that it shall have (at least) the same entropy
> than the less randomized memory object (the mmaps). This way, an
> attacker can not take advantage of it. Which is the goal of the patch.

My point is not to implement ASLR by the loaders, it would most likely
be inefficient or error-prone. My point is, assuming ASLR, if we should
use a more robust source of entropy to randomize the heap_info placement
to be platform neutral or if more entropy than ASLR already provides
does not help much here. I take your point is the former, correct?

> 
>>
>>>
>>>>
>>>> That's why I would prefer a platform-neutral solution.  As I pointed
>>>> out some bits from AT_RANDOM are still unused, and I don't see why
>>>> we can't use it here.
>>>>
>>>
>>> We don't thing so. The kernel provides 16 random bytes via AT_RANDOM and
>>> on 64-bit architectures those bytes are already used by the stack
>>> smashing protector and the pointer guard glibc. Please look at the
>>> sysdeps/generic/dl-osinfo.h file.
>>
>> I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is the
>> one used for Linux.  And my idea was to use something similar of what
>> Florian has suggested some time ago [1] to use the AT_RANDOM as a feeder.
>>
>> [1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html
>>
> 
> This is the second problem: Where the entropy comes from.
> 
> As you suggest, the entropy can be taken from a CSPRNG. But we shall test/consider
> the time overhead.
> 
> But if the entropy is not "linked" with the original ASLR, then we will need to enable another feature to disable it. Let's see an example:
> 
> If I want to run a process without ASLR, now we can:
> $ setarch x86_64 -R ./testing
> 
> But if we use tunables, then we will need to use something like:
> 
> $ GLIBC_TUNABLES=glibc.malloc.arena_randomization=0 setarch x86_64 -R ./testing
> 
> Which is possible, but it looks cumbersome and it is hard to explain why we have to disable arenas randomization. It may raise the question whether threads stacks can also be disabled or not and why.

At least two other memory allocators (scudo and hardened_malloc) that aim
for a more security approach adds randomness in some internals data 
structure placement without a way to actually disable it.  But the question
is whether we actually need to disable such randomization for a debug
perceptive (and I am not sure if is good security-wise to actually such a 
feature to disable hardening in such scenarios). In any case, I see your
point.

> 
>   During the development of the patch we analyzed these options, and we discarded them because complexity.
> 
>   If you think that adding extra randomization bits to the arenas is necessary, then the tunables shall be the choice. But as we stated before, we think that it is not necessary.

Ok, I think the complexity also does not pay off.  We might revise it once
we have a more robust CSPRNG which can be used internally (such as the
arc4random patch from Florian).

> 
>>>
>>> Therefore, we will be introducing a correlation (weakness) and attackers
>>> could use a partial info leak to bypass the canary (among other attacks).
>>>
>>> A different history would be to change how we are using the entropy from
>>> AT_RANDOM. As suggested by kernel developers, we could probably use it
>>> to feed a CSPRNG (note that we cannot claim that arc4random is an
>>> entropy source but an entropy collector or intermediary).
>>>
>>> We didn't go for this because the obvious overhead introduced. Right now
>>> our solution has zero overhead and it provides good entropy (a
>>> info leak cannot be used to leak anything else than what it is already
>>> known, the mmap area).
>>
>> Alright, my comments are not a blocker to this patch. What I would to
>> discuss if we should also randomize the arenas for environments that
>> do not provide ASLR (or limited entropy) and which would be best way
>> to do it.
>>
> 
>   Discussion is necessary and clarifying. Thanks.
> 
>   Regarding the test. The way to measure the entropy is very similar to the one used in the paxtest tool. Which is good enough for testing the correctness of the patch. But once the patch is integrated, then we don't think that the test is necessary any longer.

In fact the testcase if more to check the functionality of the change,
I really don't want to actually validate the ASLR entropy of the system.
We can live without it for now.

> 
> Regards.
> 
>>>
>>>
>>>>>
>>>>> Also, remember that if the randomization of the arenas are
>>> independent of the rest of the memory layout, then there must be a
>>> mechanism to disable it. It is necessary for debugging.
>>>>
>>>> For such scenario we might disable randomization with a tunable.
>>>>
>>>
>>> Yes, we can but as you said, there are architectures that does not
>>> support ASLR and it sounds odd to disable part of the ASLR on
>>> architectures that have no ASLR at all.
>>>
>>>
> 

Alright, could you post a v2 with the inline functions change, without
the test and with a proper commit message? 

Thanks for you patience and for the work on moving this forward.
  
iripoll March 25, 2020, 2:56 p.m. UTC | #8
Hello,

>>
> 
> Alright, could you post a v2 with the inline functions change, without
> the test and with a proper commit message?
> 
> Thanks for you patience and for the work on moving this forward.
> 

Here is the the patch with all the comments and suggestions.

Regards,
Ismael and Hector.



 From 6230b07e0fc85856275f89c16a18b46f8ed23936 Mon Sep 17 00:00:00 2001
From: Ismael Ripoll <iripoll@disca.upv.es>
Date: Wed, 25 Mar 2020 12:36:24 +0100
Subject: [PATCH] Fix [BZ #22853] weak randomization on thread arenas.

Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
---
  malloc/arena.c | 38 +++++++++++++++++++++++++++++---------
  1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..623824088b 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -122,13 +122,23 @@ int __malloc_initialized = -1;
          ptr = arena_get2 ((size), NULL);				      \
    } while (0)

-/* find the heap and corresponding arena for a given ptr */
-
-#define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
-#define arena_for_chunk(ptr) \
-  (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
-
+/* find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is not HEAP_MAX_SIZE aligned any more.  But a random
+   offset from the expected alignment, known by the process.  This way
+   it is fast to get the head of the area and be ASLR friendly.
+*/
+static unsigned long arena_rnd;
+static inline heap_info *
+heap_for_ptr (mchunkptr ptr)
+{
+  return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
+                         & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
+}
+static inline struct malloc_state *
+arena_for_chunk (mchunkptr ptr)
+{
+  return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
+}

  /**************************************************************************/

@@ -293,6 +303,11 @@ ptmalloc_init (void)

    __malloc_initialized = 0;

+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR. */
+  arena_rnd = ((unsigned long) & arena_rnd) & (HEAP_MAX_SIZE - 1) &
+              ~(pagesize - 1);
+
  #ifdef SHARED
    /* In case this libc copy is in a non-default namespace, never use brk.
       Likewise if dlopened from statically linked program.  */
@@ -490,6 +505,11 @@ new_heap (size_t size, size_t top_pad)
          {
            p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                           & ~(HEAP_MAX_SIZE - 1));
+          /* The heap_info is at a random offset from the alignment to
+             HEAP_MAX_SIZE. */
+          p2 = (char *) ((unsigned long) p2 | arena_rnd);
+          if (p1 + HEAP_MAX_SIZE <= p2)
+            p2 -= HEAP_MAX_SIZE;
            ul = p2 - p1;
            if (ul)
              __munmap (p1, ul);
@@ -500,12 +520,12 @@ new_heap (size_t size, size_t top_pad)
        else
          {
            /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-             is already aligned. */
+             is already aligned to arena_rnd. */
            p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
            if (p2 == MAP_FAILED)
              return 0;

-          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+          if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != arena_rnd)
              {
                __munmap (p2, HEAP_MAX_SIZE);
                return 0;
  
Carlos O'Donell March 25, 2020, 8:05 p.m. UTC | #9
On 3/25/20 10:56 AM, iripoll wrote:
> Hello,
> 
>>>
>>
>> Alright, could you post a v2 with the inline functions change, without
>> the test and with a proper commit message?
>>
>> Thanks for you patience and for the work on moving this forward.
>>
> 
> Here is the the patch with all the comments and suggestions.

I am excited that this kind of work is going forward.

The next step here is that we need to work out the copyright assignment
for this work so we can accept the patches.

The contribution checklist explains "FSF copyright Assignment":
https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment

I would suggest a futures assignment:
http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future

This would allow the community to review and accept any of your 
patches that you post

If you're going through the process I usually recommend getting 
assignments in place for glibc, gcc, gdb, and binutils, because
all 4 projects share headers, that way you can post patches
across all of them. However, this is just a recommendation.

If your educational institution claims copyright on work you 
create you may need to tell the FSF that and they in turn may
need to sign a corporate disclaimer.

This process is usually quite fast and smooth.

Your last patches in 2013 were < 15 lines, and so were accepted
under the flexible rules regarding legally significant
contributions. 

Thank you again for all of this work! :-)

Cheers,
Carlos.
  
iripoll March 26, 2020, 2:12 p.m. UTC | #10
Hello,

   Our aim is to sing the copyright assignment ASAP. But please, give us some time to check it with 
our respective universities.

  It may take longer than normal because of the covid-19. We are under full lockdown.

Best,
Ismael.


El 25/3/20 a las 21:05, Carlos O'Donell escribió:
> On 3/25/20 10:56 AM, iripoll wrote:
>> Hello,
>>
>>>>
>>>
>>> Alright, could you post a v2 with the inline functions change, without
>>> the test and with a proper commit message?
>>>
>>> Thanks for you patience and for the work on moving this forward.
>>>
>>
>> Here is the the patch with all the comments and suggestions.
> 
> I am excited that this kind of work is going forward.
> 
> The next step here is that we need to work out the copyright assignment
> for this work so we can accept the patches.
> 
> The contribution checklist explains "FSF copyright Assignment":
> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
> 
> I would suggest a futures assignment:
> http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
> 
> This would allow the community to review and accept any of your
> patches that you post
> 
> If you're going through the process I usually recommend getting
> assignments in place for glibc, gcc, gdb, and binutils, because
> all 4 projects share headers, that way you can post patches
> across all of them. However, this is just a recommendation.
> 
> If your educational institution claims copyright on work you
> create you may need to tell the FSF that and they in turn may
> need to sign a corporate disclaimer.
> 
> This process is usually quite fast and smooth.
> 
> Your last patches in 2013 were < 15 lines, and so were accepted
> under the flexible rules regarding legally significant
> contributions.
> 
> Thank you again for all of this work! :-)
> 
> Cheers,
> Carlos.
>
  

Patch

diff --git a/malloc/Makefile b/malloc/Makefile
index 984045b5b9..dc845b1c24 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -39,6 +39,8 @@  tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
  	 tst-malloc-too-large \
  	 tst-malloc-stats-cancellation \
  	 tst-tcfree1 tst-tcfree2 tst-tcfree3 \
+	 tst-arena-aslr \
+

  tests-static := \
  	 tst-interpose-static-nothread \
@@ -99,6 +101,7 @@  $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
  $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
  $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
  $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
+$(objpfx)tst-arena-aslr: $(shared-thread-library)

  # Export the __malloc_initialize_hook variable to libc.so.
  LDFLAGS-tst-mallocstate = -rdynamic
diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..416559e269 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -122,10 +122,15 @@  int __malloc_initialized = -1;
          ptr = arena_get2 ((size), NULL);				      \
    } while (0)

-/* find the heap and corresponding arena for a given ptr */
-
+/* find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is not HEAP_MAX_SIZE aligned any more. But a random
+   offset from the expected alignment, known by the process. This way
+   it is fast to get the head of the area and be ASLR friendly.
+*/
+static unsigned long __arena_rnd;
  #define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
+  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
+                  |  __arena_rnd))
  #define arena_for_chunk(ptr) \
    (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)

@@ -293,6 +298,11 @@  ptmalloc_init (void)

    __malloc_initialized = 0;

+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR. */
+  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
+                ~(pagesize-1);
+
  #ifdef SHARED
    /* In case this libc copy is in a non-default namespace, never use brk.
       Likewise if dlopened from statically linked program.  */
@@ -490,6 +500,11 @@  new_heap (size_t size, size_t top_pad)
          {
            p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                           & ~(HEAP_MAX_SIZE - 1));
+          /* The heap_info is at a random offset from the alignment to
+             HEAP_MAX_SIZE. */
+          p2 = (char *) ((unsigned long) p2 | __arena_rnd);
+          if (p1 + HEAP_MAX_SIZE <= p2)
+            p2 -= HEAP_MAX_SIZE;
            ul = p2 - p1;
            if (ul)
              __munmap (p1, ul);
@@ -500,12 +515,12 @@  new_heap (size_t size, size_t top_pad)
        else
          {
            /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-             is already aligned. */
+             is already aligned to __arena_rnd. */
            p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
            if (p2 == MAP_FAILED)
              return 0;

-          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+          if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
              {
                __munmap (p2, HEAP_MAX_SIZE);
                return 0;
diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
new file mode 100644
index 0000000000..c285111e1c
--- /dev/null
+++ b/malloc/tst-arena-aslr.c
@@ -0,0 +1,213 @@ 
+/* Bug 22853: aslr on thread's malloc weakened due to arena alignment.
+   https://sourceware.org/bugzilla/show_bug.cgi?id=22853
+   Copyright (C) 2020 Free Software Foundation.
+   Copying and distribution of this file, with or without modification,
+   are permitted in any medium without royalty provided the copyright
+   notice and this notice are preserved. This file is offered as-is,
+   without any warranty.  */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <libc-diag.h>
+#include <pthread.h>
+#include <string.h>
+#include <sys/mman.h>
+
+/* Bits that are considered to be acceptable: */
+#define TOLERANCE 1
+/* Samples to be run but statisticaly accurate: */
+#define NR_SAMPLES 1000
+
+/* Size to be allocated in the thread's arena pool. */
+#define MALLOC_ALLOCATED_SIZE  400
+
+/* Data structure sent by the sampler to the tester. */
+struct sample_t
+{
+  size_t obj_mmap;
+  size_t obj_arena;
+} sample;
+
+size_t pagesize;
+
+static void *
+thread_main(void * dummy)
+{
+    sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
+    pthread_exit(0);
+}
+
+/* Creates a thread to sample malloc and mmap addresses, and writes
+   the result to stdout. */
+static void
+sampler(void)
+{
+    int ret;
+    pthread_t th;
+    pthread_attr_t attrs;
+
+    ret = pthread_attr_init (&attrs);
+    if (0 != ret)
+      {
+        perror ("pthread_attr_init");
+        abort ();
+      }
+    sample.obj_mmap = (size_t)mmap( (void *)0x0, pagesize,
+                                    PROT_READ | PROT_WRITE,
+                                    MAP_PRIVATE | MAP_ANONYMOUS, -1 ,0);
+    if (!sample.obj_mmap)
+      {
+        perror ("mmap");
+        abort ();
+      }
+    if (pthread_create (&th, &attrs, thread_main, NULL))
+      {
+        perror ("pthread_create");
+        abort ();
+      }
+    if (pthread_join (th, NULL))
+      {
+        perror ("pthread_join");
+        abort ();
+      }
+    if (sizeof (sample) != write (1, &sample, sizeof (sample)))
+      {
+        perror ("write sampler");
+        abort ();
+      }
+    exit(0);
+}
+
+
+struct statitics_t
+{
+  size_t min, max, and, or;
+};
+
+static void
+stats_init (struct statitics_t *st)
+{
+  st->min = st->and = ~0;
+  st->max = st->or = 0;
+}
+
+static void
+stats_account (struct statitics_t *st, size_t value)
+{
+  st->min = value < st->min ? value : st->min;
+  st->max = value > st->max ? value : st->max;
+  st->and &= value;
+  st->or |= value;
+}
+
+/* Flipping bits, this is a "pourman" entroy estimator. There are much
+   better and accurate entropy estimators, but they need more samples
+   and mote time... which do not worth it. */
+static int
+stats_pmentropy (struct statitics_t *st)
+{
+
+  return __builtin_popcountll(st->and ^ st->or);
+}
+
+#if 0
+/* Debugging the test. Not needed for production testing */
+void
+stats_print (struct statitics_t *st)
+{
+  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
+          (st->max >> 2) + (st->min >> 2),
+          st->and,
+          st->or,
+          st->and ^ st->or,
+          stats_pmentropy(st));
+}
+#endif
+
+static int
+do_test (int argc, char **argv)
+{
+  pid_t child;
+  int tube[2];
+  int wstatus;
+  int i, mmap_entropy, arena_entropy;
+  struct sample_t sample;
+  struct statitics_t stats_mmap, stats_arena;
+
+  pagesize = sysconf (_SC_PAGESIZE);
+
+  if ( (2 == argc) && !strcmp (argv[1],"sample"))
+    {
+      sampler(); /* The role of this process is to sample one address. */
+      exit(0);
+    }
+
+  /* The role of this part is to launch NR_SAMPLES samplers and
+     collect the output. */
+  assert (pipe(tube) == 0);
+
+  stats_init (&stats_mmap);
+  stats_init (&stats_arena);
+
+  /* Collect all the samples. */
+  for (i = 0; i < NR_SAMPLES ; i++)
+    {
+      child = fork ();
+      if (child < 0)
+        {
+          perror ("fork");
+          exit (1);
+        }
+      if (0 == child)
+        {
+          assert (dup2(tube[1],1) != -1);
+          execl (argv[0], argv[0], "sample", NULL);
+          perror ("execl");
+          abort ();
+        }
+      if (TEMP_FAILURE_RETRY (read(tube[0], &sample, sizeof (sample)))
+          != sizeof (sample))
+        {
+          printf ("Error: can not read from child process\n");
+	  abort ();
+        }
+      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0) != child) ||
+          (wstatus != 0))
+        {
+          printf ("Error: waitpid for child process\n");
+	  abort ();
+        }
+      stats_account (&stats_mmap, sample.obj_mmap);
+      stats_account (&stats_arena, sample.obj_arena);
+    }
+
+  /* Compute the actual entropy and analyze the results. */
+  mmap_entropy = stats_pmentropy (&stats_mmap);
+  arena_entropy = stats_pmentropy (&stats_arena);
+
+  if ((arena_entropy + TOLERANCE) < mmap_entropy)
+    {
+      printf ("FAIL: malloc arena ASLR.\n");
+      printf ("  Expected value: %d random bits\n", mmap_entropy);
+      printf ("  But got:        %d random bits\n", arena_entropy);
+
+      /* stats_print(&stats_mmap);
+         stats_print(&stats_arena); */
+      return 1;
+    }
+  printf ("PASS: malloc arena ASLR.\n");
+  return 0;
+}
+
+#define TEST_FUNCTON_ARGV do_test
+#include "../test-skeleton.c"
+
+/* Local Variables: */
+/* c-file-style: "gnu" */
+/* c-basic-offset: 2 */
+/* End: */