Message ID | f44c10e6-4300-17bf-f279-2c1f9082a7b6@disca.upv.es |
---|---|
State | DCO or assignment missing |
Headers |
Return-Path: <iripoll@disca.upv.es>
To: libc-alpha@sourceware.org
From: iripoll <iripoll@disca.upv.es>
Subject: [PATCH] malloc: Fix [BZ #22853] weak randomization on thread arenas.
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>, Hector Marco-Gisbert <hmarco@hmarco.org>
Message-ID: <f44c10e6-4300-17bf-f279-2c1f9082a7b6@disca.upv.es>
Date: Mon, 23 Mar 2020 18:06:49 +0100
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> |
Series |
malloc: Fix [BZ #22853] weak randomization on thread arenas.
|
|
Commit Message
iripoll
March 23, 2020, 5:06 p.m. UTC
Hi!

> Thanks for the work. However, patch discussion is done through the
> email list [1]. Could you re-send it on libc-alpha?
>
> It seems that the alignment restriction was added originally to
> simplify the heap_for_ptr macro, and I think that should be stated in
> the patch description (noting there is no other restriction on arena
> alignment).

Regarding the comments in the code, we don't know what would be the
best way to describe the patch. We don't know whether the code should
be commented with respect to the previous code or whether it would be
better to describe the operation of the new heap_for_ptr macro (which
can be tricky). Please add the comments that you think are best.

> Also, would it be possible to add a testcase? I am not sure which
> approach would be better (a maillist discussion might give some
> ideas). Maybe parametrize the mmap entropy information you compiled
> on arch-specific files (similar to stackinfo.h), create N threads
> that run M allocations, record the output, and compare the histogram
> with the expected entropy within a certain margin.

The patch (in the mail) contains both the fix and a new test
(tst-arena-aslr.c) that checks the effective randomization of malloc
on threads and compares it with that of mmap. That way the test should
be platform independent.

Although the security issue affects the entropy of the dynamic memory
requested by threads, it is necessary to launch 1000 processes to get
different memory layout samples. Remember that ASLR does not change on
a per-process basis (fork) but is renewed with every new binary image
(exec).

> Finally, please also add the outcome of
> bench-malloc-{simple,thread} with and without the patch.

The output of the test is:

-------- If failed (three lines) ------
FAIL: malloc arena ASLR.
   Expected value: 29 random bits
   But got:        15 random bits

-------- If passed (one line) -------
PASS: malloc arena ASLR.

The test passes if the number of random bits of the thread malloc is
greater than or equal to the number of random bits of mmap(), with a
small tolerance. This way (testing greater than or equal) the test
will also pass if additional entropy is added to the heaps in the
future.

Important: the test does not estimate the entropy accurately, because
that is not its goal. A more accurate estimation would need many more
samples and a more complex algorithm, which is heavy. In fact, the
actual entropy is 28 and 14 bits on x86_64, but the test reports 29
and 15 for mmap and the threads' mallocs respectively.

Regards,
Ismael and Hector.

https://sourceware.org/bugzilla/show_bug.cgi?id=22853
Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
---
 malloc/Makefile         |   3 +
 malloc/arena.c          |  25 ++++-
 malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 236 insertions(+), 5 deletions(-)
 create mode 100644 malloc/tst-arena-aslr.c
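A minimal standalone sketch of the bit-flip estimate described above:
a bit counts as "random" iff it was 0 in at least one sample and 1 in
at least one sample. It mirrors the patch's stats_pmentropy; the
sample addresses are invented for illustration.

#include <stdint.h>
#include <stdio.h>

static int
estimate_random_bits (const uint64_t *samples, size_t n)
{
  uint64_t all_and = ~(uint64_t) 0, all_or = 0;
  for (size_t i = 0; i < n; i++)
    {
      all_and &= samples[i];
      all_or |= samples[i];
    }
  /* Count the bits that flipped at least once across the samples.  */
  return __builtin_popcountll (all_and ^ all_or);
}

int
main (void)
{
  /* Hypothetical addresses: only bits 12..15 vary here.  */
  uint64_t samples[] = { 0x7f1234561000, 0x7f1234568000, 0x7f123456f000 };
  printf ("%d random bits\n", estimate_random_bits (samples, 3));
  return 0;
}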
Comments
On 23/03/2020 14:06, iripoll wrote:
> Hi!
>
>> Thanks for the work. However, patch discussion is done through the
>> email list [1]. Could you re-send it on libc-alpha?
>>
>> It seems that the alignment restriction was added originally to
>> simplify the heap_for_ptr macro, and I think that should be stated in
>> the patch description (noting there is no other restriction on arena
>> alignment).
>
> Regarding the comments in the code, we don't know what would be the
> best way to describe the patch. We don't know whether the code should
> be commented with respect to the previous code or whether it would be
> better to describe the operation of the new heap_for_ptr macro (which
> can be tricky). Please add the comments that you think are best.

Thanks for following this up; in fact I was about to send this very
patch on your behalf. I will comment below.

>> Also, would it be possible to add a testcase? I am not sure which
>> approach would be better (a maillist discussion might give some
>> ideas). Maybe parametrize the mmap entropy information you compiled
>> on arch-specific files (similar to stackinfo.h), create N threads
>> that run M allocations, record the output, and compare the histogram
>> with the expected entropy within a certain margin.
>
> The patch (in the mail) contains both the fix and a new test
> (tst-arena-aslr.c) that checks the effective randomization of malloc
> on threads and compares it with that of mmap. That way the test
> should be platform independent.
>
> Although the security issue affects the entropy of the dynamic memory
> requested by threads, it is necessary to launch 1000 processes to get
> different memory layout samples. Remember that ASLR does not change
> on a per-process basis (fork) but is renewed with every new binary
> image (exec).

Right, we might need to handle and tune this if it takes too much time
(especially on some architectures).

>> Finally, please also add the outcome of
>> bench-malloc-{simple,thread} with and without the patch.
>
> The output of the test is:
>
> -------- If failed (three lines) ------
> FAIL: malloc arena ASLR.
>    Expected value: 29 random bits
>    But got:        15 random bits
>
> -------- If passed (one line) -------
> PASS: malloc arena ASLR.
>
> The test passes if the number of random bits of the thread malloc is
> greater than or equal to the number of random bits of mmap(), with a
> small tolerance. This way (testing greater than or equal) the test
> will also pass if additional entropy is added to the heaps in the
> future.
>
> Important: the test does not estimate the entropy accurately, because
> that is not its goal. A more accurate estimation would need many more
> samples and a more complex algorithm, which is heavy. In fact, the
> actual entropy is 28 and 14 bits on x86_64, but the test reports 29
> and 15 for mmap and the threads' mallocs respectively.

Alright, the main issue is not that it calculates the entropy
inaccurately, but whether it produces false positives. I tried to code
something similar, and checking on the testcase provided it seems too
fragile to give meaningful results. So I think we should drop it for
now.

> Regards,
> Ismael and Hector.
> https://sourceware.org/bugzilla/show_bug.cgi?id=22853
> Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
> Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
> ---
>  malloc/Makefile         |   3 +
>  malloc/arena.c          |  25 ++++-
>  malloc/tst-arena-aslr.c | 213 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 236 insertions(+), 5 deletions(-)
>  create mode 100644 malloc/tst-arena-aslr.c
>
> diff --git a/malloc/Makefile b/malloc/Makefile
> index 984045b5b9..dc845b1c24 100644
> --- a/malloc/Makefile
> +++ b/malloc/Makefile
> @@ -39,6 +39,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>  	 tst-malloc-too-large \
>  	 tst-malloc-stats-cancellation \
>  	 tst-tcfree1 tst-tcfree2 tst-tcfree3 \
> +	 tst-arena-aslr \
> +
>
>  tests-static := \
>  	 tst-interpose-static-nothread \

No '\' on last item.

> @@ -99,6 +101,7 @@ $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
>  $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
>  $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
>  $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
> +$(objpfx)tst-arena-aslr: $(shared-thread-library)
>
>  # Export the __malloc_initialize_hook variable to libc.so.
>  LDFLAGS-tst-mallocstate = -rdynamic

Ok.

> diff --git a/malloc/arena.c b/malloc/arena.c
> index cecdb7f4c4..416559e269 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -122,10 +122,15 @@ int __malloc_initialized = -1;
>        ptr = arena_get2 ((size), NULL);      \
>    } while (0)
>
> -/* find the heap and corresponding arena for a given ptr */
> -
> +/* find the heap and corresponding arena for a given ptr. Note that
> +   heap_info is not HEAP_MAX_SIZE aligned any more. But a random
> +   offset from the expected alignment, known by the process. This way
> +   it is fast to get the head of the area and be ASLR friendly.
> +*/

Style: capitalize the first word, and use two spaces after a period.

> +static unsigned long __arena_rnd;

The variable is static, so there is no need for the double underscore
to handle namespace issues. Also I think we should use uintptr_t for
its type.

> #define heap_for_ptr(ptr) \
> -  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
> +  ((heap_info *) ((((unsigned long) ptr-__arena_rnd) & ~(HEAP_MAX_SIZE - 1)) \
> +		  | __arena_rnd))
> #define arena_for_chunk(ptr) \
>   (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)

I would prefer if we used static inline functions instead. For the
version I intended to send, I used:

  static inline heap_info *
  heap_for_ptr (mchunkptr ptr)
  {
    return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
			   & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
  }
  static inline struct malloc_state *
  arena_for_chunk (mchunkptr ptr)
  {
    return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
  }

> @@ -293,6 +298,11 @@ ptmalloc_init (void)
>
>    __malloc_initialized = 0;
>
> +  size_t pagesize = GLRO (dl_pagesize);
> +  /* Get the entropy from the already existing ASLR. */
> +  __arena_rnd = ((unsigned long)&__arena_rnd) & (HEAP_MAX_SIZE-1) &
> +    ~(pagesize-1);
> +
>  #ifdef SHARED
>    /* In case this libc copy is in a non-default namespace, never use brk.
>       Likewise if dlopened from statically linked program. */

The mmap'ed arena size would be at most HEAP_MAX_SIZE << 1, which is
134217728 for 64 bit or 2097152 for 32 bit, and the heap_info size is
32 bytes on 64 bits and 16 on 32 bits. So it would require at least 22
bits for 64 bits and 17 bits for 32 bits to fully randomize heap_info
(if I didn't make any confusion here).
And relying solely on ASLR entropy covers it only for some
architectures, with the issue that it does not help architectures that
do not support ARCH_HAS_ELF_RANDOMIZE.

So I think it would be better to use some bits from AT_RANDOM and
maybe refactor dl-osinfo.h to handle it transparently (currently it
uses only up to 64 bits, for the stack check and the pointer guard).

Another option would be to refactor Florian's suggestion to use SHA256
with AT_RANDOM, which gives us some more bits to work with.

In the long term I think the better alternative is to have a more
robust source of entropy, such as arc4random.

Also, some style issues (like space after & and between operators).

[1] https://sourceware.org/pipermail/libc-alpha/2020-March/112042.html

> @@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
>      {
>        p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
>                       & ~(HEAP_MAX_SIZE - 1));
> +      /* The heap_info is at a random offset from the alignment to
> +         HEAP_MAX_SIZE. */
> +      p2 = (char *) ((unsigned long) p2 | __arena_rnd);
> +      if (p1 + HEAP_MAX_SIZE <= p2)
> +        p2 -= HEAP_MAX_SIZE;
>        ul = p2 - p1;
>        if (ul)
>          __munmap (p1, ul);

Ok.

> @@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
>    else
>      {
>        /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
> -         is already aligned. */
> +         is already aligned to __arena_rnd. */
>        p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
>        if (p2 == MAP_FAILED)
>          return 0;
>
> -      if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
> +      if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
>          {
>            __munmap (p2, HEAP_MAX_SIZE);
>            return 0;

Ok.

> diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
> new file mode 100644
> index 0000000000..c285111e1c
> --- /dev/null
> +++ b/malloc/tst-arena-aslr.c
> @@ -0,0 +1,213 @@
> +/* Bug 22853: aslr on thread's malloc weakened due to arena alignment.
> +   https://sourceware.org/bugzilla/show_bug.cgi?id=22853

No need to reference the full bug report link.

> +   Copyright (C) 2020 Free Software Foundation.
> +   Copying and distribution of this file, with or without modification,
> +   are permitted in any medium without royalty provided the copyright
> +   notice and this notice are preserved. This file is offered as-is,
> +   without any warranty. */

This Copyright disclaimer does not follow the model used for newer
files.

> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <libc-diag.h>
> +#include <pthread.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +
> +/* Bits that are considered to be acceptable: */
> +#define TOLERANCE 1
> +/* Samples to be run but statisticaly accurate: */
> +#define NR_SAMPLES 1000
> +
> +/* Size to be allocated in the thread's arena pool. */
> +#define MALLOC_ALLOCATED_SIZE 400
> +
> +/* Data structure sent by the sampler to the tester. */
> +struct sample_t
> +{
> +  size_t obj_mmap;
> +  size_t obj_arena;
> +} sample;
> +
> +size_t pagesize;
> +
> +static void *
> +thread_main(void * dummy)
> +{
> +  sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
> +  pthread_exit(0);
> +}
> +
> +/* Creates a thread to sample malloc and mmap addresses, and writes
> +   the result to stdout. */
> +static void
> +sampler(void)
> +{
> +  int ret;
> +  pthread_t th;
> +  pthread_attr_t attrs;
> +
> +  ret = pthread_attr_init (&attrs);
> +  if (0 != ret)
> +    {
> +      perror ("pthread_attr_init");
> +      abort ();
> +    }
> +  sample.obj_mmap = (size_t)mmap( (void *)0x0, pagesize,
> +				  PROT_READ | PROT_WRITE,
> +				  MAP_PRIVATE | MAP_ANONYMOUS, -1 ,0);
> +  if (!sample.obj_mmap)
> +    {
> +      perror ("mmap");
> +      abort ();
> +    }
> +  if (pthread_create (&th, &attrs, thread_main, NULL))
> +    {
> +      perror ("pthread_create");
> +      abort ();
> +    }
> +  if (pthread_join (th, NULL))
> +    {
> +      perror ("pthread_join");
> +      abort ();
> +    }
> +  if (sizeof (sample) != write (1, &sample, sizeof (sample)))
> +    {
> +      perror ("write sampler");
> +      abort ();
> +    }
> +  exit(0);
> +}
> +
> +
> +struct statitics_t
> +{
> +  size_t min, max, and, or;
> +};
> +
> +static void
> +stats_init (struct statitics_t *st)
> +{
> +  st->min = st->and = ~0;
> +  st->max = st->or = 0;
> +}
> +
> +static void
> +stats_account (struct statitics_t *st, size_t value)
> +{
> +  st->min = value < st->min ? value : st->min;
> +  st->max = value > st->max ? value : st->max;
> +  st->and &= value;
> +  st->or |= value;
> +}
> +
> +/* Flipping bits, this is a "pourman" entroy estimator. There are much
> +   better and accurate entropy estimators, but they need more samples
> +   and mote time... which do not worth it. */
> +static int
> +stats_pmentropy (struct statitics_t *st)
> +{
> +
> +  return __builtin_popcountll(st->and ^ st->or);
> +}
> +
> +#if 0
> +/* Debugging the test. Not needed for production testing */
> +void
> +stats_print (struct statitics_t *st)
> +{
> +  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
> +	  (st->max >> 2) + (st->min >> 2),
> +	  st->and,
> +	  st->or,
> +	  st->and ^ st->or,
> +	  stats_pmentropy(st));
> +}
> +#endif
> +
> +static int
> +do_test (int argc, char **argv)
> +{
> +  pid_t child;
> +  int tube[2];
> +  int wstatus;
> +  int i, mmap_entropy, arena_entropy;
> +  struct sample_t sample;
> +  struct statitics_t stats_mmap, stats_arena;
> +
> +  pagesize = sysconf (_SC_PAGESIZE);
> +
> +  if ( (2 == argc) && !strcmp (argv[1],"sample"))
> +    {
> +      sampler(); /* The role of this process is to sample one address. */
> +      exit(0);
> +    }
> +
> +  /* The role of this part is to launch NR_SAMPLES samplers and
> +     collect the output. */
> +  assert (pipe(tube) == 0);
> +
> +  stats_init (&stats_mmap);
> +  stats_init (&stats_arena);
> +
> +  /* Collect all the samples. */
> +  for (i = 0; i < NR_SAMPLES ; i++)
> +    {
> +      child = fork ();
> +      if (child < 0)
> +	{
> +	  perror ("fork");
> +	  exit (1);
> +	}
> +      if (0 == child)
> +	{
> +	  assert (dup2(tube[1],1) != -1);
> +	  execl (argv[0], argv[0], "sample", NULL);
> +	  perror ("execl");
> +	  abort ();
> +	}
> +      if (TEMP_FAILURE_RETRY (read(tube[0], &sample, sizeof (sample)))
> +	  != sizeof (sample))
> +	{
> +	  printf ("Error: can not read from child process\n");
> +	  abort ();
> +	}
> +      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0) != child) ||
> +	  (wstatus != 0))
> +	{
> +	  printf ("Error: waitpid for child process\n");
> +	  abort ();
> +	}
> +      stats_account (&stats_mmap, sample.obj_mmap);
> +      stats_account (&stats_arena, sample.obj_arena);
> +    }
> +
> +  /* Compute the actual entropy and analyze the results. */
> +  mmap_entropy = stats_pmentropy (&stats_mmap);
> +  arena_entropy = stats_pmentropy (&stats_arena);
> +
> +  if ((arena_entropy + TOLERANCE) < mmap_entropy)
> +    {
> +      printf ("FAIL: malloc arena ASLR.\n");
> +      printf ("   Expected value: %d random bits\n", mmap_entropy);
> +      printf ("   But got:        %d random bits\n", arena_entropy);
> +
> +      /* stats_print(&stats_mmap);
> +	 stats_print(&stats_arena); */
> +      return 1;
> +    }
> +  printf ("PASS: malloc arena ASLR.\n");
> +  return 0;
> +}
> +
> +#define TEST_FUNCTON_ARGV do_test
> +#include "../test-skeleton.c"
> +
> +/* Local Variables: */
> +/* c-file-style: "gnu" */
> +/* c-basic-offset: 2 */
> +/* End: */
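The round trip performed by the patched heap_for_ptr can be checked in
isolation. A minimal sketch, assuming the 64-bit HEAP_MAX_SIZE of
64 MiB; the mapping base and the offset value are invented for
illustration, and the function mirrors the macro rather than being
glibc code:

#include <assert.h>
#include <stdint.h>

#define HEAP_MAX_SIZE ((uintptr_t) 64 * 1024 * 1024) /* 64 MiB (64-bit) */

static uintptr_t arena_rnd = 0x12000; /* page-aligned, < HEAP_MAX_SIZE */

/* Subtract the offset, round down to the HEAP_MAX_SIZE boundary, then
   add the offset back: this recovers the randomized heap_info.  */
static uintptr_t
heap_for_ptr (uintptr_t ptr)
{
  return ((ptr - arena_rnd) & ~(HEAP_MAX_SIZE - 1)) | arena_rnd;
}

int
main (void)
{
  uintptr_t base = (uintptr_t) 0x7f0004000000; /* HEAP_MAX_SIZE aligned */
  uintptr_t heap = base | arena_rnd;           /* randomized heap start */
  /* Any pointer inside the HEAP_MAX_SIZE window maps back to heap.  */
  for (uintptr_t off = 0; off < HEAP_MAX_SIZE; off += HEAP_MAX_SIZE / 8)
    assert (heap_for_ptr (heap + off) == heap);
  return 0;
}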
El 23/3/20 a las 21:14, Adhemerval Zanella escribió:
> [...]
> Alright, the main issue is not that it calculates the entropy
> inaccurately, but whether it produces false positives. I tried to
> code something similar, and checking on the testcase provided it
> seems too fragile to give meaningful results. So I think we should
> drop it for now.

We think that the test should work on all architectures. We have
developed a more complex tool which performs a complete statistical
analysis of the entropy.
We believe that tst-arena-aslr.c is simple but useful.

Here is the link to the results of the tool:
http://cybersecurity.upv.es/tools/aslra/aslr-analyzer.html

In any case, you have the full picture of what is needed in glibc.

> [...]
> No '\' on last item.

Agree. We added the "\" because the rest of the makefile rules ended
with "\", which looked odd to us.

> [...]
> Style: capitalize the first word, and use two spaces after a period.

OK!

> [...]
> I would prefer if we used static inline functions instead.

Agree. The macros have been used since the first versions of the code;
now inlines are preferred. But we don't know whether a fix patch
should change the "style" of the code, or whether it would be better
to move code improvements into another patch. We decided to preserve
the code as much as possible, so that the changes can be easily
tracked and reviewed.
Whatever you choose will be OK.

> [...]
> So I think it would be better to use some bits from AT_RANDOM and
> maybe refactor dl-osinfo.h to handle it transparently [...]
> Also, some style issues (like space after & and between operators).

This is an important issue: where does the entropy come from, and how
much entropy shall the arenas have?

Here is why we decided to use the existing entropy of the mmaps:

1.- It is very fast. It does not need to call a hash function, nor
consume entropy from the auxv vector.

2.- It is fully transparent and backward compatible. If the user
disables ASLR, then the arenas will be fixed too.

We were tempted to increase the randomization by adding "sub-page"
randomization. Some upper page bits (from bit 4 or 5 up to 12,
depending on the architecture alignment) could also be randomized, but
that would break compatibility and might cause problems in the future
(when a larger malloc alignment is required). So we decided to keep
this solution.

Also, remember that if the randomization of the arenas is independent
of the rest of the memory layout, then there must be a mechanism to
disable it. It is necessary for debugging.

The idea of the patch is to preserve the existing randomization,
giving a generic solution compatible with ASLR and fast arena lookup.

For a larger discussion about ASLR, see the work that we started a few
years ago (but had no time to complete):
http://cybersecurity.upv.es/solutions/aslr-ng/aslr-ng.html
> [...]
> No need to reference the full bug report link.

OK.

> [...]
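The "existing entropy of the mmaps" argument corresponds to the
one-line derivation in the quoted ptmalloc_init hunk. Stripped of
glibc internals it can be sketched as below (sysconf stands in for
GLRO (dl_pagesize); the printed value varies between executions only
while ASLR is active, which is the backward-compatibility property
described above):

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define HEAP_MAX_SIZE ((uintptr_t) 64 * 1024 * 1024)

static uintptr_t arena_rnd;

int
main (void)
{
  uintptr_t pagesize = sysconf (_SC_PAGESIZE);
  /* Reuse the randomness ASLR already put into this object's address:
     keep the bits below HEAP_MAX_SIZE, discard the sub-page bits.  */
  arena_rnd = ((uintptr_t) &arena_rnd) & (HEAP_MAX_SIZE - 1)
	      & ~(pagesize - 1);
  /* With ASLR disabled this is a constant, so arenas stay fixed too.  */
  printf ("arena_rnd = %#lx\n", (unsigned long) arena_rnd);
  return 0;
}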
On 24/03/2020 09:54, iripoll wrote:
> [...]
> We think that the test should work on all architectures. We have
> developed a more complex tool which performs a complete statistical
> analysis of the entropy.
> We believe that tst-arena-aslr.c is simple but useful.
>
> Here is the link to the results of the tool:
> http://cybersecurity.upv.es/tools/aslra/aslr-analyzer.html
>
> In any case, you have the full picture of what is needed in glibc.

We usually try to validate the kernel interface, but hardly the kernel
semantics itself. The test, for instance, fails on the smoke tests I
did on my system (stock Ubuntu 18 on x86_64). And you also need to
take into consideration systems that either do not have ASLR enabled
or do not support it at all.

I thought at first it should be doable, but the required data and
post-analysis do not really fit a glibc testcase. Maybe we can add the
tool link to the commit message and briefly explain what it intends to
do, although that would only make sense once it is upstream (on the
link you posted the tool is still not released).

> [...]
> [...]
> Agree. The macros have been used since the first versions of the
> code; now inlines are preferred. But we don't know whether a fix
> patch should change the "style" of the code, or whether it would be
> better to move code improvements into another patch. [...]
>
> Whatever you choose will be OK.

I would prefer to use static inline; it provides better semantics for
compiler analysis and possible debugging.

> [...]
> This is an important issue: where does the entropy come from, and how
> much entropy shall the arenas have?
> [...]
> We were tempted to increase the randomization by adding "sub-page"
> randomization. [...] So we decided to keep this solution.

My main issue with using ASLR entropy is that it provides different
values depending on the underlying architecture and kernel
configuration. And glibc still supports some architectures that do not
support ASLR at all.
And, as you pointed out, even for some ASLR-enabled architectures the
entropy most kernels provide is low. That's why I would prefer a
platform-neutral solution. As I pointed out, some bits from AT_RANDOM
are still unused, and I don't see why we can't use them here.

> Also, remember that if the randomization of the arenas is independent
> of the rest of the memory layout, then there must be a mechanism to
> disable it. It is necessary for debugging.

For such a scenario we might disable the randomization with a tunable.

> [...]
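The bit counts traded back and forth above follow from one division:
the offset trick can pick one page-aligned slot per page inside a
HEAP_MAX_SIZE window, i.e. log2 (HEAP_MAX_SIZE / pagesize) bits. A
sketch with the usual constants (4 KiB pages assumed), matching the
14-bit x86_64 and 8-bit i686 figures quoted in the thread:

#include <stdio.h>

int
main (void)
{
  const struct { const char *cfg; unsigned long heap_max; } c[] = {
    { "64-bit, HEAP_MAX_SIZE = 64 MiB", 64UL << 20 },
    { "32-bit, HEAP_MAX_SIZE =  1 MiB",  1UL << 20 },
  };
  const unsigned long pagesize = 4096;
  for (int i = 0; i < 2; i++)
    /* ctz of a power of two is its log2.  */
    printf ("%s: %d bits\n", c[i].cfg,
	    __builtin_ctzl (c[i].heap_max / pagesize));
  return 0;
}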
Hello Adhemerval,

Good to see you are processing this. Please let me share our view of
the issue here:

> My main issue with using ASLR entropy is that it provides different
> values depending on the underlying architecture and kernel
> configuration. And glibc still supports some architectures that do
> not support ASLR at all.

We don't see the problem here. Can you please elaborate a little more
on what's wrong? To our understanding (and what our patch does), if an
architecture does not support ASLR then no randomization is applied.

> And, as you pointed out, even for some ASLR-enabled architectures the
> entropy most kernels provide is low.

The kernel entropy is not low. Actually the alignment is what destroys
the entropy; the entropy provided by the kernel is of high quality.

> That's why I would prefer a platform-neutral solution. As I pointed
> out, some bits from AT_RANDOM are still unused, and I don't see why
> we can't use them here.

We don't think so. The kernel provides 16 random bytes via AT_RANDOM,
and on 64-bit architectures those bytes are already used by the stack
smashing protector and the glibc pointer guard. Please look at the
sysdeps/generic/dl-osinfo.h file.

Therefore, we would be introducing a correlation (weakness), and
attackers could use a partial info leak to bypass the canary (among
other attacks).

A different story would be to change how we use the entropy from
AT_RANDOM. As suggested by kernel developers, we could probably use it
to feed a CSPRNG (note that we cannot claim that arc4random is an
entropy source, but rather an entropy collector or intermediary).

We did not go that way because of the obvious overhead it introduces.
Right now our solution has zero overhead and provides good entropy (an
info leak cannot be used to leak anything beyond what is already
known, the mmap area).

>> Also, remember that if the randomization of the arenas is
>> independent of the rest of the memory layout, then there must be a
>> mechanism to disable it. It is necessary for debugging.
>
> For such a scenario we might disable the randomization with a
> tunable.

Yes, we can, but as you said, there are architectures that do not
support ASLR, and it sounds odd to disable part of the ASLR on
architectures that have no ASLR at all.
On 24/03/2020 11:40, Hector Marco wrote:
> Hello Adhemerval,
>
> Good to see you are processing this. Please let me share our view
> about the issue here:
>
>> My main issue about using ASLR entropy is that it provides different
>> values depending on the underlying architecture and kernel
>> configuration. And glibc still supports some architectures that do
>> not support ASLR at all.
>
> We don't see the problem here. Can you please elaborate a little bit
> more on what's wrong? To our understanding (and this is what our
> patch does), if an architecture does not support ASLR then no
> randomization is applied.

My understanding of this hardening idea is to randomize the arenas
using ASLR entropy, and my point is to provide such hardening even when
the kernel does not provide ASLR.

>> And, as you pointed out, even for some ASLR-enabled architectures
>> the entropy most kernels provide is low.
>
> The kernel entropy is not low. Actually, the alignment is what
> destroys the entropy; the entropy provided by the kernel itself is of
> high quality.

My point is that it might be considered low depending on the
architecture and kernel configuration, although it might not be an
issue once the heap alignment is not taken into consideration (for
instance, i686 will provide 8 bits, where 17 would be required to get a
fully randomized placement in the mmap area). So my question is: how
much randomization of the heap_info placement do we really require
here?

>> That's why I would prefer a platform-neutral solution. As I pointed
>> out, some bits from AT_RANDOM are still unused, and I don't see why
>> we can't use them here.
>
> We don't think so. The kernel provides 16 random bytes via AT_RANDOM,
> and on 64-bit architectures those bytes are already used by the stack
> smashing protector and the pointer guard in glibc. Please look at the
> sysdeps/generic/dl-osinfo.h file.

I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is the
one used for Linux. And my idea was to use something similar to what
Florian suggested some time ago [1]: to use AT_RANDOM as a feeder.

[1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html

> Therefore, we would be introducing a correlation (weakness), and
> attackers could use a partial info leak to bypass the canary (among
> other attacks).
>
> A different story would be to change how we use the entropy from
> AT_RANDOM. As suggested by kernel developers, we could probably use
> it to feed a CSPRNG (note that we cannot claim that arc4random is an
> entropy source, but rather an entropy collector or intermediary).
>
> We didn't go for this because of the obvious overhead it introduces.
> Right now our solution has zero overhead and it provides good entropy
> (an info leak cannot be used to leak anything beyond what is already
> known, the mmap area).

Alright, my comments are not a blocker to this patch. What I would like
to discuss is whether we should also randomize the arenas for
environments that do not provide ASLR (or provide limited entropy), and
which would be the best way to do it.

>>> Also, remember that if the randomization of the arenas is
>>> independent of the rest of the memory layout, then there must be a
>>> mechanism to disable it. It is necessary for debugging.
>>
>> For such a scenario we might disable randomization with a tunable.
>
> Yes, we can, but as you said, there are architectures that do not
> support ASLR, and it sounds odd to disable part of the ASLR on
> architectures that have no ASLR at all.
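[Editorial note: to make the alignment argument concrete, here is a
small illustrative calculation. The 4 KiB page size and the 64 MiB
HEAP_MAX_SIZE used below are assumptions for a typical 64-bit
configuration; both are configuration-dependent.]

#include <stdio.h>

int
main (void)
{
  unsigned long page_size = 1UL << 12;     /* 4 KiB, assumed.  */
  unsigned long heap_max_size = 1UL << 26; /* 64 MiB, assumed.  */

  /* Forcing heaps to be HEAP_MAX_SIZE-aligned discards every random
     bit between the page granularity and HEAP_MAX_SIZE.  */
  int lost_bits = __builtin_ctzl (heap_max_size)
                  - __builtin_ctzl (page_size);
  printf ("random bits discarded by alignment: %d\n", lost_bits);
  return 0;
}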
Hi Adhemerval,

Basically there are two issues to consider:

1.- How much entropy shall be added to the arenas.
2.- Where the entropy is taken from.

On 24/3/20 at 15:58, Adhemerval Zanella wrote:
>
> On 24/03/2020 11:40, Hector Marco wrote:
>> Hello Adhemerval,
>>
>> Good to see you are processing this. Please let me share our view
>> about the issue here:
>>
>>> My main issue about using ASLR entropy is that it provides
>>> different values depending on the underlying architecture and
>>> kernel configuration. And glibc still supports some architectures
>>> that do not support ASLR at all.
>>
>> We don't see the problem here. Can you please elaborate a little bit
>> more on what's wrong? To our understanding (and this is what our
>> patch does), if an architecture does not support ASLR then no
>> randomization is applied.
>
> My understanding of this hardening idea is to randomize the arenas
> using ASLR entropy, and my point is to provide such hardening even
> when the kernel does not provide ASLR.

The problem with ASLR is that all the memory zones have to be
randomized for it to be really effective. Therefore, if the libraries
or thread stacks are not randomized, then there is no point in
randomizing the heaps of the threads.

>>> And, as you pointed out, even for some ASLR-enabled architectures
>>> the entropy most kernels provide is low.
>>
>> The kernel entropy is not low. Actually, the alignment is what
>> destroys the entropy; the entropy provided by the kernel itself is
>> of high quality.
>
> My point is that it might be considered low depending on the
> architecture and kernel configuration, although it might not be an
> issue once the heap alignment is not taken into consideration (for
> instance, i686 will provide 8 bits, where 17 would be required to get
> a fully randomized placement in the mmap area). So my question is:
> how much randomization of the heap_info placement do we really
> require here?

ASLR is an extremely tricky and subtle issue. It may affect performance
(caused by alignment or cache collisions) and memory space (due to
extra page table entries, the page table is less compact). What is
more, sometimes ASLR has even been used to improve performance. For
example, a few random bits were introduced in the thread stacks to
improve performance.

It would be a good exercise to implement ASLR in the library, but it is
a far more complex issue, which requires a much deeper analysis.

In summary, we think that it shall have (at least) the same entropy as
the least randomized memory object (the mmaps). This way, an attacker
cannot take advantage of it, which is the goal of the patch.

>>> That's why I would prefer a platform-neutral solution. As I pointed
>>> out, some bits from AT_RANDOM are still unused, and I don't see why
>>> we can't use them here.
>>
>> We don't think so. The kernel provides 16 random bytes via
>> AT_RANDOM, and on 64-bit architectures those bytes are already used
>> by the stack smashing protector and the pointer guard in glibc.
>> Please look at the sysdeps/generic/dl-osinfo.h file.
>
> I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is the
> one used for Linux. And my idea was to use something similar to what
> Florian suggested some time ago [1]: to use AT_RANDOM as a feeder.
>
> [1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html

This is the second problem: where the entropy comes from.

As you suggest, the entropy can be taken from a CSPRNG. But we shall
test/consider the time overhead.
But if the entropy is not "linked" with the original ASLR, then we will
need a separate mechanism to disable it. Let's see an example:

If I want to run a process without ASLR, now we can:

$ setarch x86_64 -R ./testing

But if we use tunables, then we will need to use something like:

$ GLIBC_TUNABLES=glibc.malloc.arena_randomization=0 setarch x86_64 -R ./testing

Which is possible, but it looks cumbersome, and it is hard to explain
why we have to disable arena randomization. It may raise the question
of whether thread-stack randomization can also be disabled or not, and
why.

During the development of the patch we analyzed these options, and we
discarded them because of their complexity.

If you think that adding extra randomization bits to the arenas is
necessary, then tunables shall be the choice. But as we stated before,
we think that it is not necessary.

>> Therefore, we would be introducing a correlation (weakness), and
>> attackers could use a partial info leak to bypass the canary (among
>> other attacks).
>>
>> A different story would be to change how we use the entropy from
>> AT_RANDOM. As suggested by kernel developers, we could probably use
>> it to feed a CSPRNG (note that we cannot claim that arc4random is an
>> entropy source, but rather an entropy collector or intermediary).
>>
>> We didn't go for this because of the obvious overhead it introduces.
>> Right now our solution has zero overhead and it provides good
>> entropy (an info leak cannot be used to leak anything beyond what is
>> already known, the mmap area).
>
> Alright, my comments are not a blocker to this patch. What I would
> like to discuss is whether we should also randomize the arenas for
> environments that do not provide ASLR (or provide limited entropy),
> and which would be the best way to do it.

Discussion is necessary and clarifying. Thanks.

Regarding the test: the way we measure the entropy is very similar to
the one used in the paxtest tool, which is good enough for testing the
correctness of the patch. But once the patch is integrated, we don't
think that the test is necessary any longer.

Regards.

>>>> Also, remember that if the randomization of the arenas is
>>>> independent of the rest of the memory layout, then there must be a
>>>> mechanism to disable it. It is necessary for debugging.
>>>
>>> For such a scenario we might disable randomization with a tunable.
>>
>> Yes, we can, but as you said, there are architectures that do not
>> support ASLR, and it sounds odd to disable part of the ASLR on
>> architectures that have no ASLR at all.
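[Editorial note: the effect of "setarch -R" mentioned above is easy to
observe with a trivial probe such as the following. This is purely
illustrative and not part of the patch; any small program that prints
addresses will do.]

#include <stdio.h>
#include <stdlib.h>

/* Print one stack, one heap and one code address; run the program
   twice, with and without "setarch x86_64 -R", to see ASLR (or its
   absence) across executions.  */
int
main (void)
{
  int on_stack;
  void *on_heap = malloc (16);

  printf ("stack: %p  heap: %p  code: %p\n",
          (void *) &on_stack, on_heap, (void *) &main);
  free (on_heap);
  return 0;
}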
On 24/03/2020 14:34, iripoll wrote:
> Hi Adhemerval,
>
> Basically there are two issues to consider:
>
> 1.- How much entropy shall be added to the arenas.
> 2.- Where the entropy is taken from.
>
> On 24/3/20 at 15:58, Adhemerval Zanella wrote:
>>
>> On 24/03/2020 11:40, Hector Marco wrote:
>>> Hello Adhemerval,
>>>
>>> Good to see you are processing this. Please let me share our view
>>> about the issue here:
>>>
>>>> My main issue about using ASLR entropy is that it provides
>>>> different values depending on the underlying architecture and
>>>> kernel configuration. And glibc still supports some architectures
>>>> that do not support ASLR at all.
>>>
>>> We don't see the problem here. Can you please elaborate a little
>>> bit more on what's wrong? To our understanding (and this is what
>>> our patch does), if an architecture does not support ASLR then no
>>> randomization is applied.
>>
>> My understanding of this hardening idea is to randomize the arenas
>> using ASLR entropy, and my point is to provide such hardening even
>> when the kernel does not provide ASLR.
>
> The problem with ASLR is that all the memory zones have to be
> randomized for it to be really effective. Therefore, if the libraries
> or thread stacks are not randomized, then there is no point in
> randomizing the heaps of the threads.

Right, I agree that adding some randomness to the arena placement in
such cases leads to minimal gain security-wise.

>>>> And, as you pointed out, even for some ASLR-enabled architectures
>>>> the entropy most kernels provide is low.
>>>
>>> The kernel entropy is not low. Actually, the alignment is what
>>> destroys the entropy; the entropy provided by the kernel itself is
>>> of high quality.
>>
>> My point is that it might be considered low depending on the
>> architecture and kernel configuration, although it might not be an
>> issue once the heap alignment is not taken into consideration (for
>> instance, i686 will provide 8 bits, where 17 would be required to
>> get a fully randomized placement in the mmap area). So my question
>> is: how much randomization of the heap_info placement do we really
>> require here?
>
> ASLR is an extremely tricky and subtle issue. It may affect
> performance (caused by alignment or cache collisions) and memory
> space (due to extra page table entries, the page table is less
> compact). What is more, sometimes ASLR has even been used to improve
> performance. For example, a few random bits were introduced in the
> thread stacks to improve performance.
>
> It would be a good exercise to implement ASLR in the library, but it
> is a far more complex issue, which requires a much deeper analysis.
>
> In summary, we think that it shall have (at least) the same entropy
> as the least randomized memory object (the mmaps). This way, an
> attacker cannot take advantage of it, which is the goal of the patch.

My point is not to implement ASLR in the loader; it would most likely
be inefficient or error-prone. My point is, assuming ASLR, whether we
should use a more robust source of entropy to randomize the heap_info
placement in order to be platform-neutral, or whether more entropy than
ASLR already provides does not help much here. I take it your point is
the former, correct?

>>>> That's why I would prefer a platform-neutral solution. As I
>>>> pointed out, some bits from AT_RANDOM are still unused, and I
>>>> don't see why we can't use them here.
>>>
>>> We don't think so.
>>> The kernel provides 16 random bytes via AT_RANDOM, and on 64-bit
>>> architectures those bytes are already used by the stack smashing
>>> protector and the pointer guard in glibc. Please look at the
>>> sysdeps/generic/dl-osinfo.h file.
>>
>> I think you mean sysdeps/unix/sysv/linux/dl-osinfo.h, since it is
>> the one used for Linux. And my idea was to use something similar to
>> what Florian suggested some time ago [1]: to use AT_RANDOM as a
>> feeder.
>>
>> [1] https://sourceware.org/legacy-ml/libc-alpha/2016-10/msg00531.html
>
> This is the second problem: where the entropy comes from.
>
> As you suggest, the entropy can be taken from a CSPRNG. But we shall
> test/consider the time overhead.
>
> But if the entropy is not "linked" with the original ASLR, then we
> will need a separate mechanism to disable it. Let's see an example:
>
> If I want to run a process without ASLR, now we can:
>
> $ setarch x86_64 -R ./testing
>
> But if we use tunables, then we will need to use something like:
>
> $ GLIBC_TUNABLES=glibc.malloc.arena_randomization=0 setarch x86_64 -R ./testing
>
> Which is possible, but it looks cumbersome, and it is hard to explain
> why we have to disable arena randomization. It may raise the question
> of whether thread-stack randomization can also be disabled or not,
> and why.

At least two other memory allocators (scudo and hardened_malloc) that
aim for a more security-focused approach add randomness to the
placement of some internal data structures without a way to actually
disable it. But the question is whether we actually need to disable
such randomization from a debugging perspective (and I am not sure if
it is good security-wise to actually have such a feature to disable
hardening in such scenarios). In any case, I see your point.

> During the development of the patch we analyzed these options, and we
> discarded them because of their complexity.
>
> If you think that adding extra randomization bits to the arenas is
> necessary, then tunables shall be the choice. But as we stated
> before, we think that it is not necessary.

Ok, I think the complexity also does not pay off. We might revise it
once we have a more robust CSPRNG which can be used internally (such as
the arc4random patch from Florian).

>>> Therefore, we would be introducing a correlation (weakness), and
>>> attackers could use a partial info leak to bypass the canary (among
>>> other attacks).
>>>
>>> A different story would be to change how we use the entropy from
>>> AT_RANDOM. As suggested by kernel developers, we could probably use
>>> it to feed a CSPRNG (note that we cannot claim that arc4random is
>>> an entropy source, but rather an entropy collector or
>>> intermediary).
>>>
>>> We didn't go for this because of the obvious overhead it
>>> introduces. Right now our solution has zero overhead and it
>>> provides good entropy (an info leak cannot be used to leak anything
>>> beyond what is already known, the mmap area).
>>
>> Alright, my comments are not a blocker to this patch. What I would
>> like to discuss is whether we should also randomize the arenas for
>> environments that do not provide ASLR (or provide limited entropy),
>> and which would be the best way to do it.
>
> Discussion is necessary and clarifying. Thanks.
>
> Regarding the test: the way we measure the entropy is very similar to
> the one used in the paxtest tool, which is good enough for testing
> the correctness of the patch. But once the patch is integrated, we
> don't think that the test is necessary any longer.
In fact the testcase is more to check the functionality of the change;
I really don't want to actually validate the ASLR entropy of the
system. We can live without it for now.

> Regards.
>
>>>>> Also, remember that if the randomization of the arenas is
>>>>> independent of the rest of the memory layout, then there must be
>>>>> a mechanism to disable it. It is necessary for debugging.
>>>>
>>>> For such a scenario we might disable randomization with a tunable.
>>>
>>> Yes, we can, but as you said, there are architectures that do not
>>> support ASLR, and it sounds odd to disable part of the ASLR on
>>> architectures that have no ASLR at all.

Alright, could you post a v2 with the inline functions change, without
the test and with a proper commit message?

Thanks for your patience and for the work on moving this forward.
Hello,

> Alright, could you post a v2 with the inline functions change,
> without the test and with a proper commit message?
>
> Thanks for your patience and for the work on moving this forward.

Here is the patch with all the comments and suggestions.

Regards,
Ismael and Hector.

From 6230b07e0fc85856275f89c16a18b46f8ed23936 Mon Sep 17 00:00:00 2001
From: Ismael Ripoll <iripoll@disca.upv.es>
Date: Wed, 25 Mar 2020 12:36:24 +0100
Subject: [PATCH] Fix [BZ #22853] weak randomization on thread arenas.

Signed-off-by: Ismael Ripoll <iripoll@disca.upv.es>
Co-Authored-By: Hector Marco-Gisbert <hmarco@hmarco.org>
---
 malloc/arena.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..623824088b 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -122,13 +122,23 @@ int __malloc_initialized = -1;
       ptr = arena_get2 ((size), NULL);				      \
   } while (0)
 
-/* find the heap and corresponding arena for a given ptr */
-
-#define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
-#define arena_for_chunk(ptr) \
-  (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
-
+/* Find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is no longer HEAP_MAX_SIZE aligned, but sits at a random
+   page offset from that alignment, known to the process.  This way it
+   is still fast to get the head of the area while being ASLR
+   friendly.  */
+static unsigned long arena_rnd;
+static inline heap_info *
+heap_for_ptr (mchunkptr ptr)
+{
+  return (heap_info *) ((((uintptr_t) ptr - arena_rnd)
+                         & ~(HEAP_MAX_SIZE - 1)) | arena_rnd);
+}
+static inline struct malloc_state *
+arena_for_chunk (mchunkptr ptr)
+{
+  return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
+}
 
 /**************************************************************************/
 
@@ -293,6 +303,11 @@ ptmalloc_init (void)
 
   __malloc_initialized = 0;
 
+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR.  */
+  arena_rnd = ((unsigned long) &arena_rnd) & (HEAP_MAX_SIZE - 1)
+              & ~(pagesize - 1);
+
 #ifdef SHARED
   /* In case this libc copy is in a non-default namespace, never use
      brk.  Likewise if dlopened from statically linked program.  */
@@ -490,6 +505,11 @@ new_heap (size_t size, size_t top_pad)
         {
           p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                          & ~(HEAP_MAX_SIZE - 1));
+          /* The heap_info is at a random offset from the alignment to
+             HEAP_MAX_SIZE.  */
+          p2 = (char *) ((unsigned long) p2 | arena_rnd);
+          if (p1 + HEAP_MAX_SIZE <= p2)
+            p2 -= HEAP_MAX_SIZE;
           ul = p2 - p1;
           if (ul)
             __munmap (p1, ul);
@@ -500,12 +520,12 @@ new_heap (size_t size, size_t top_pad)
   else
     {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-         is already aligned. */
+         is already aligned to arena_rnd.  */
       p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
       if (p2 == MAP_FAILED)
         return 0;
 
-      if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+      if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != arena_rnd)
         {
           __munmap (p2, HEAP_MAX_SIZE);
           return 0;
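[Editorial note: the round-trip property that the new heap_for_ptr
relies on can be checked with the small stand-alone sketch below. It is
illustrative only; the arena_rnd value and the 64 MiB HEAP_MAX_SIZE are
assumptions, not values taken from a real run.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HEAP_MAX_SIZE (1UL << 26)  /* 64 MiB, an assumed default.  */

int
main (void)
{
  /* Pretend ASLR produced this page-aligned offset inside the
     HEAP_MAX_SIZE window.  */
  uintptr_t arena_rnd = 0x3a000;

  /* A heap head placed at (aligned base | arena_rnd), as new_heap
     does after the patch.  */
  uintptr_t base = 0x7f0000000000UL & ~(HEAP_MAX_SIZE - 1);
  uintptr_t heap = base | arena_rnd;

  /* Every pointer within the HEAP_MAX_SIZE mapping that starts at
     'heap' maps back to the heap head.  */
  for (uintptr_t off = 0; off < HEAP_MAX_SIZE; off += 0x123456)
    {
      uintptr_t ptr = heap + off;
      uintptr_t head = ((ptr - arena_rnd) & ~(HEAP_MAX_SIZE - 1))
                       | arena_rnd;
      assert (head == heap);
    }
  printf ("heap_for_ptr round-trip holds\n");
  return 0;
}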
On 3/25/20 10:56 AM, iripoll wrote:
> Hello,
>
>> Alright, could you post a v2 with the inline functions change,
>> without the test and with a proper commit message?
>>
>> Thanks for your patience and for the work on moving this forward.
>
> Here is the patch with all the comments and suggestions.

I am excited that this kind of work is going forward.

The next step here is that we need to work out the copyright assignment
for this work so we can accept the patches.

The contribution checklist explains "FSF copyright Assignment":
https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment

I would suggest a futures assignment:
http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future

This would allow the community to review and accept any of your patches
that you post.

If you're going through the process I usually recommend getting
assignments in place for glibc, gcc, gdb, and binutils, because all 4
projects share headers; that way you can post patches across all of
them. However, this is just a recommendation.

If your educational institution claims copyright on work you create,
you may need to tell the FSF that, and they in turn may need to sign a
corporate disclaimer.

This process is usually quite fast and smooth.

Your last patches in 2013 were < 15 lines, and so were accepted under
the flexible rules regarding legally significant contributions.

Thank you again for all of this work! :-)

Cheers,
Carlos.
Hello,

Our aim is to sign the copyright assignment ASAP. But please, give us
some time to check it with our respective universities. It may take
longer than normal because of COVID-19; we are under full lockdown.

Best,
Ismael.

On 25/3/20 at 21:05, Carlos O'Donell wrote:
> On 3/25/20 10:56 AM, iripoll wrote:
>> Hello,
>>
>>> Alright, could you post a v2 with the inline functions change,
>>> without the test and with a proper commit message?
>>>
>>> Thanks for your patience and for the work on moving this forward.
>>
>> Here is the patch with all the comments and suggestions.
>
> I am excited that this kind of work is going forward.
>
> The next step here is that we need to work out the copyright
> assignment for this work so we can accept the patches.
>
> The contribution checklist explains "FSF copyright Assignment":
> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
>
> I would suggest a futures assignment:
> http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
>
> This would allow the community to review and accept any of your
> patches that you post.
>
> If you're going through the process I usually recommend getting
> assignments in place for glibc, gcc, gdb, and binutils, because all 4
> projects share headers; that way you can post patches across all of
> them. However, this is just a recommendation.
>
> If your educational institution claims copyright on work you create,
> you may need to tell the FSF that, and they in turn may need to sign
> a corporate disclaimer.
>
> This process is usually quite fast and smooth.
>
> Your last patches in 2013 were < 15 lines, and so were accepted under
> the flexible rules regarding legally significant contributions.
>
> Thank you again for all of this work! :-)
>
> Cheers,
> Carlos.
diff --git a/malloc/Makefile b/malloc/Makefile
index 984045b5b9..dc845b1c24 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -39,6 +39,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 	 tst-malloc-too-large \
 	 tst-malloc-stats-cancellation \
 	 tst-tcfree1 tst-tcfree2 tst-tcfree3 \
+	 tst-arena-aslr \
+
 
 tests-static := \
 	 tst-interpose-static-nothread \
@@ -99,6 +101,7 @@ $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
 $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
 $(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
 $(objpfx)tst-malloc-stats-cancellation: $(shared-thread-library)
+$(objpfx)tst-arena-aslr: $(shared-thread-library)
 
 # Export the __malloc_initialize_hook variable to libc.so.
 LDFLAGS-tst-mallocstate = -rdynamic
diff --git a/malloc/arena.c b/malloc/arena.c
index cecdb7f4c4..416559e269 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -122,10 +122,15 @@ int __malloc_initialized = -1;
       ptr = arena_get2 ((size), NULL);				      \
   } while (0)
 
-/* find the heap and corresponding arena for a given ptr */
-
+/* Find the heap and corresponding arena for a given ptr.  Note that
+   heap_info is no longer HEAP_MAX_SIZE aligned, but sits at a random
+   page offset from that alignment, known to the process.  This way it
+   is still fast to get the head of the area while being ASLR
+   friendly.  */
+static unsigned long __arena_rnd;
 #define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
+  ((heap_info *) ((((unsigned long) (ptr) - __arena_rnd)	      \
+		   & ~(HEAP_MAX_SIZE - 1)) | __arena_rnd))
 #define arena_for_chunk(ptr) \
   (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
 
@@ -293,6 +298,11 @@ ptmalloc_init (void)
 
   __malloc_initialized = 0;
 
+  size_t pagesize = GLRO (dl_pagesize);
+  /* Get the entropy from the already existing ASLR.  */
+  __arena_rnd = ((unsigned long) &__arena_rnd) & (HEAP_MAX_SIZE - 1)
+		& ~(pagesize - 1);
+
 #ifdef SHARED
   /* In case this libc copy is in a non-default namespace, never use
      brk.  Likewise if dlopened from statically linked program.  */
@@ -490,6 +500,11 @@ new_heap (size_t size, size_t top_pad)
         {
           p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                          & ~(HEAP_MAX_SIZE - 1));
+          /* The heap_info is at a random offset from the alignment to
+             HEAP_MAX_SIZE.  */
+          p2 = (char *) ((unsigned long) p2 | __arena_rnd);
+          if (p1 + HEAP_MAX_SIZE <= p2)
+            p2 -= HEAP_MAX_SIZE;
           ul = p2 - p1;
           if (ul)
             __munmap (p1, ul);
@@ -500,12 +515,12 @@ new_heap (size_t size, size_t top_pad)
   else
     {
       /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
-         is already aligned. */
+         is already aligned to __arena_rnd.  */
       p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
       if (p2 == MAP_FAILED)
         return 0;
 
-      if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+      if (((unsigned long) p2 & (HEAP_MAX_SIZE - 1)) != __arena_rnd)
         {
           __munmap (p2, HEAP_MAX_SIZE);
           return 0;
diff --git a/malloc/tst-arena-aslr.c b/malloc/tst-arena-aslr.c
new file mode 100644
index 0000000000..c285111e1c
--- /dev/null
+++ b/malloc/tst-arena-aslr.c
@@ -0,0 +1,213 @@
+/* Bug 22853: ASLR on a thread's malloc weakened due to arena alignment.
+   https://sourceware.org/bugzilla/show_bug.cgi?id=22853
+   Copyright (C) 2020 Free Software Foundation.
+   Copying and distribution of this file, with or without modification,
+   are permitted in any medium without royalty provided the copyright
+   notice and this notice are preserved.  This file is offered as-is,
+   without any warranty.  */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <libc-diag.h>
+#include <pthread.h>
+#include <string.h>
+#include <sys/mman.h>
+
+/* Bits of difference that are considered acceptable.  */
+#define TOLERANCE 1
+/* Number of samples; enough to be statistically accurate.  */
+#define NR_SAMPLES 1000
+
+/* Size to be allocated in the thread's arena pool.  */
+#define MALLOC_ALLOCATED_SIZE 400
+
+/* Data structure sent by the sampler to the tester.  */
+struct sample_t
+{
+  size_t obj_mmap;
+  size_t obj_arena;
+} sample;
+
+size_t pagesize;
+
+static void *
+thread_main (void *dummy)
+{
+  sample.obj_arena = (size_t) malloc (MALLOC_ALLOCATED_SIZE);
+  pthread_exit (0);
+}
+
+/* Creates a thread to sample malloc and mmap addresses, and writes
+   the result to stdout.  */
+static void
+sampler (void)
+{
+  int ret;
+  pthread_t th;
+  pthread_attr_t attrs;
+
+  ret = pthread_attr_init (&attrs);
+  if (0 != ret)
+    {
+      perror ("pthread_attr_init");
+      abort ();
+    }
+  sample.obj_mmap = (size_t) mmap (NULL, pagesize,
+				   PROT_READ | PROT_WRITE,
+				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (sample.obj_mmap == (size_t) MAP_FAILED)
+    {
+      perror ("mmap");
+      abort ();
+    }
+  if (pthread_create (&th, &attrs, thread_main, NULL))
+    {
+      perror ("pthread_create");
+      abort ();
+    }
+  if (pthread_join (th, NULL))
+    {
+      perror ("pthread_join");
+      abort ();
+    }
+  if (sizeof (sample) != write (1, &sample, sizeof (sample)))
+    {
+      perror ("write sampler");
+      abort ();
+    }
+  exit (0);
+}
+
+struct statistics_t
+{
+  size_t min, max, and, or;
+};
+
+static void
+stats_init (struct statistics_t *st)
+{
+  st->min = st->and = ~0;
+  st->max = st->or = 0;
+}
+
+static void
+stats_account (struct statistics_t *st, size_t value)
+{
+  st->min = value < st->min ? value : st->min;
+  st->max = value > st->max ? value : st->max;
+  st->and &= value;
+  st->or |= value;
+}
+
+/* Count flipping bits: this is a "poor man's" entropy estimator.
+   There are much better and more accurate entropy estimators, but
+   they need more samples and more time, which is not worth it
+   here.  */
+static int
+stats_pmentropy (struct statistics_t *st)
+{
+  return __builtin_popcountll (st->and ^ st->or);
+}
+
+#if 0
+/* Debugging the test.  Not needed for production testing.  */
+void
+stats_print (struct statistics_t *st)
+{
+  printf ("[%18zx %18zx] %18zx %18zx %18zx %18zx %d\n", st->min, st->max,
+	  (st->max >> 2) + (st->min >> 2),
+	  st->and,
+	  st->or,
+	  st->and ^ st->or,
+	  stats_pmentropy (st));
+}
+#endif
+
+static int
+do_test (int argc, char **argv)
+{
+  pid_t child;
+  int tube[2];
+  int wstatus;
+  int i, mmap_entropy, arena_entropy;
+  struct sample_t sample;
+  struct statistics_t stats_mmap, stats_arena;
+
+  pagesize = sysconf (_SC_PAGESIZE);
+
+  if (2 == argc && strcmp (argv[1], "sample") == 0)
+    {
+      sampler ();  /* The role of this process is to sample one address.  */
+      exit (0);
+    }
+
+  /* The role of this part is to launch NR_SAMPLES samplers and
+     collect the output.  */
+  assert (pipe (tube) == 0);
+
+  stats_init (&stats_mmap);
+  stats_init (&stats_arena);
+
+  /* Collect all the samples.  */
+  for (i = 0; i < NR_SAMPLES; i++)
+    {
+      child = fork ();
+      if (child < 0)
+	{
+	  perror ("fork");
+	  exit (1);
+	}
+      if (0 == child)
+	{
+	  assert (dup2 (tube[1], 1) != -1);
+	  execl (argv[0], argv[0], "sample", NULL);
+	  perror ("execl");
+	  abort ();
+	}
+      if (TEMP_FAILURE_RETRY (read (tube[0], &sample, sizeof (sample)))
+	  != sizeof (sample))
+	{
+	  printf ("Error: cannot read from child process\n");
+	  abort ();
+	}
+      if (TEMP_FAILURE_RETRY (waitpid (child, &wstatus, 0)) != child
+	  || wstatus != 0)
+	{
+	  printf ("Error: waitpid for child process\n");
+	  abort ();
+	}
+      stats_account (&stats_mmap, sample.obj_mmap);
+      stats_account (&stats_arena, sample.obj_arena);
+    }
+
+  /* Compute the actual entropy and analyze the results.  */
+  mmap_entropy = stats_pmentropy (&stats_mmap);
+  arena_entropy = stats_pmentropy (&stats_arena);
+
+  if ((arena_entropy + TOLERANCE) < mmap_entropy)
+    {
+      printf ("FAIL: malloc arena ASLR.\n");
+      printf ("      Expected value: %d random bits\n", mmap_entropy);
+      printf ("      But got:        %d random bits\n", arena_entropy);
+
+      /* stats_print (&stats_mmap);
+	 stats_print (&stats_arena); */
+      return 1;
+    }
+  printf ("PASS: malloc arena ASLR.\n");
+  return 0;
+}
+
+#define TEST_FUNCTION_ARGV do_test
+#include "../test-skeleton.c"
+
+/* Local Variables: */
+/* c-file-style: "gnu" */
+/* c-basic-offset: 2 */
+/* End: */
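[Editorial note: the and/or popcount estimator used by the test counts
bit positions that were ever observed both as 0 and as 1 across
samples. A tiny hypothetical usage example, with made-up sample
addresses:]

#include <stdio.h>

int
main (void)
{
  /* Made-up mmap-like addresses standing in for three samples.  */
  unsigned long samples[] = { 0x7f1234abc000UL, 0x7f12349fd000UL,
			      0x7f123471e000UL };
  unsigned long and = ~0UL, or = 0UL;

  for (int i = 0; i < 3; i++)
    {
      and &= samples[i];
      or |= samples[i];
    }
  /* and^or has a 1 exactly where the samples disagreed, so its
     popcount is the number of "flipping" bit positions.  */
  printf ("observed flipping bits: %d\n",
	  __builtin_popcountl (and ^ or));
  return 0;
}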