malloc: check tcache mem size in tcache_get_n to avoid arbitrary mem allocation

Message ID 20250409082617.112696-1-dudududuMaxVer@gmail.com (mailing list archive)
State Under Review
Delegated to: Wilco Dijkstra
Headers
Series malloc: check tcache mem size in tcache_get_n to avoid arbitrary mem allocation |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit fail Patch caused testsuite regressions
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed

Commit Message

dbgbgtf April 9, 2025, 8:26 a.m. UTC
  Hi,

> What do you mean with "arb alloc"? Never heard of the term...

`arb alloc` is abbreviation of `arbitrary alloc`.

> If the malloc succeeds despite corruption, it will likely fail if that block is freed
> again, or on the next malloc that uses the now corrupted freelist. Adding free(ptr[0])
> at the end of your test causes "double free or corruption".

`ptr[0]` is the address of `ptr[2]` at the end of the
test.
At this time, attackers already got an arbitrary mem alloc, we shouldn't
leave it to `free` or another `malloc`.

> You can statically link with GLIBC, but that's not needed here - "make bench" will generate
> benchmarks which will always dynamically link with your local GLIBC build.

Is that true?
Because on my machine ldd shows it will use my system
library as default, while the GLIBC build libc.so.6 is in
`/home/dbgbgtf/Main/work/build/libc.so.6`

```
❯ ldd benchtests/bench-malloc-simple
	linux-vdso.so.1 (0x000077ae74ebf000)
	libc.so.6 => /usr/lib/libc.so.6 (0x000077ae74c98000)
	/usr/lib/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000077ae74ec1000)
```
But anyway, `make bench` will use the correct library to run benchtests.
If you're using `make bench`, it's fine.

> There is a 3.8% difference between those runs. Which one is old, which one is new?

You can tell the old and new ones from the directory, `mybuild` was
built with my patch, while `build` was built with the `origin/master`.

So you can see my patch is faster, which make me confused.
(maybe because I remove the fastbin checks in _int_malloc?)

> Note those results seem slow given the i7-13700H is quite new. Are you running on
> the fast cores? Does the generated code look well optimized?

Those results came from my laptop, so it could be slower than desktop.
I didn't specific the tests to use fast cores. But the code is well optimized.

So I rerun the benchtests like `taskset -c 0 make bench BENCHSET="malloc-simple"`
The result was not different compare to the result in my last email.

Also I rerun `make bench BENCHSET="malloc-thread"`

This is `origin/master`
```
❯ cat benchtests/bench-malloc-thread-32.out
{
 "timing_type": "hp_timing",
 "functions": {
  "malloc": {
   "": {
    "duration": 9.33561e+11,
    "iterations": 1.10199e+10,
    "time_per_iteration": 84.7163,
    "max_rss": 7284,
    "threads": 32,
    "min_size": 4,
    "max_size": 32768,
    "random_seed": 88
   }
  }
 }
}%
```

This is `origin/master` with my patch
```
❯ cat benchtests/bench-malloc-thread-32.out
{
 "timing_type": "hp_timing",
 "functions": {
  "malloc": {
   "": {
    "duration": 9.33958e+11,
    "iterations": 1.07042e+10,
    "time_per_iteration": 87.2519,
    "max_rss": 6820,
    "threads": 32,
    "min_size": 4,
    "max_size": 32768,
    "random_seed": 88
   }
  }
 }
}%
```

Thanks,
dbgbgtf

Signed-off-by: dbgbgtf <dudududuMaxVer@gmail.com>
---
 malloc/malloc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
  

Comments

Wilco Dijkstra April 30, 2025, 3:33 p.m. UTC | #1
Hi,

Sorry for the delay - there has been a lot of malloc activity recently...

>> If the malloc succeeds despite corruption, it will likely fail if that block is freed
>> again, or on the next malloc that uses the now corrupted freelist. Adding free(ptr[0])
>> at the end of your test causes "double free or corruption".
>
> `ptr[0]` is the address of `ptr[2]` at the end of the test.
> At this time, attackers already got an arbitrary mem alloc, we shouldn't
> leave it to `free` or another `malloc`.

On AArch64 the alignment check does usually trigger on that final malloc in the test
without your patch, so it is capable of detecting this particular corruption. That seems
to be an issue with the testcase, however it's true the existing checks aren't consistent.

One thing that looks wrong in the existing checks is that we don't even try to maintain
simple invariants. Currently tcache_put doesn't always guarantee the chunk is aligned,
but we could trivially ensure that by moving the alignment check inside it. After that
the alignment check in tcache_get_n becomes redundant, and instead we could check
the *next* pointer is aligned and that way maintain the invariant as well as detect heap
corruption earlier.

Changing tcache_get_n to check the next pointer ensures your testcase passes without
needing additional checks. I believe that this is the best way forward - rather than just
adding random checks on some paths, we should be maintaining clear invariants on
all paths.

Additionally it would be great to have a generic testcase as part of the patch - to make
it work in GLIBC, you'll need to ensure it works with the default -O2 flag (which means
adding volatile in a few places). Also the ptr array needs to be suitably aligned so that
&ptr[2] has MALLOC_ALIGNMENT as otherwise existing checks trigger. And the test
may need to be excluded from the check and mcheck cases.

>> You can statically link with GLIBC, but that's not needed here - "make bench" will generate
>> benchmarks which will always dynamically link with your local GLIBC build.
>
> Is that true?

Yes, it overrides the default dynamic linker and thus always links with the latest GLIBC.

Cheers,
Wilco
  

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index a0bc733482..74f9668d3b 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3186,6 +3186,8 @@  tcache_get_n (size_t tc_idx, tcache_entry **ep)
 
   if (__glibc_unlikely (!aligned_OK (e)))
     malloc_printerr ("malloc(): unaligned tcache chunk detected");
+  if (__glibc_unlikely (tc_idx != csize2tidx(chunksize_nomask (mem2chunk(e)))))
+    malloc_printerr ("malloc(): tcache mem size vs request size");
 
   if (ep == &(tcache->entries[tc_idx]))
       *ep = REVEAL_PTR (e->next);
@@ -4009,11 +4011,6 @@  _int_malloc (mstate av, size_t bytes)
 		  while (tcache->counts[tc_idx] < mp_.tcache_count
 			 && (tc_victim = *fb) != NULL)
 		    {
-		      if (__glibc_unlikely (misaligned_chunk (tc_victim)))
-			malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
-		      size_t victim_tc_idx = csize2tidx (chunksize (tc_victim));
-		      if (__glibc_unlikely (tc_idx != victim_tc_idx))
-			malloc_printerr ("malloc(): chunk size mismatch in fastbin");
 		      if (SINGLE_THREAD_P)
 			*fb = REVEAL_PTR (tc_victim->fd);
 		      else