malloc: check tcache mem size in tcache_get_n to avoid arbitrary mem allocation
Checks
| Context |
Check |
Description |
| redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
| redhat-pt-bot/TryBot-32bit |
fail
|
Patch caused testsuite regressions
|
| linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
| linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Hi,
> What do you mean with "arb alloc"? Never heard of the term...
`arb alloc` is abbreviation of `arbitrary alloc`.
> If the malloc succeeds despite corruption, it will likely fail if that block is freed
> again, or on the next malloc that uses the now corrupted freelist. Adding free(ptr[0])
> at the end of your test causes "double free or corruption".
`ptr[0]` is the address of `ptr[2]` at the end of the
test.
At this time, attackers already got an arbitrary mem alloc, we shouldn't
leave it to `free` or another `malloc`.
> You can statically link with GLIBC, but that's not needed here - "make bench" will generate
> benchmarks which will always dynamically link with your local GLIBC build.
Is that true?
Because on my machine ldd shows it will use my system
library as default, while the GLIBC build libc.so.6 is in
`/home/dbgbgtf/Main/work/build/libc.so.6`
```
❯ ldd benchtests/bench-malloc-simple
linux-vdso.so.1 (0x000077ae74ebf000)
libc.so.6 => /usr/lib/libc.so.6 (0x000077ae74c98000)
/usr/lib/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000077ae74ec1000)
```
But anyway, `make bench` will use the correct library to run benchtests.
If you're using `make bench`, it's fine.
> There is a 3.8% difference between those runs. Which one is old, which one is new?
You can tell the old and new ones from the directory, `mybuild` was
built with my patch, while `build` was built with the `origin/master`.
So you can see my patch is faster, which make me confused.
(maybe because I remove the fastbin checks in _int_malloc?)
> Note those results seem slow given the i7-13700H is quite new. Are you running on
> the fast cores? Does the generated code look well optimized?
Those results came from my laptop, so it could be slower than desktop.
I didn't specific the tests to use fast cores. But the code is well optimized.
So I rerun the benchtests like `taskset -c 0 make bench BENCHSET="malloc-simple"`
The result was not different compare to the result in my last email.
Also I rerun `make bench BENCHSET="malloc-thread"`
This is `origin/master`
```
❯ cat benchtests/bench-malloc-thread-32.out
{
"timing_type": "hp_timing",
"functions": {
"malloc": {
"": {
"duration": 9.33561e+11,
"iterations": 1.10199e+10,
"time_per_iteration": 84.7163,
"max_rss": 7284,
"threads": 32,
"min_size": 4,
"max_size": 32768,
"random_seed": 88
}
}
}
}%
```
This is `origin/master` with my patch
```
❯ cat benchtests/bench-malloc-thread-32.out
{
"timing_type": "hp_timing",
"functions": {
"malloc": {
"": {
"duration": 9.33958e+11,
"iterations": 1.07042e+10,
"time_per_iteration": 87.2519,
"max_rss": 6820,
"threads": 32,
"min_size": 4,
"max_size": 32768,
"random_seed": 88
}
}
}
}%
```
Thanks,
dbgbgtf
Signed-off-by: dbgbgtf <dudududuMaxVer@gmail.com>
---
malloc/malloc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
Comments
Hi,
Sorry for the delay - there has been a lot of malloc activity recently...
>> If the malloc succeeds despite corruption, it will likely fail if that block is freed
>> again, or on the next malloc that uses the now corrupted freelist. Adding free(ptr[0])
>> at the end of your test causes "double free or corruption".
>
> `ptr[0]` is the address of `ptr[2]` at the end of the test.
> At this time, attackers already got an arbitrary mem alloc, we shouldn't
> leave it to `free` or another `malloc`.
On AArch64 the alignment check does usually trigger on that final malloc in the test
without your patch, so it is capable of detecting this particular corruption. That seems
to be an issue with the testcase, however it's true the existing checks aren't consistent.
One thing that looks wrong in the existing checks is that we don't even try to maintain
simple invariants. Currently tcache_put doesn't always guarantee the chunk is aligned,
but we could trivially ensure that by moving the alignment check inside it. After that
the alignment check in tcache_get_n becomes redundant, and instead we could check
the *next* pointer is aligned and that way maintain the invariant as well as detect heap
corruption earlier.
Changing tcache_get_n to check the next pointer ensures your testcase passes without
needing additional checks. I believe that this is the best way forward - rather than just
adding random checks on some paths, we should be maintaining clear invariants on
all paths.
Additionally it would be great to have a generic testcase as part of the patch - to make
it work in GLIBC, you'll need to ensure it works with the default -O2 flag (which means
adding volatile in a few places). Also the ptr array needs to be suitably aligned so that
&ptr[2] has MALLOC_ALIGNMENT as otherwise existing checks trigger. And the test
may need to be excluded from the check and mcheck cases.
>> You can statically link with GLIBC, but that's not needed here - "make bench" will generate
>> benchmarks which will always dynamically link with your local GLIBC build.
>
> Is that true?
Yes, it overrides the default dynamic linker and thus always links with the latest GLIBC.
Cheers,
Wilco
@@ -3186,6 +3186,8 @@ tcache_get_n (size_t tc_idx, tcache_entry **ep)
if (__glibc_unlikely (!aligned_OK (e)))
malloc_printerr ("malloc(): unaligned tcache chunk detected");
+ if (__glibc_unlikely (tc_idx != csize2tidx(chunksize_nomask (mem2chunk(e)))))
+ malloc_printerr ("malloc(): tcache mem size vs request size");
if (ep == &(tcache->entries[tc_idx]))
*ep = REVEAL_PTR (e->next);
@@ -4009,11 +4011,6 @@ _int_malloc (mstate av, size_t bytes)
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
- if (__glibc_unlikely (misaligned_chunk (tc_victim)))
- malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
- size_t victim_tc_idx = csize2tidx (chunksize (tc_victim));
- if (__glibc_unlikely (tc_idx != victim_tc_idx))
- malloc_printerr ("malloc(): chunk size mismatch in fastbin");
if (SINGLE_THREAD_P)
*fb = REVEAL_PTR (tc_victim->fd);
else