qsort: Fix a typo causing unnecessary malloc/free

Message ID 20240122193150.2731985-1-xry111@xry111.site
State Superseded
Headers
Series qsort: Fix a typo causing unnecessary malloc/free |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed

Commit Message

Xi Ruoyao Jan. 22, 2024, 7:30 p.m. UTC
  In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack
and we intend to use it if all elements can fit into it.  But there is a
typo:

    if (total_size < sizeof buf)
      buf = tmp;
    else
      /* allocate a buffer on heap and use it ... */

Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of
1024.

This bug is detected debugging some strange heap corruption running the
Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using
Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]).  It
seems Ruby is doing some wild "optimization" by jumping into somewhere
in qsort_r instead of calling it normally, resulting in a double free of
buf if we allocate it on heap.  The issue can be reproduced
deterministically with:

    LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \
    LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb

in Ruby-3.3.0 tree after building it.  This change would hide the issue
for Ruby, but Ruby is likely still buggy (if using this "optimization"
sorting larger arrays).

[1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 stdlib/qsort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

H.J. Lu Jan. 22, 2024, 7:46 p.m. UTC | #1
On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack
> and we intend to use it if all elements can fit into it.  But there is a
> typo:
>
>     if (total_size < sizeof buf)
>       buf = tmp;
>     else
>       /* allocate a buffer on heap and use it ... */
>
> Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of
> 1024.

This sounds like a real bug.  A glibc bug report is needed to track it.

> This bug is detected debugging some strange heap corruption running the
> Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using
> Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]).  It
> seems Ruby is doing some wild "optimization" by jumping into somewhere
> in qsort_r instead of calling it normally, resulting in a double free of
> buf if we allocate it on heap.  The issue can be reproduced
> deterministically with:
>
>     LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \
>     LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb
>
> in Ruby-3.3.0 tree after building it.  This change would hide the issue
> for Ruby, but Ruby is likely still buggy (if using this "optimization"
> sorting larger arrays).
>
> [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log
>
> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
>  stdlib/qsort.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> index 7f5a00fb33..1c1505e7b2 100644
> --- a/stdlib/qsort.c
> +++ b/stdlib/qsort.c
> @@ -353,7 +353,7 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,
>    if (size > INDIRECT_SORT_SIZE_THRES)
>      total_size = 2 * total_elems * sizeof (void *) + size;
>
> -  if (total_size < sizeof buf)
> +  if (total_size < sizeof tmp)
>      buf = tmp;
>    else
>      {
> --
> 2.43.0
>
  
Xi Ruoyao Jan. 22, 2024, 7:54 p.m. UTC | #2
On Mon, 2024-01-22 at 11:46 -0800, H.J. Lu wrote:
> On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote:
> > 
> > In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack
> > and we intend to use it if all elements can fit into it.  But there is a
> > typo:
> > 
> >      if (total_size < sizeof buf)
> >        buf = tmp;
> >      else
> >        /* allocate a buffer on heap and use it ... */
> > 
> > Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of
> > 1024.
> 
> This sounds like a real bug.  A glibc bug report is needed to track it.

https://sourceware.org/bugzilla/show_bug.cgi?id=31276
  
Xi Ruoyao Jan. 22, 2024, 7:57 p.m. UTC | #3
On Tue, 2024-01-23 at 03:54 +0800, Xi Ruoyao wrote:
> On Mon, 2024-01-22 at 11:46 -0800, H.J. Lu wrote:
> > On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote:
> > > 
> > > In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack
> > > and we intend to use it if all elements can fit into it.  But there is a
> > > typo:
> > > 
> > >      if (total_size < sizeof buf)

Hmm, and it seems this should be "<=" instead of "<".  I'm preparing a
V2 with the operator changed too, and the BZ number in commit message...

> > >        buf = tmp;
> > >      else
> > >        /* allocate a buffer on heap and use it ... */
> > > 
> > > Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of
> > > 1024.
> > 
> > This sounds like a real bug.  A glibc bug report is needed to track it.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=31276
>
  
Florian Weimer Jan. 24, 2024, 1:45 p.m. UTC | #4
* Xi Ruoyao:

> This bug is detected debugging some strange heap corruption running the
> Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using
> Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]).  It
> seems Ruby is doing some wild "optimization" by jumping into somewhere
> in qsort_r instead of calling it normally, resulting in a double free of
> buf if we allocate it on heap.  The issue can be reproduced
> deterministically with:
>
>     LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \
>     LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb
>
> in Ruby-3.3.0 tree after building it.  This change would hide the issue
> for Ruby, but Ruby is likely still buggy (if using this "optimization"
> sorting larger arrays).
>
> [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log

Sorry, missed that part.  I suspect it's because the Ruby garbage
collector updates pointers in the array during invocation of the
comparison function, and we later discard those updates when we copy the
sorted runs back into the application-supplied array.  Ruby probably
scans the stack conservatively, triggering object pinning, which is
probably the reason why we don't see the issue with the on-stack copy.

The issue is reproducible with the old (glibc 2.37) qsort_r if the Ruby
test case uses a suitable array size.

We've posted our findings to the Ruby bug:

  <https://bugs.ruby-lang.org/issues/20203>

Thanks,
Florian
  
Xi Ruoyao Jan. 24, 2024, 1:49 p.m. UTC | #5
On Wed, 2024-01-24 at 14:45 +0100, Florian Weimer wrote:
> * Xi Ruoyao:
> 
> > This bug is detected debugging some strange heap corruption running the
> > Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using
> > Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]).  It
> > seems Ruby is doing some wild "optimization" by jumping into somewhere
> > in qsort_r instead of calling it normally, resulting in a double free of
> > buf if we allocate it on heap.  The issue can be reproduced
> > deterministically with:
> > 
> >     LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \
> >     LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb
> > 
> > in Ruby-3.3.0 tree after building it.  This change would hide the issue
> > for Ruby, but Ruby is likely still buggy (if using this "optimization"
> > sorting larger arrays).
> > 
> > [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log
> 
> Sorry, missed that part.  I suspect it's because the Ruby garbage
> collector updates pointers in the array during invocation of the
> comparison function, and we later discard those updates when we copy the
> sorted runs back into the application-supplied array.  Ruby probably
> scans the stack conservatively, triggering object pinning, which is
> probably the reason why we don't see the issue with the on-stack copy.
> 
> The issue is reproducible with the old (glibc 2.37) qsort_r if the Ruby
> test case uses a suitable array size.
> 
> We've posted our findings to the Ruby bug:
> 
>   <https://bugs.ruby-lang.org/issues/20203>

Ooooops.  We've disabled the use of qsort_r for Ruby in BLFS for now:
https://wiki.linuxfromscratch.org/blfs/changeset/77b455955c17e22d6aa544c766b30850978960ab.
 I guess their own sort should work with their own GC anyway...
  

Patch

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index 7f5a00fb33..1c1505e7b2 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -353,7 +353,7 @@  __qsort_r (void *const pbase, size_t total_elems, size_t size,
   if (size > INDIRECT_SORT_SIZE_THRES)
     total_size = 2 * total_elems * sizeof (void *) + size;
 
-  if (total_size < sizeof buf)
+  if (total_size < sizeof tmp)
     buf = tmp;
   else
     {