[v7] memalign: Support scanning for aligned chunks.

Message ID xnttxevv66.fsf@greed.delorie.com
State Committed
Commit e5524ef335dc8e28d64cc376d57c219e566fcf53
Series [v7] memalign: Support scanning for aligned chunks.

Checks

Context                Check    Description
dj/TryBot-apply_patch  success  Patch applied to master at the time it was sent
dj/TryBot-32bit        success  Build for i686

Commit Message

DJ Delorie April 18, 2023, 1:25 a.m. UTC
  "Carlos O'Donell" <carlos@redhat.com> writes:
> DJ and I sat down to review v6 in detail.
>
> We need a v7 for three reasons:
>
> (a) chunk_ok_for_memalign() is commented as taking *user* bytes but you pass
>     chunksize in this change and the comment needs adjusting. You might as
>     well change `size_t bytes` to `size_t nb` to be consistent.
>
> (b) If we are now passing chunksize then the second line of the function which
>     computes size, should use, not memsize(p), but chunksize(p), to correctly
>     account for the header bytes (otherwise we're conservative).
>
> (c) We need to exclude the test from mcheck runs via tests-exclude-mcheck since
>     the expected chunk results won't work when mcheck is in effect.
>
> 	FAIL: malloc/tst-memalign-2-mcheck
> 	FAIL: malloc/tst-memalign-3-mcheck

Adjusted as requested.  As noted previously, a large chunk of the
malloc.c diff is merely indentation change (with the exception of the
calls to chunk_ok_for_memalign).

From fee79d5ab6d385817ea88e5097254a9559c35878 Mon Sep 17 00:00:00 2001
From: DJ Delorie <dj@redhat.com>
Date: Mon, 3 Apr 2023 17:33:03 -0400
Subject: malloc: set NON_MAIN_ARENA flag for reclaimed memalign chunk (BZ #30101)

Based on these comments in malloc.c:

   size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
   from a non-main arena.  This is only set immediately before handing
   the chunk to the user, if necessary.

   The NON_MAIN_ARENA flag is never set for unsorted chunks, so it
   does not have to be taken into account in size comparisons.

When we pull a chunk off the unsorted list (or any list) we need to
make sure that flag is set properly before returning the chunk.

Use the rounded-up size for chunk_ok_for_memalign().

Do not scan the arena for reusable chunks if there's no arena.

Account for chunk overhead when determining if a chunk is a reuse
candidate.

mcheck interferes with memalign, so skip mcheck variants of
memalign tests.
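
For reference, the heart of the fix condenses to this path in
_int_memalign (a sketch pulled from the diff below; the bin-scanning
logic, which now runs only under the new "av != NULL" guard, is
elided):

    /* After reclaiming VICTIM from a bin scan, mark it in-use and,
       when it came from a non-main arena, set the NON_MAIN_ARENA bit
       before handing the chunk to the user (the BZ #30101 fix).  */
    if (victim != NULL)
      {
        p = victim;
        m = chunk2mem (p);
        set_inuse (p);
        if (av != &main_arena)
          set_non_main_arena (p);
      }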
  

Comments

Carlos O'Donell April 18, 2023, 1:58 p.m. UTC | #1
On 4/17/23 21:25, DJ Delorie wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
>> DJ and I sat down to review v6 in detail.
>>
>> We need a v7 for three reasons:
>>
>> (a) chunk_ok_for_memalign() is commented as taking *user* bytes but you pass
>>     chunksize in this change and the comment needs adjusting. You might as
>>     well change `size_t bytes` to `size_t nb` to be consistent.
>>
>> (b) If we are now passing chunksize then the second line of the function which
>>     computes size, should use, not memsize(p), but chunksize(p), to correctly
>>     account for the header bytes (otherwise we're conservative).
>>
>> (c) We need to exclude the test from mcheck runs via tests-exclude-mcheck since
>>     the expected chunk results won't work when mcheck is in effect.
>>
>> 	FAIL: malloc/tst-memalign-2-mcheck
>> 	FAIL: malloc/tst-memalign-3-mcheck
> 
> Adjusted as requested.  As noted previously, a large chunk of the
> malloc.c diff is merely indentation change (with the exception of the
> calls to chunk_ok_for_memalign).
> 
> From fee79d5ab6d385817ea88e5097254a9559c35878 Mon Sep 17 00:00:00 2001
> From: DJ Delorie <dj@redhat.com>
> Date: Mon, 3 Apr 2023 17:33:03 -0400
> Subject: malloc: set NON_MAIN_ARENA flag for reclaimed memalign chunk (BZ #30101)
> 
> Based on these comments in malloc.c:
> 
>    size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
>    from a non-main arena.  This is only set immediately before handing
>    the chunk to the user, if necessary.
> 
>    The NON_MAIN_ARENA flag is never set for unsorted chunks, so it
>    does not have to be taken into account in size comparisons.
> 
> When we pull a chunk off the unsorted list (or any list) we need to
> make sure that flag is set properly before returning the chunk.
> 
> Use the rounded-up size for chunk_ok_for_memalign().
> 
> Do not scan the arena for reusable chunks if there's no arena.
> 
> Account for chunk overhead when determining if a chunk is a reuse
> candidate.
> 
> mcheck interferes with memalign, so skip mcheck variants of
> memalign tests.

v7 LGTM and addresses all previous issues.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

> 
> diff --git a/malloc/Makefile b/malloc/Makefile
> index f49675845e..071dfdb9d8 100644
> --- a/malloc/Makefile
> +++ b/malloc/Makefile
> @@ -43,7 +43,8 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>  	 tst-tcfree1 tst-tcfree2 tst-tcfree3 \
>  	 tst-safe-linking \
>  	 tst-mallocalign1 \
> -	 tst-memalign-2
> +	 tst-memalign-2 \
> +	 tst-memalign-3

OK.

>  
>  tests-static := \
>  	 tst-interpose-static-nothread \
> @@ -71,7 +72,7 @@ test-srcs = tst-mtrace
>  # with MALLOC_CHECK_=3 because they expect a specific failure.
>  tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
>  	tst-mxfast tst-safe-linking \
> -	tst-compathooks-off tst-compathooks-on tst-memalign-2
> +	tst-compathooks-off tst-compathooks-on tst-memalign-2 tst-memalign-3

OK.

>  
>  # Run all tests with MALLOC_CHECK_=3
>  tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
> @@ -116,6 +117,8 @@ tests-exclude-mcheck = tst-mallocstate \
>  	tst-malloc-usable-tunables \
>  	tst-malloc_info \
>  	tst-compathooks-off tst-compathooks-on \
> +	tst-memalign-2 \
> +	tst-memalign-3 \

OK.

>  	tst-mxfast
>  
>  tests-mcheck = $(filter-out $(tests-exclude-mcheck) $(tests-static), $(tests))
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 0315ac5d16..e33ed665db 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -4974,13 +4974,13 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
>  
>  /* Returns 0 if the chunk is not and does not contain the requested
>     aligned sub-chunk, else returns the amount of "waste" from
> -   trimming.  BYTES is the *user* byte size, not the chunk byte
> +   trimming.  NB is the *chunk* byte size, not the user byte

OK. Corrects the comment to use chunksize as required for the back-chunk size.

>     size.  */
>  static size_t
> -chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t bytes)
> +chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t nb)
>  {
>    void *m = chunk2mem (p);
> -  INTERNAL_SIZE_T size = memsize (p);
> +  INTERNAL_SIZE_T size = chunksize (p);

OK. Use the correct tight bound, i.e. chunksize.
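
As a hedged aside, the difference matters most for a perfect fit.  The
constants below (8-byte header, 16-byte rounding) are illustrative
assumptions, not glibc's actual values:

    /* NB already includes the header (the request was rounded up on
       entry), so it must be compared against chunksize, not memsize.  */
    #include <assert.h>
    #include <stddef.h>

    #define HDR 8                               /* assumed header size */
    #define REQ2SIZE(b) (((b) + HDR + 15) & ~(size_t) 15)

    int
    main (void)
    {
      size_t nb = REQ2SIZE (24);   /* 32: user request plus header */
      size_t csize = 32;           /* candidate chunk's chunksize */
      size_t msize = csize - HDR;  /* 24: its user-visible memsize */

      assert (csize == nb);  /* chunk vs. chunk: perfect fit found */
      assert (msize < nb);   /* memsize vs. chunk: fit missed */
      return 0;
    }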

>    void *aligned_m = m;
>  
>    if (__glibc_unlikely (misaligned_chunk (p)))
> @@ -4997,12 +4997,12 @@ chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t bytes)
>    /* If it's a perfect fit, it's an exception to the return value rule
>       (we would return zero waste, which looks like "not usable"), so
>       handle it here by returning a small non-zero value instead.  */
> -  if (size == bytes && front_extra == 0)
> +  if (size == nb && front_extra == 0)

OK.

>      return 1;
>  
>    /* If the block we need fits in the chunk, calculate total waste.  */
> -  if (size > bytes + front_extra)
> -    return size - bytes;
> +  if (size > nb + front_extra)
> +    return size - nb;

OK.

>  
>    /* Can't use this chunk.  */
>    return 0;
> @@ -5048,95 +5048,98 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
>       and unlikely to meet our alignment requirements.  We have not done
>       any experimentation with searching for aligned fastbins.  */
>  
> -  int first_bin_index;
> -  int first_largebin_index;
> -  int last_bin_index;
> +  if (av != NULL)
> +    {
> +      int first_bin_index;
> +      int first_largebin_index;
> +      int last_bin_index;
>  
> -  if (in_smallbin_range (nb))
> -    first_bin_index = smallbin_index (nb);
> -  else
> -    first_bin_index = largebin_index (nb);
> +      if (in_smallbin_range (nb))
> +	first_bin_index = smallbin_index (nb);
> +      else
> +	first_bin_index = largebin_index (nb);
>  
> -  if (in_smallbin_range (nb * 2))
> -    last_bin_index = smallbin_index (nb * 2);
> -  else
> -    last_bin_index = largebin_index (nb * 2);
> +      if (in_smallbin_range (nb * 2))
> +	last_bin_index = smallbin_index (nb * 2);
> +      else
> +	last_bin_index = largebin_index (nb * 2);
>  
> -  first_largebin_index = largebin_index (MIN_LARGE_SIZE);
> +      first_largebin_index = largebin_index (MIN_LARGE_SIZE);
>  
> -  int victim_index;                 /* its bin index */
> +      int victim_index;                 /* its bin index */
>  
> -  for (victim_index = first_bin_index;
> -       victim_index < last_bin_index;
> -       victim_index ++)
> -    {
> -      victim = NULL;
> +      for (victim_index = first_bin_index;
> +	   victim_index < last_bin_index;
> +	   victim_index ++)
> +	{
> +	  victim = NULL;
>  
> -      if (victim_index < first_largebin_index)
> -    {
> -      /* Check small bins.  Small bin chunks are doubly-linked despite
> -	 being the same size.  */
> +	  if (victim_index < first_largebin_index)
> +	    {
> +	      /* Check small bins.  Small bin chunks are doubly-linked despite
> +		 being the same size.  */
>  
> -      mchunkptr fwd;                    /* misc temp for linking */
> -      mchunkptr bck;                    /* misc temp for linking */
> +	      mchunkptr fwd;                    /* misc temp for linking */
> +	      mchunkptr bck;                    /* misc temp for linking */
>  
> -      bck = bin_at (av, victim_index);
> -      fwd = bck->fd;
> -      while (fwd != bck)
> -	{
> -	  if (chunk_ok_for_memalign (fwd, alignment, bytes) > 0)
> -	    {
> -	      victim = fwd;
> +	      bck = bin_at (av, victim_index);
> +	      fwd = bck->fd;
> +	      while (fwd != bck)
> +		{
> +		  if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)

OK. Use nb.
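
Worth spelling out why first-fit is enough here (a restatement of the
loop body above with the reasoning as comments):

    /* Every chunk in a small bin has the same size, so the first one
       that can satisfy the alignment is as good as any other: take it
       and unlink it by hand.  */
    if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)
      {
        victim = fwd;
        victim->fd->bk = victim->bk;   /* unlink from the bin */
        victim->bk->fd = victim->fd;
        break;
      }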

> +		    {
> +		      victim = fwd;
>  
> -	      /* Unlink it */
> -	      victim->fd->bk = victim->bk;
> -	      victim->bk->fd = victim->fd;
> -	      break;
> +		      /* Unlink it */
> +		      victim->fd->bk = victim->bk;
> +		      victim->bk->fd = victim->fd;
> +		      break;
> +		    }
> +
> +		  fwd = fwd->fd;
> +		}
>  	    }
> +	  else
> +	    {
> +	      /* Check large bins.  */
> +	      mchunkptr fwd;                    /* misc temp for linking */
> +	      mchunkptr bck;                    /* misc temp for linking */
> +	      mchunkptr best = NULL;
> +	      size_t best_size = 0;
>  
> -	  fwd = fwd->fd;
> -	}
> -    }
> -  else
> -    {
> -      /* Check large bins.  */
> -      mchunkptr fwd;                    /* misc temp for linking */
> -      mchunkptr bck;                    /* misc temp for linking */
> -      mchunkptr best = NULL;
> -      size_t best_size = 0;
> +	      bck = bin_at (av, victim_index);
> +	      fwd = bck->fd;
>  
> -      bck = bin_at (av, victim_index);
> -      fwd = bck->fd;
> +	      while (fwd != bck)
> +		{
> +		  int extra;
>  
> -      while (fwd != bck)
> -	{
> -	  int extra;
> +		  if (chunksize (fwd) < nb)
> +		    break;
> +		  extra = chunk_ok_for_memalign (fwd, alignment, nb);

OK. Use nb.
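
And why the large-bin scan can stop early (again the hunk's own code,
with the reasoning as comments):

    /* Large bins are sorted by chunk size in decreasing order, so once
       one chunk is smaller than nb every later one is too; stop there.
       Otherwise remember the candidate with the least trimming waste
       reported by chunk_ok_for_memalign.  */
    if (chunksize (fwd) < nb)
      break;
    extra = chunk_ok_for_memalign (fwd, alignment, nb);
    if (extra > 0 && (extra <= best_size || best == NULL))
      {
        best = fwd;
        best_size = extra;
      }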

> +		  if (extra > 0
> +		      && (extra <= best_size || best == NULL))
> +		    {
> +		      best = fwd;
> +		      best_size = extra;
> +		    }
>  
> -	  if (chunksize (fwd) < nb)
> -	      break;
> -	  extra = chunk_ok_for_memalign (fwd, alignment, bytes);
> -	  if (extra > 0
> -	      && (extra <= best_size || best == NULL))
> -	    {
> -	      best = fwd;
> -	      best_size = extra;
> -	    }
> +		  fwd = fwd->fd;
> +		}
> +	      victim = best;
>  
> -	  fwd = fwd->fd;
> -	}
> -      victim = best;
> +	      if (victim != NULL)
> +		{
> +		  unlink_chunk (av, victim);
> +		  break;
> +		}
> +	    }
>  
> -      if (victim != NULL)
> -	{
> -	  unlink_chunk (av, victim);
> -	  break;
> +	  if (victim != NULL)
> +	    break;
>  	}
>      }
>  
> -      if (victim != NULL)
> -	break;
> -    }
> -
>    /* Strategy: find a spot within that chunk that meets the alignment
>       request, and then possibly free the leading and trailing space.
>       This strategy is incredibly costly and can lead to external
> @@ -5147,6 +5150,8 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
>        p = victim;
>        m = chunk2mem (p);
>        set_inuse (p);
> +      if (av != &main_arena)
> +	set_non_main_arena (p);

OK. Set the NON_MAIN_ARENA (M) bit.

>      }
>    else
>      {
> diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
> index 4996578e9f..f229283dbf 100644
> --- a/malloc/tst-memalign-2.c
> +++ b/malloc/tst-memalign-2.c
> @@ -33,9 +33,10 @@ typedef struct TestCase {
>  } TestCase;
>  
>  static TestCase tcache_allocs[] = {
> -  { 24, 8, NULL, NULL },
> -  { 24, 16, NULL, NULL },
> -  { 128, 32, NULL, NULL }
> +  { 24, 32, NULL, NULL },
> +  { 24, 64, NULL, NULL },
> +  { 128, 128, NULL, NULL },
> +  { 500, 128, NULL, NULL }

OK. Use higher alignments (requests at or below the default malloc
alignment would be satisfied without exercising the new scanning path).

>  };
>  #define TN array_length (tcache_allocs)
>  
> @@ -70,11 +71,15 @@ do_test (void)
>  
>    for (i = 0; i < TN; ++ i)
>      {
> +      size_t sz2;
> +
>        tcache_allocs[i].ptr1 = memalign (tcache_allocs[i].alignment, tcache_allocs[i].size);
>        CHECK (tcache_allocs[i].ptr1, tcache_allocs[i].alignment);
> +      sz2 = malloc_usable_size (tcache_allocs[i].ptr1);
>        free (tcache_allocs[i].ptr1);
> +
>        /* This should return the same chunk as was just free'd.  */
> -      tcache_allocs[i].ptr2 = memalign (tcache_allocs[i].alignment, tcache_allocs[i].size);
> +      tcache_allocs[i].ptr2 = memalign (tcache_allocs[i].alignment, sz2);
>        CHECK (tcache_allocs[i].ptr2, tcache_allocs[i].alignment);
>        free (tcache_allocs[i].ptr2);
>  
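
The malloc_usable_size step above deserves a note: the first memalign
may return a chunk larger than requested, and tcache bins by chunk
size, so re-requesting the recorded usable size rather than the
original size is what guarantees both requests map to the same bin.
In outline (public API only; variable names are illustrative):

    void *p1 = memalign (alignment, size);
    size_t usable = malloc_usable_size (p1);   /* may exceed size */
    free (p1);                                 /* chunk goes to tcache */
    void *p2 = memalign (alignment, usable);   /* same bin, same chunk */
    TEST_VERIFY (p1 == p2);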
> diff --git a/malloc/tst-memalign-3.c b/malloc/tst-memalign-3.c
> new file mode 100644
> index 0000000000..ab90d6ca9b
> --- /dev/null
> +++ b/malloc/tst-memalign-3.c
> @@ -0,0 +1,173 @@
> +/* Test for memalign chunk reuse.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <malloc.h>
> +#include <stdio.h>
> +#include <pthread.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <array_length.h>
> +#include <libc-pointer-arith.h>
> +#include <support/check.h>
> +#include <support/xthread.h>
> +
> +
> +typedef struct TestCase {
> +  size_t size;
> +  size_t alignment;
> +  void *ptr1;
> +  void *ptr2;
> +} TestCase;
> +
> +static TestCase tcache_allocs[] = {
> +  { 24, 32, NULL, NULL },
> +  { 24, 64, NULL, NULL },
> +  { 128, 128, NULL, NULL },
> +  { 500, 128, NULL, NULL }
> +};
> +#define TN array_length (tcache_allocs)
> +
> +static TestCase large_allocs[] = {
> +  { 23450, 64, NULL, NULL },
> +  { 23450, 64, NULL, NULL },
> +  { 23550, 64, NULL, NULL },
> +  { 23550, 64, NULL, NULL },
> +  { 23650, 64, NULL, NULL },
> +  { 23650, 64, NULL, NULL },
> +  { 33650, 64, NULL, NULL },
> +  { 33650, 64, NULL, NULL }
> +};
> +#define LN array_length (large_allocs)
> +
> +void *p;
> +
> +/* Sanity checks, ancillary to the actual test.  */
> +#define CHECK(p,a) \
> +  if (p == NULL || !PTR_IS_ALIGNED (p, a)) \
> +    FAIL_EXIT1 ("NULL or misaligned memory detected.\n");
> +
> +static void *
> +mem_test (void *closure)
> +{
> +  int i;
> +  int j;
> +  int count;
> +  void *ptr[10];
> +  void *p;
> +
> +  /* TCache test.  */
> +  for (i = 0; i < TN; ++ i)
> +    {
> +      size_t sz2;
> +
> +      tcache_allocs[i].ptr1 = memalign (tcache_allocs[i].alignment, tcache_allocs[i].size);
> +      CHECK (tcache_allocs[i].ptr1, tcache_allocs[i].alignment);
> +      sz2 = malloc_usable_size (tcache_allocs[i].ptr1);
> +      free (tcache_allocs[i].ptr1);
> +
> +      /* This should return the same chunk as was just free'd.  */
> +      tcache_allocs[i].ptr2 = memalign (tcache_allocs[i].alignment, sz2);
> +      CHECK (tcache_allocs[i].ptr2, tcache_allocs[i].alignment);
> +      free (tcache_allocs[i].ptr2);
> +
> +      TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
> +    }
> +
> +  /* Test for non-head tcache hits.  */
> +  for (i = 0; i < array_length (ptr); ++ i)
> +    {
> +      if (i == 4)
> +	{
> +	  ptr[i] = memalign (64, 256);
> +	  CHECK (ptr[i], 64);
> +	}
> +      else
> +	{
> +	  ptr[i] = malloc (256);
> +	  CHECK (ptr[i], 4);
> +	}
> +    }
> +  for (i = 0; i < array_length (ptr); ++ i)
> +    free (ptr[i]);
> +
> +  p = memalign (64, 256);
> +  CHECK (p, 64);
> +
> +  count = 0;
> +  for (i = 0; i < 10; ++ i)
> +    if (ptr[i] == p)
> +      ++ count;
> +  free (p);
> +  TEST_VERIFY (count > 0);
> +
> +  /* Large bins test.  */
> +
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      large_allocs[i].ptr1 = memalign (large_allocs[i].alignment, large_allocs[i].size);
> +      CHECK (large_allocs[i].ptr1, large_allocs[i].alignment);
> +      /* Keep chunks from combining by fragmenting the heap.  */
> +      p = malloc (512);
> +      CHECK (p, 4);
> +    }
> +
> +  for (i = 0; i < LN; ++ i)
> +    free (large_allocs[i].ptr1);
> +
> +  /* Force the unsorted bins to be scanned and moved to small/large
> +     bins.  */
> +  p = malloc (60000);
> +
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      large_allocs[i].ptr2 = memalign (large_allocs[i].alignment, large_allocs[i].size);
> +      CHECK (large_allocs[i].ptr2, large_allocs[i].alignment);
> +    }
> +
> +  count = 0;
> +  for (i = 0; i < LN; ++ i)
> +    {
> +      int ok = 0;
> +      for (j = 0; j < LN; ++ j)
> +	if (large_allocs[i].ptr1 == large_allocs[j].ptr2)
> +	  ok = 1;
> +      if (ok == 1)
> +	count ++;
> +    }
> +
> +  /* The allocation algorithm is complicated outside of the memalign
> +     logic, so just make sure it's working for most of the
> +     allocations.  This avoids possible boundary conditions with
> +     empty/full heaps.  */
> +  TEST_VERIFY (count > LN / 2);
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  pthread_t p;
> +
> +  p = xpthread_create (NULL, mem_test, NULL);
> +  xpthread_join (p);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
>
  
DJ Delorie April 18, 2023, 3:02 p.m. UTC | #2
Thanks!  Pushed.  WHEW.
  
