[2/2] Document the M_ARENA_* mallopt parameters
Commit Message
The M_ARENA_* mallopt parameters are in wide use in production to
control the number of arenas that a long lived process creates and
hence there is no point in stating that this interface is non-public.
Document this interface and remove the obsolete comment.
* manual/memory.texi (M_ARENA_TEST): Add documentation.
(M_ARENA_MAX): Likewise.
* malloc/malloc.c: Remove obsolete comment.
---
malloc/malloc.c | 1 -
manual/memory.texi | 122 +++++++++++++++++++++++++++--------------------------
2 files changed, 62 insertions(+), 61 deletions(-)
Comments
Hi Siddhesh,
Most of this is formatting or cleanup/improvement of the pre-existing
chunks that were moved, but the comment on M_ARENA_MAX regards content.
On 10/24/2016 07:07 AM, Siddhesh Poyarekar wrote:
> The M_ARENA_* mallopt parameters are in wide use in production to
> control the number of arenas that a long lived process creates and
> hence there is no point in stating that this interface is non-public.
> Document this interface and remove the obsolete comment.
>
> * manual/memory.texi (M_ARENA_TEST): Add documentation.
> (M_ARENA_MAX): Likewise.
> * malloc/malloc.c: Remove obsolete comment.
> ---
> malloc/malloc.c | 1 -
> manual/memory.texi | 122 +++++++++++++++++++++++++++--------------------------
> 2 files changed, 62 insertions(+), 61 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index ef04360..a849901 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1718,7 +1718,6 @@ static struct malloc_par mp_ =
> };
>
>
> -/* Non public mallopt parameters. */
> #define M_ARENA_TEST -7
> #define M_ARENA_MAX -8
>
> diff --git a/manual/memory.texi b/manual/memory.texi
> index 6f33455..198a933 100644
> --- a/manual/memory.texi
> +++ b/manual/memory.texi
> @@ -162,6 +162,8 @@ special to @theglibc{} and GNU Compiler.
>
> @menu
> * Memory Allocation and C:: How to get different kinds of allocation in C.
> +* The GNU allocator:: An overview of the GNU @code{malloc}
I think "Allocator" should be capitalized. The manual is inconsistent
on @subsection capitalization, but most section titles are capitalized.
Also, all the other info menu entries visible in the context here are
fully capitalized.
> + implementation.
> * Unconstrained Allocation:: The @code{malloc} facility allows fully general
> dynamic allocation.
> * Allocation Debugging:: Finding memory leaks and not freed memory.
> @@ -258,6 +260,43 @@ address of the space. Then you can use the operators @samp{*} and
> @}
> @end smallexample
>
> +@node The GNU allocator
> +@subsection The GNU allocator
Allocator
> +@cindex gnu allocator
> +
> +The @code{malloc} implementation in @theglibc{} is derived from ptmalloc
> +(pthreads malloc), which in turn is derived from dlmalloc (Doug Lea malloc).
> +This malloc may allocate memory in two different ways depending on their size
> +and certain parameters that may be controlled by users. The most common way is
> +to allocate portions of memory (called chunks) from a large contiguous area of
> +memory and manage these areas to optimize their use and reduce wastage in the
> +form of unusable chunks. Traditionally the system heap was set up to be the one
> +large memory area but @theglibc{} @code{malloc} implementation maintains
This should be "the @glibcadj{} @code{malloc}".
> +multiple such areas to optimize their use in multi-threaded applications. Each
> +such area is internally referred to as an @code{arena}.
@dfn{arena}
Unless this is a function name (or literal string one would be using in
code, for example), this is simply a term we use to describe the
concept. On first use, where we define the term, it should have @dfn{},
and otherwise it doesn't need to be stylized at all, such as...
> +
> +As opposed to other versions, the @code{malloc} in @theglibc{} does not round
> +up chunk sizes to powers of two, neither for large nor for small sizes.
> +Neighboring chunks can be coalesced on a @code{free} no matter what their size
> +is. This makes the implementation suitable for all kinds of allocation
> +patterns without generally incurring high memory waste through fragmentation.
> +The presence of multiple @code{arenas} allows multiple threads to allocate
here (no @code{}).
> +memory simultaneously in their own separate arenas, thus improving performance.
> +
> +The other way of memory allocation is for very large blocks, i.e. much larger
> +than a page. These requests are allocated with @code{mmap} (anonymous or via
> +@code{/dev/zero}). This has the great advantage that these chunks are returned
Should be @file{}.
A reference to mmap would be good. Something like, "... (anonymous or
via @file{/dev/zero}; @pxref{Memory-mapped I/O})."
> +to the system immediately when they are freed. Therefore, it cannot happen
> +that a large chunk becomes ``locked'' in between smaller ones and even after
> +calling @code{free} wastes memory. The size threshold for @code{mmap} to be
> +used is dynamic and gets adjusted according to allocation patterns of the
> +program. This can also be statically adjusted with @code{mallopt}. The use of
A reference to mallopt would be good here.
> +@code{mmap} can also be disabled completely.
Should briefly say how and/or give a reference. I believe mallopt
applies to both, so maybe something like, "@code{mallopt} can be used to
statically adjust the threshold using @code{M_MMAP_THRESHOLD}, and the
use of @code{mmap} can be disabled completely with @code{M_MMAP_MAX};
@pxref{Malloc Tunable Parameters}."
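To illustrate the two knobs named above, here is a minimal sketch, assuming
a glibc system whose <malloc.h> provides mallopt, M_MMAP_THRESHOLD and
M_MMAP_MAX. The helper names and the 256 KiB value are made up for the
example and do not come from the patch.

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_THRESHOLD, M_MMAP_MAX */
#include <stddef.h>

/* Hypothetical helper: set the mmap threshold statically to BYTES.
   Returns nonzero on success and 0 on error (mallopt's convention).  */
int
pin_mmap_threshold (size_t bytes)
{
  return mallopt (M_MMAP_THRESHOLD, (int) bytes);
}

/* Hypothetical helper: ask malloc not to use mmap for large requests
   by setting the maximum number of mmap'd chunks to zero.  */
int
disable_malloc_mmap (void)
{
  return mallopt (M_MMAP_MAX, 0);
}
```

A caller would typically invoke something like pin_mmap_threshold (256 * 1024)
once at startup, before the allocation pattern it wants to influence.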
> +
> +A more detailed technical description of the GNU allocator is maintained in
> +@theglibc{} wiki. See
the @glibcadj{}
> +@uref{https://sourceware.org/glibc/wiki/MallocInternals}.
> +
> @node Unconstrained Allocation
> @subsection Unconstrained Allocation
> @cindex unconstrained memory allocation
> @@ -278,8 +317,6 @@ any time (or never).
> bigger or smaller.
> * Allocating Cleared Space:: Use @code{calloc} to allocate a
> block and clear it.
> -* Efficiency and Malloc:: Efficiency considerations in use of
> - these functions.
> * Aligned Memory Blocks:: Allocating specially aligned memory.
> * Malloc Tunable Parameters:: Use @code{mallopt} to adjust allocation
> parameters.
> @@ -867,59 +904,6 @@ But in general, it is not guaranteed that @code{calloc} calls
> @code{malloc}/@code{realloc}/@code{free} outside the C library, it
> should always define @code{calloc}, too.
>
> -@node Efficiency and Malloc
> -@subsubsection Efficiency Considerations for @code{malloc}
> -@cindex efficiency and @code{malloc}
> -
> -
> -
> -
> -@ignore
> -
> -@c No longer true, see below instead.
> -To make the best use of @code{malloc}, it helps to know that the GNU
> -version of @code{malloc} always dispenses small amounts of memory in
> -blocks whose sizes are powers of two. It keeps separate pools for each
> -power of two. This holds for sizes up to a page size. Therefore, if
> -you are free to choose the size of a small block in order to make
> -@code{malloc} more efficient, make it a power of two.
> -@c !!! xref getpagesize
> -
> -Once a page is split up for a particular block size, it can't be reused
> -for another size unless all the blocks in it are freed. In many
> -programs, this is unlikely to happen. Thus, you can sometimes make a
> -program use memory more efficiently by using blocks of the same size for
> -many different purposes.
> -
> -When you ask for memory blocks of a page or larger, @code{malloc} uses a
> -different strategy; it rounds the size up to a multiple of a page, and
> -it can coalesce and split blocks as needed.
> -
> -The reason for the two strategies is that it is important to allocate
> -and free small blocks as fast as possible, but speed is less important
> -for a large block since the program normally spends a fair amount of
> -time using it. Also, large blocks are normally fewer in number.
> -Therefore, for large blocks, it makes sense to use a method which takes
> -more time to minimize the wasted space.
> -
> -@end ignore
> -
> -As opposed to other versions, the @code{malloc} in @theglibc{}
> -does not round up block sizes to powers of two, neither for large nor
> -for small sizes. Neighboring chunks can be coalesced on a @code{free}
> -no matter what their size is. This makes the implementation suitable
> -for all kinds of allocation patterns without generally incurring high
> -memory waste through fragmentation.
> -
> -Very large blocks (much larger than a page) are allocated with
> -@code{mmap} (anonymous or via @code{/dev/zero}) by this implementation.
> -This has the great advantage that these chunks are returned to the
> -system immediately when they are freed. Therefore, it cannot happen
> -that a large chunk becomes ``locked'' in between smaller ones and even
> -after calling @code{free} wastes memory. The size threshold for
> -@code{mmap} to be used can be adjusted with @code{mallopt}. The use of
> -@code{mmap} can also be disabled completely.
> -
> @node Aligned Memory Blocks
> @subsubsection Allocating Aligned Memory Blocks
>
> @@ -1105,10 +1089,6 @@ parameter to be set, and @var{value} the new value to be set. Possible
> choices for @var{param}, as defined in @file{malloc.h}, are:
>
> @table @code
> -@comment TODO: @item M_ARENA_MAX
> -@comment - Document ARENA_MAX env var.
> -@comment TODO: @item M_ARENA_TEST
> -@comment - Document ARENA_TEST env var.
> @comment TODO: @item M_CHECK_ACTION
> @item M_MMAP_MAX
> The maximum number of chunks to allocate with @code{mmap}. Setting this
> @@ -1169,6 +1149,28 @@ value is set statically to the provided input.
>
> This parameter can also be set for the process at startup by setting the
> environment variable @code{MALLOC_TRIM_THRESHOLD_} to the desired value.
> +
> +@item M_ARENA_TEST
> +This parameter specifies the number of arenas that can be created before the
> +test on the limit to the number of arenas is conducted. The value is ignored if
> +@code{M_ARENA_MAX} is set.
> +
> +The default value of this parameter is 2 on 32-bit systems and 8 on 64-bit
> +systems.
> +
> +This parameter can also be set for the process at startup by setting the
> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
> +@item M_ARENA_MAX
> +This parameter sets the number of arenas to use regardless of the number of
> +cores in the system.
> +
> +The default value of this tunable is @code{0}, meaning that the limit on the
> +number of arenas is determined by the number of CPU cores online. For 32-bit
> +systems the limit is twice the number of cores online and on 64-bit systems, it
> +is eight times the number of cores online.
Even though I had followed the thread, I immediately jumped to the same
confusion with M_ARENA_TEST that was resolved in [1]. Explicitly
stating here that 2 and 8 are not derived from M_ARENA_TEST defaults
might save a lot of general confusion down the road.
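To make the distinction concrete, the default limit described in the quoted
text can be sketched as below. This only restates the documented rule (twice
the online cores on 32-bit, eight times on 64-bit). The function names are
hypothetical, using sizeof (long) to tell 32-bit from 64-bit is an
assumption, and this is not a copy of the allocator's own code.

```c
#include <stddef.h>
#include <unistd.h>   /* sysconf, _SC_NPROCESSORS_ONLN */

/* Hypothetical restatement of the documented default: the multiplier
   depends only on the word size, not on the M_ARENA_TEST value.  */
long
default_arena_limit (long cores_online, size_t long_size)
{
  return cores_online * (long_size == 4 ? 2 : 8);
}

/* Apply the rule to the machine the program is running on.  */
long
default_arena_limit_here (void)
{
  long cores = sysconf (_SC_NPROCESSORS_ONLN);
  if (cores < 1)
    cores = 1;   /* Assumption: fall back to one core if the query fails.  */
  return default_arena_limit (cores, sizeof (long));
}
```

With four cores online this gives 8 for a 4-byte long and 32 for an 8-byte
long, which shows that the 2 and 8 here are multipliers rather than arena
counts.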
> +
> +This parameter can also be set for the process at startup by setting the
> +environment variable @code{MALLOC_ARENA_MAX} to the desired value.
> @end table
>
> @end deftypefun
> @@ -1511,7 +1513,7 @@ This is the total size of memory allocated with @code{sbrk} by
> This is the number of chunks not in use. (The memory allocator
> internally gets chunks of memory from the operating system, and then
> carves them up to satisfy individual @code{malloc} requests; see
> -@ref{Efficiency and Malloc}.)
> +@ref{The GNU allocator}.)
Allocator
Also, making this an @pxref{} now would be nice, since the @ref{} renders
incorrectly in info as-is.
>
> @item int smblks
> This field is unused.
>
Rical
[1] https://sourceware.org/ml/libc-alpha/2016-10/msg00310.html
Siddhesh Poyarekar <siddhesh@sourceware.org> writes:
> -/* Non public mallopt parameters. */
> #define M_ARENA_TEST -7
> #define M_ARENA_MAX -8
Hmmm... if these are now public, do they need to be moved elsewhere?
Like, malloc.h ?
> +The presence of multiple @code{arenas} allows multiple threads to allocate
> +memory simultaneously in their own separate arenas, thus improving performance.
Not quite true - there isn't one arena per thread, there are N arenas
per M threads. Probably better to say "... simultaneously in separate
arenas ..."
> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
> +@item M_ARENA_MAX
Missing blank line.
Otherwise, the content looks OK to me from a technical point of view.
On Wednesday 26 October 2016 03:25 AM, DJ Delorie wrote:
> Siddhesh Poyarekar <siddhesh@sourceware.org> writes:
>> -/* Non public mallopt parameters. */
>> #define M_ARENA_TEST -7
>> #define M_ARENA_MAX -8
>
> Hmmm... if these are now public, do they need to be moved elsewhere?
> Like, malloc.h ?
They're already there in malloc.h. I'll push a follow-up patch if the
ones in malloc.c can be removed.
>> +The presence of multiple @code{arenas} allows multiple threads to allocate
>> +memory simultaneously in their own separate arenas, thus improving performance.
>
> Not quite true - there isn't one arena per thread, there are N arenas
> per M threads. Probably better to say "... simultaneously in separate
> arenas ..."
OK.
>> +environment variable @code{MALLOC_ARENA_TEST} to the desired value.
>> +@item M_ARENA_MAX
>
> Missing blank line.
>
> Otherwise, the content looks OK to me from a technical point of view.
Thanks, I'll fix up formatting comments from Rical and push this.
Siddhesh
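diff --git a/malloc/malloc.c b/malloc/malloc.c
index ef04360..a849901 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1718,7 +1718,6 @@ static struct malloc_par mp_ =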
};
-/* Non public mallopt parameters. */
#define M_ARENA_TEST -7
#define M_ARENA_MAX -8
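diff --git a/manual/memory.texi b/manual/memory.texi
index 6f33455..198a933 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -162,6 +162,8 @@ special to @theglibc{} and GNU Compiler.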
@menu
* Memory Allocation and C:: How to get different kinds of allocation in C.
+* The GNU allocator:: An overview of the GNU @code{malloc}
+ implementation.
* Unconstrained Allocation:: The @code{malloc} facility allows fully general
dynamic allocation.
* Allocation Debugging:: Finding memory leaks and not freed memory.
@@ -258,6 +260,43 @@ address of the space. Then you can use the operators @samp{*} and
@}
@end smallexample
+@node The GNU allocator
+@subsection The GNU allocator
+@cindex gnu allocator
+
+The @code{malloc} implementation in @theglibc{} is derived from ptmalloc
+(pthreads malloc), which in turn is derived from dlmalloc (Doug Lea malloc).
+This malloc may allocate memory in two different ways depending on their size
+and certain parameters that may be controlled by users. The most common way is
+to allocate portions of memory (called chunks) from a large contiguous area of
+memory and manage these areas to optimize their use and reduce wastage in the
+form of unusable chunks. Traditionally the system heap was set up to be the one
+large memory area but @theglibc{} @code{malloc} implementation maintains
+multiple such areas to optimize their use in multi-threaded applications. Each
+such area is internally referred to as an @code{arena}.
+
+As opposed to other versions, the @code{malloc} in @theglibc{} does not round
+up chunk sizes to powers of two, neither for large nor for small sizes.
+Neighboring chunks can be coalesced on a @code{free} no matter what their size
+is. This makes the implementation suitable for all kinds of allocation
+patterns without generally incurring high memory waste through fragmentation.
+The presence of multiple @code{arenas} allows multiple threads to allocate
+memory simultaneously in their own separate arenas, thus improving performance.
+
+The other way of memory allocation is for very large blocks, i.e. much larger
+than a page. These requests are allocated with @code{mmap} (anonymous or via
+@code{/dev/zero}). This has the great advantage that these chunks are returned
+to the system immediately when they are freed. Therefore, it cannot happen
+that a large chunk becomes ``locked'' in between smaller ones and even after
+calling @code{free} wastes memory. The size threshold for @code{mmap} to be
+used is dynamic and gets adjusted according to allocation patterns of the
+program. This can also be statically adjusted with @code{mallopt}. The use of
+@code{mmap} can also be disabled completely.
+
+A more detailed technical description of the GNU allocator is maintained in
+@theglibc{} wiki. See
+@uref{https://sourceware.org/glibc/wiki/MallocInternals}.
+
@node Unconstrained Allocation
@subsection Unconstrained Allocation
@cindex unconstrained memory allocation
@@ -278,8 +317,6 @@ any time (or never).
bigger or smaller.
* Allocating Cleared Space:: Use @code{calloc} to allocate a
block and clear it.
-* Efficiency and Malloc:: Efficiency considerations in use of
- these functions.
* Aligned Memory Blocks:: Allocating specially aligned memory.
* Malloc Tunable Parameters:: Use @code{mallopt} to adjust allocation
parameters.
@@ -867,59 +904,6 @@ But in general, it is not guaranteed that @code{calloc} calls
@code{malloc}/@code{realloc}/@code{free} outside the C library, it
should always define @code{calloc}, too.
-@node Efficiency and Malloc
-@subsubsection Efficiency Considerations for @code{malloc}
-@cindex efficiency and @code{malloc}
-
-
-
-
-@ignore
-
-@c No longer true, see below instead.
-To make the best use of @code{malloc}, it helps to know that the GNU
-version of @code{malloc} always dispenses small amounts of memory in
-blocks whose sizes are powers of two. It keeps separate pools for each
-power of two. This holds for sizes up to a page size. Therefore, if
-you are free to choose the size of a small block in order to make
-@code{malloc} more efficient, make it a power of two.
-@c !!! xref getpagesize
-
-Once a page is split up for a particular block size, it can't be reused
-for another size unless all the blocks in it are freed. In many
-programs, this is unlikely to happen. Thus, you can sometimes make a
-program use memory more efficiently by using blocks of the same size for
-many different purposes.
-
-When you ask for memory blocks of a page or larger, @code{malloc} uses a
-different strategy; it rounds the size up to a multiple of a page, and
-it can coalesce and split blocks as needed.
-
-The reason for the two strategies is that it is important to allocate
-and free small blocks as fast as possible, but speed is less important
-for a large block since the program normally spends a fair amount of
-time using it. Also, large blocks are normally fewer in number.
-Therefore, for large blocks, it makes sense to use a method which takes
-more time to minimize the wasted space.
-
-@end ignore
-
-As opposed to other versions, the @code{malloc} in @theglibc{}
-does not round up block sizes to powers of two, neither for large nor
-for small sizes. Neighboring chunks can be coalesced on a @code{free}
-no matter what their size is. This makes the implementation suitable
-for all kinds of allocation patterns without generally incurring high
-memory waste through fragmentation.
-
-Very large blocks (much larger than a page) are allocated with
-@code{mmap} (anonymous or via @code{/dev/zero}) by this implementation.
-This has the great advantage that these chunks are returned to the
-system immediately when they are freed. Therefore, it cannot happen
-that a large chunk becomes ``locked'' in between smaller ones and even
-after calling @code{free} wastes memory. The size threshold for
-@code{mmap} to be used can be adjusted with @code{mallopt}. The use of
-@code{mmap} can also be disabled completely.
-
@node Aligned Memory Blocks
@subsubsection Allocating Aligned Memory Blocks
@@ -1105,10 +1089,6 @@ parameter to be set, and @var{value} the new value to be set. Possible
choices for @var{param}, as defined in @file{malloc.h}, are:
@table @code
-@comment TODO: @item M_ARENA_MAX
-@comment - Document ARENA_MAX env var.
-@comment TODO: @item M_ARENA_TEST
-@comment - Document ARENA_TEST env var.
@comment TODO: @item M_CHECK_ACTION
@item M_MMAP_MAX
The maximum number of chunks to allocate with @code{mmap}. Setting this
@@ -1169,6 +1149,28 @@ value is set statically to the provided input.
This parameter can also be set for the process at startup by setting the
environment variable @code{MALLOC_TRIM_THRESHOLD_} to the desired value.
+
+@item M_ARENA_TEST
+This parameter specifies the number of arenas that can be created before the
+test on the limit to the number of arenas is conducted. The value is ignored if
+@code{M_ARENA_MAX} is set.
+
+The default value of this parameter is 2 on 32-bit systems and 8 on 64-bit
+systems.
+
+This parameter can also be set for the process at startup by setting the
+environment variable @code{MALLOC_ARENA_TEST} to the desired value.
+@item M_ARENA_MAX
+This parameter sets the number of arenas to use regardless of the number of
+cores in the system.
+
+The default value of this tunable is @code{0}, meaning that the limit on the
+number of arenas is determined by the number of CPU cores online. For 32-bit
+systems the limit is twice the number of cores online and on 64-bit systems, it
+is eight times the number of cores online.
+
+This parameter can also be set for the process at startup by setting the
+environment variable @code{MALLOC_ARENA_MAX} to the desired value.
@end table
@end deftypefun
@@ -1511,7 +1513,7 @@ This is the total size of memory allocated with @code{sbrk} by
This is the number of chunks not in use. (The memory allocator
internally gets chunks of memory from the operating system, and then
carves them up to satisfy individual @code{malloc} requests; see
-@ref{Efficiency and Malloc}.)
+@ref{The GNU allocator}.)
@item int smblks
This field is unused.
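
For readers who want to try the two parameters this patch documents, a
minimal sketch follows. It assumes a glibc <malloc.h> that exposes
M_ARENA_MAX and M_ARENA_TEST (as noted in the thread above). The helper
names and any values passed to them are illustrative only.

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_ARENA_MAX, M_ARENA_TEST */

/* Hypothetical helper: cap the number of arenas malloc may use.
   Presumably most useful early, before threads begin allocating.
   Returns nonzero on success and 0 on error (mallopt's convention).  */
int
cap_arenas (int max_arenas)
{
  return mallopt (M_ARENA_MAX, max_arenas);
}

/* Hypothetical helper: set how many arenas may be created before the
   limit on the number of arenas is tested.  Per the patch text, this
   value is ignored once M_ARENA_MAX has been set.  */
int
set_arena_test (int arenas)
{
  return mallopt (M_ARENA_TEST, arenas);
}
```

According to the documentation added by the patch, the same values can be
supplied at process startup through the MALLOC_ARENA_MAX and
MALLOC_ARENA_TEST environment variables instead of calling mallopt.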