doc: document incremental LTO flags
Checks
| Context | Check | Description |
| linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap | success | Build passed |
| linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed |
Commit Message
This adds missing documentation for LTO flags.
Ok?
gcc/ChangeLog:
* doc/invoke.texi: (Optimize Options):
Add incremental LTO flags.
---
gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
Comments
On Thu, 13 Mar 2025, Michal Jires wrote:
> This adds missing documentation for LTO flags.
>
> Ok?
OK.
Thanks,
Richard.
> gcc/ChangeLog:
>
> * doc/invoke.texi: (Optimize Options):
> Add incremental LTO flags.
> ---
> gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fbb4cda101..3efc6602898 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
> -floop-block -floop-interchange -floop-strip-mine
> -floop-unroll-and-jam -floop-nest-optimize
> -floop-parallelize-all -flra-remat -flto -flto-compression-level
> --flto-partition=@var{alg} -fmalloc-dce -fmerge-all-constants
> +-flto-partition=@var{alg} -flto-incremental=@var{path}
> +-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
> -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves
> -fmove-loop-invariants -fmove-loop-stores -fno-branch-count-reg
> -fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse
> @@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
> The value is either @samp{1to1} to specify a partitioning mirroring
> the original source files or @samp{balanced} to specify partitioning
> into equally sized chunks (whenever possible) or @samp{max} to create
> -new partition for every symbol where possible. Specifying @samp{none}
> -as an algorithm disables partitioning and streaming completely.
> +new partition for every symbol where possible or @samp{cache} to
> +balance chunk sizes while keeping related symbols together for better
> +caching in incremental LTO. Specifying @samp{none} as an algorithm
> +disables partitioning and streaming completely.
> The default value is @samp{balanced}. While @samp{1to1} can be used
> as an workaround for various code ordering issues, the @samp{max}
> partitioning is intended for internal testing only.
> @@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
> used while the value @samp{none} bypasses partitioning and executes
> the link-time optimization step directly from the WPA phase.
>
> +@opindex flto-incremental
> +@item -flto-incremental=@var{path}
> +Enable incremental LTO, with its cache in given existing directory.
> +Can significantly shorten edit-compile cycles with LTO.
> +
> +When used with LTO (@option{-flto}), the output of translation units
> +inside LTO is cached. Cached translation units are likely to be
> +encountered again when recompiling with small code changes, leading to
> +recompile time reduction.
> +
> +Multiple GCC instances can use the same cache in parallel.
> +
> +@opindex flto-incremental-cache-size
> +@item -flto-incremental-cache-size=@var{n}
> +Specifies number of cache entries in incremental LTO after which to prune
> +old entries. This is a soft limit, temporarily there may be more entries.
> +
> @opindex flto-compression-level
> @item -flto-compression-level=@var{n}
> This option specifies the level of compression used for intermediate
>
Michal Jires <mjires@suse.cz> writes:
> This adds missing documentation for LTO flags.
>
> Ok?
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: (Optimize Options):
> Add incremental LTO flags.
> ---
> gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fbb4cda101..3efc6602898 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
> -floop-block -floop-interchange -floop-strip-mine
> -floop-unroll-and-jam -floop-nest-optimize
> -floop-parallelize-all -flra-remat -flto -flto-compression-level
> --flto-partition=@var{alg} -fmalloc-dce -fmerge-all-constants
> +-flto-partition=@var{alg} -flto-incremental=@var{path}
> +-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
> -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves
> -fmove-loop-invariants -fmove-loop-stores -fno-branch-count-reg
> -fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse
> @@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
> The value is either @samp{1to1} to specify a partitioning mirroring
> the original source files or @samp{balanced} to specify partitioning
> into equally sized chunks (whenever possible) or @samp{max} to create
> -new partition for every symbol where possible. Specifying @samp{none}
> -as an algorithm disables partitioning and streaming completely.
> +new partition for every symbol where possible or @samp{cache} to
> +balance chunk sizes while keeping related symbols together for better
> +caching in incremental LTO. Specifying @samp{none} as an algorithm
> +disables partitioning and streaming completely.
> The default value is @samp{balanced}. While @samp{1to1} can be used
> as an workaround for various code ordering issues, the @samp{max}
> partitioning is intended for internal testing only.
> @@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
> used while the value @samp{none} bypasses partitioning and executes
> the link-time optimization step directly from the WPA phase.
>
> +@opindex flto-incremental
> +@item -flto-incremental=@var{path}
> +Enable incremental LTO, with its cache in given existing directory.
> +Can significantly shorten edit-compile cycles with LTO.
One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
with -flto-incremental? Or is it just an optional flag I can pass for
more effective incremental LTO?
If it's the latter, should we mention that in the -flto-incremental
documentation?
> [...]
Thanks for working on incremental LTO. I had the opportunity to use it
for a bug for the first time last weekend and enjoyed it.
On Thu, 2025-03-27 at 15:33:44 +0000, Sam James wrote:
>
> One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
> with -flto-incremental? Or is it just an optional flag I can pass for
> more effective incremental LTO?
>
> If it's the latter, should we mention that in the -flto-incremental
> documentation?
>
It is not automatic, because different partitioning will result in a
different executable. Most of the time this should not matter, but for
example a performance bug that depends on instruction alignment would
not be reproduced.
Cache partitioning is most useful when there is a large number of
divergences per diverging partition. This was very useful at the start,
but it happens less with each divergence I remove.
Last time I measured it, the improvement was no longer noticeable
without debug symbols and was only a few percent with debug symbols,
with one outlier case being ~50 % worse.
The benefits are minor and a bit unclear, and the caveats are hard to
explain properly. So I do not want to actively recommend the option for now.
> > [...]
>
> Thanks for working on incremental LTO. I had the opportunity to use it
> for a bug for the first time last weekend and enjoyed it.
Thanks, glad it is already useful.
Michal
Michal Jires <mjires@suse.cz> writes:
> On Thu, 2025-03-27 at 15:33:44 +0000, Sam James wrote:
>>
>> One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
>> with -flto-incremental? Or is it just an optional flag I can pass for
>> more effective incremental LTO?
>>
>> If it's the latter, should we mention that in the -flto-incremental
>> documentation?
>>
>
> It is not automatic, because different partitioning will result in a
> different executable. Most of the time this should not matter, but for
> example a performance bug that depends on instruction alignment would
> not be reproduced.
Thanks! That makes sense.
>
> Cache partitioning is most useful when there is a large number of
> divergences per diverging partition. This was very useful at the start,
> but it happens less with each divergence I remove.
> Last time I measured it, the improvement was no longer noticeable
> without debug symbols and was only a few percent with debug symbols,
> with one outlier case being ~50 % worse.
>
> The benefits are minor and a bit unclear, and the caveats are hard to
> explain properly. So I do not want to actively recommend the option for now.
ACK. Appreciate the explanation.
> [...]
sam
@@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
-floop-block -floop-interchange -floop-strip-mine
-floop-unroll-and-jam -floop-nest-optimize
-floop-parallelize-all -flra-remat -flto -flto-compression-level
--flto-partition=@var{alg} -fmalloc-dce -fmerge-all-constants
+-flto-partition=@var{alg} -flto-incremental=@var{path}
+-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
-fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves
-fmove-loop-invariants -fmove-loop-stores -fno-branch-count-reg
-fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse
@@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
The value is either @samp{1to1} to specify a partitioning mirroring
the original source files or @samp{balanced} to specify partitioning
into equally sized chunks (whenever possible) or @samp{max} to create
-new partition for every symbol where possible. Specifying @samp{none}
-as an algorithm disables partitioning and streaming completely.
+new partition for every symbol where possible or @samp{cache} to
+balance chunk sizes while keeping related symbols together for better
+caching in incremental LTO. Specifying @samp{none} as an algorithm
+disables partitioning and streaming completely.
The default value is @samp{balanced}. While @samp{1to1} can be used
as an workaround for various code ordering issues, the @samp{max}
partitioning is intended for internal testing only.
@@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
used while the value @samp{none} bypasses partitioning and executes
the link-time optimization step directly from the WPA phase.
+@opindex flto-incremental
+@item -flto-incremental=@var{path}
+Enable incremental LTO, with its cache in given existing directory.
+Can significantly shorten edit-compile cycles with LTO.
+
+When used with LTO (@option{-flto}), the output of translation units
+inside LTO is cached. Cached translation units are likely to be
+encountered again when recompiling with small code changes, leading to
+recompile time reduction.
+
+Multiple GCC instances can use the same cache in parallel.
+
+@opindex flto-incremental-cache-size
+@item -flto-incremental-cache-size=@var{n}
+Specifies number of cache entries in incremental LTO after which to prune
+old entries. This is a soft limit, temporarily there may be more entries.
+
@opindex flto-compression-level
@item -flto-compression-level=@var{n}
This option specifies the level of compression used for intermediate
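
For readers who want to try the options documented by this patch, the following is a minimal sketch of how they might be combined on a command line. It is not part of the patch. The cache location, the source file names (`main.c`, `util.c`), the output name `app`, and the cache-size value are all hypothetical, and the sketch assumes a GCC build that already accepts `-flto-incremental`. The script only prepares the cache directory and prints the command; it does not invoke the compiler.

```shell
#!/bin/sh
# Sketch only: paths, file names, and the cache-size value are assumptions.

# The documentation says the cache lives in a given *existing* directory,
# so create it up front.
cache_dir="${TMPDIR:-/tmp}/lto-incremental-cache-example"
mkdir -p "$cache_dir"

# -flto enables LTO; -flto-incremental points at the cache directory;
# -flto-incremental-cache-size sets the soft limit on cache entries
# (2048 here is an arbitrary illustrative value).
lto_flags="-O2 -flto -flto-incremental=$cache_dir -flto-incremental-cache-size=2048"

# Per the thread, -flto-partition=cache is NOT enabled automatically and is
# not actively recommended by the author for now, so it is left out here.

# Print the command rather than running it, since this sketch does not
# assume that a suitable GCC or the source files are present.
echo "gcc $lto_flags main.c util.c -o app"
```

Running the printed command twice, with a small source edit in between, is where the cache would be expected to shorten the second build. Since the documentation states that multiple GCC instances can use the same cache in parallel, one directory could presumably be shared across a parallel build.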