doc: document incremental LTO flags

Message ID 2uxzll7pe4wdyzm24zrf26xrw2hkr56edx47sb6zsbrdcakkps@pdhw6kl5supn
State Committed
Commit 63e7478db76452d9f7d5bef9704a94480cc28a77
Series doc: document incremental LTO flags

Checks

Context | Check | Description
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap | success | Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed

Commit Message

Michal Jires March 13, 2025, 3:45 p.m. UTC
This adds missing documentation for LTO flags.

Ok?

gcc/ChangeLog:

	* doc/invoke.texi: (Optimize Options):
	Add incremental LTO flags.
---
 gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
  

Comments

Richard Biener March 14, 2025, 6:49 a.m. UTC | #1
On Thu, 13 Mar 2025, Michal Jires wrote:

> This adds missing documentation for LTO flags.
> 
> Ok?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 	* doc/invoke.texi: (Optimize Options):
> 	Add incremental LTO flags.
> ---
>  gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fbb4cda101..3efc6602898 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
>  -floop-block  -floop-interchange  -floop-strip-mine
>  -floop-unroll-and-jam  -floop-nest-optimize
>  -floop-parallelize-all  -flra-remat  -flto  -flto-compression-level
> --flto-partition=@var{alg}  -fmalloc-dce -fmerge-all-constants
> +-flto-partition=@var{alg} -flto-incremental=@var{path}
> +-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
>  -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves
>  -fmove-loop-invariants  -fmove-loop-stores  -fno-branch-count-reg
>  -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse
> @@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
>  The value is either @samp{1to1} to specify a partitioning mirroring
>  the original source files or @samp{balanced} to specify partitioning
>  into equally sized chunks (whenever possible) or @samp{max} to create
> -new partition for every symbol where possible.  Specifying @samp{none}
> -as an algorithm disables partitioning and streaming completely.
> +new partition for every symbol where possible or @samp{cache} to
> +balance chunk sizes while keeping related symbols together for better
> +caching in incremental LTO.  Specifying @samp{none} as an algorithm
> +disables partitioning and streaming completely.
>  The default value is @samp{balanced}. While @samp{1to1} can be used
>  as an workaround for various code ordering issues, the @samp{max}
>  partitioning is intended for internal testing only.
> @@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
>  used while the value @samp{none} bypasses partitioning and executes
>  the link-time optimization step directly from the WPA phase.
>  
> +@opindex flto-incremental
> +@item -flto-incremental=@var{path}
> +Enable incremental LTO, with its cache in given existing directory.
> +Can significantly shorten edit-compile cycles with LTO.
> +
> +When used with LTO (@option{-flto}), the output of translation units
> +inside LTO is cached. Cached translation units are likely to be
> +encountered again when recompiling with small code changes, leading to
> +recompile time reduction.
> +
> +Multiple GCC instances can use the same cache in parallel.
> +
> +@opindex flto-incremental-cache-size
> +@item -flto-incremental-cache-size=@var{n}
> +Specifies number of cache entries in incremental LTO after which to prune
> +old entries. This is a soft limit, temporarily there may be more entries.
> +
>  @opindex flto-compression-level
>  @item -flto-compression-level=@var{n}
>  This option specifies the level of compression used for intermediate
>
  
Sam James March 27, 2025, 3:33 p.m. UTC | #2
Michal Jires <mjires@suse.cz> writes:

> This adds missing documentation for LTO flags.
>
> Ok?
>
> gcc/ChangeLog:
>
> 	* doc/invoke.texi: (Optimize Options):
> 	Add incremental LTO flags.
> ---
>  gcc/doc/invoke.texi | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fbb4cda101..3efc6602898 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -601,7 +601,8 @@ Objective-C and Objective-C++ Dialects}.
>  -floop-block  -floop-interchange  -floop-strip-mine
>  -floop-unroll-and-jam  -floop-nest-optimize
>  -floop-parallelize-all  -flra-remat  -flto  -flto-compression-level
> --flto-partition=@var{alg}  -fmalloc-dce -fmerge-all-constants
> +-flto-partition=@var{alg} -flto-incremental=@var{path}
> +-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
>  -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves
>  -fmove-loop-invariants  -fmove-loop-stores  -fno-branch-count-reg
>  -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse
> @@ -15086,8 +15087,10 @@ Specify the partitioning algorithm used by the link-time optimizer.
>  The value is either @samp{1to1} to specify a partitioning mirroring
>  the original source files or @samp{balanced} to specify partitioning
>  into equally sized chunks (whenever possible) or @samp{max} to create
> -new partition for every symbol where possible.  Specifying @samp{none}
> -as an algorithm disables partitioning and streaming completely.
> +new partition for every symbol where possible or @samp{cache} to
> +balance chunk sizes while keeping related symbols together for better
> +caching in incremental LTO.  Specifying @samp{none} as an algorithm
> +disables partitioning and streaming completely.
>  The default value is @samp{balanced}. While @samp{1to1} can be used
>  as an workaround for various code ordering issues, the @samp{max}
>  partitioning is intended for internal testing only.
> @@ -15095,6 +15098,23 @@ The value @samp{one} specifies that exactly one partition should be
>  used while the value @samp{none} bypasses partitioning and executes
>  the link-time optimization step directly from the WPA phase.
>  
> +@opindex flto-incremental
> +@item -flto-incremental=@var{path}
> +Enable incremental LTO, with its cache in given existing directory.
> +Can significantly shorten edit-compile cycles with LTO.

One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
with -flto-incremental? Or is it just an optional flag I can pass for
more effective incremental LTO?

If it's the latter, should we mention that in the -flto-incremental
documentation?

> [...]

Thanks for working on incremental LTO. I had the opportunity to use it
for a bug for the first time last weekend and enjoyed it.
  
Michal Jires March 28, 2025, 1:48 p.m. UTC | #3
On Thu, 2025-03-27 at 15:33:44 +0000, Sam James wrote:
> 
> One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
> with -flto-incremental? Or is it just an optional flag I can pass for
> more effective incremental LTO?
> 
> If it's the latter, should we mention that in the -flto-incremental
> documentation?
> 

It is not automatic, because different partitioning will result in a
different executable. Most of the time this should not matter, but, for
example, a performance bug that depends on instruction alignment would
not be reproduced.

The cache partitioning is most useful when there is a large number of
divergences per diverging partition. It was very useful at the start,
but this happens less with each divergence I remove.
Last time I measured it, the improvement was no longer noticeable
without debug symbols and was only a few percent with debug symbols,
with one outlier case being ~50 % worse.

The benefits are minor, a bit unclear, and caveats are hard to properly
explain. So I do not want to actively recommend the option for now.

> > [...]
> 
> Thanks for working on incremental LTO. I had the opportunity to use it
> for a bug for the first time last weekend and enjoyed it.

Thanks, glad it is already useful.

Michal
  
Sam James March 28, 2025, 1:58 p.m. UTC | #4
Michal Jires <mjires@suse.cz> writes:

> On Thu, 2025-03-27 at 15:33:44 +0000, Sam James wrote:
>> 
>> One thing I wasn't quite sure on yet: is -flto-partition=cache automatic
>> with -flto-incremental? Or is it just an optional flag I can pass for
>> more effective incremental LTO?
>> 
>> If it's the latter, should we mention that in the -flto-incremental
>> documentation?
>> 
>
> It is not automatic, because different partitioning will result in a
> different executable. Most of the time this should not matter, but, for
> example, a performance bug that depends on instruction alignment would
> not be reproduced.

Thanks! That makes sense.

>
> The cache partitioning is most useful when there is a large number of
> divergences per diverging partition. It was very useful at the start,
> but this happens less with each divergence I remove.
> Last time I measured it, the improvement was no longer noticeable
> without debug symbols and was only a few percent with debug symbols,
> with one outlier case being ~50 % worse.
>
> The benefits are minor, a bit unclear, and caveats are hard to properly
> explain. So I do not want to actively recommend the option for now.

ACK. Appreciate the explanation.

> [...]

sam
  

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fbb4cda101..3efc6602898 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -601,7 +601,8 @@  Objective-C and Objective-C++ Dialects}.
 -floop-block  -floop-interchange  -floop-strip-mine
 -floop-unroll-and-jam  -floop-nest-optimize
 -floop-parallelize-all  -flra-remat  -flto  -flto-compression-level
--flto-partition=@var{alg}  -fmalloc-dce -fmerge-all-constants
+-flto-partition=@var{alg} -flto-incremental=@var{path}
+-flto-incremental-cache-size=@var{n} -fmalloc-dce -fmerge-all-constants
 -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves
 -fmove-loop-invariants  -fmove-loop-stores  -fno-branch-count-reg
 -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse
@@ -15086,8 +15087,10 @@  Specify the partitioning algorithm used by the link-time optimizer.
 The value is either @samp{1to1} to specify a partitioning mirroring
 the original source files or @samp{balanced} to specify partitioning
 into equally sized chunks (whenever possible) or @samp{max} to create
-new partition for every symbol where possible.  Specifying @samp{none}
-as an algorithm disables partitioning and streaming completely.
+new partition for every symbol where possible or @samp{cache} to
+balance chunk sizes while keeping related symbols together for better
+caching in incremental LTO.  Specifying @samp{none} as an algorithm
+disables partitioning and streaming completely.
 The default value is @samp{balanced}. While @samp{1to1} can be used
 as an workaround for various code ordering issues, the @samp{max}
 partitioning is intended for internal testing only.
@@ -15095,6 +15098,23 @@  The value @samp{one} specifies that exactly one partition should be
 used while the value @samp{none} bypasses partitioning and executes
 the link-time optimization step directly from the WPA phase.
 
+@opindex flto-incremental
+@item -flto-incremental=@var{path}
+Enable incremental LTO, with its cache in given existing directory.
+Can significantly shorten edit-compile cycles with LTO.
+
+When used with LTO (@option{-flto}), the output of translation units
+inside LTO is cached. Cached translation units are likely to be
+encountered again when recompiling with small code changes, leading to
+recompile time reduction.
+
+Multiple GCC instances can use the same cache in parallel.
+
+@opindex flto-incremental-cache-size
+@item -flto-incremental-cache-size=@var{n}
+Specifies number of cache entries in incremental LTO after which to prune
+old entries. This is a soft limit, temporarily there may be more entries.
+
 @opindex flto-compression-level
 @item -flto-compression-level=@var{n}
 This option specifies the level of compression used for intermediate
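
The "soft limit" wording for -flto-incremental-cache-size can be made concrete with a small model. This is a hypothetical Python sketch, not GCC's implementation. The SoftLimitCache class, the least-recently-used eviction order, and the assumption that pruning runs as a separate step (rather than on every insert) are all illustrative choices. The sketch only shows why the number of entries can briefly exceed n.

```python
from collections import OrderedDict
from typing import Optional


class SoftLimitCache:
    """Hypothetical model of a cache with a soft entry-count limit.

    Assumption: inserts never evict, so the cache may temporarily hold
    more than `limit` entries. A separate prune() step then evicts the
    least recently used entries until at most `limit` remain. GCC's
    real eviction policy and timing may differ.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def lookup(self, key: str) -> Optional[bytes]:
        # A hit marks the entry as most recently used, so it is pruned last.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        return None

    def insert(self, key: str, value: bytes) -> None:
        # No eviction here: this is what makes the limit "soft".
        self.entries[key] = value
        self.entries.move_to_end(key)

    def prune(self) -> list:
        # Drop least recently used entries until the limit holds again.
        removed = []
        while len(self.entries) > self.limit:
            key, _ = self.entries.popitem(last=False)
            removed.append(key)
        return removed


if __name__ == "__main__":
    cache = SoftLimitCache(limit=2)
    for name in ("a", "b", "c"):
        cache.insert(name, b"object code")
    print(len(cache.entries))  # 3: temporarily over the limit
    cache.lookup("a")          # "a" becomes most recently used
    print(cache.prune())       # ['b']
    print(list(cache.entries)) # ['c', 'a']
```

Because insert() never evicts in this model, several builds can push the count past the limit until the next prune() call. That matches the documented behaviour that there may temporarily be more entries than n.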