lto: Don't run ipa-comdats pass during LTO

Message ID d925ae20-69f2-8987-7b70-06edb7834beb@codesourcery.com
State New
Headers
Series lto: Don't run ipa-comdats pass during LTO |

Commit Message

Sandra Loosemore Dec. 7, 2021, midnight UTC
  The attached patch fixes an ICE in lto1 at lto-partition.c:215 that
was reported by a customer.  Unfortunately I have no test case for
this; the customer's application is a big C++ shared library with lots
of dependencies and proprietary code under NDA.  I did try reducing it
with cvise but couldn't end up with anything small or robust enough to
be suitable, and likewise my attempts to write a small testcase by
hand were not successful in reproducing the bug.  OTOH, I did track
this down in the debugger and I think I have a pretty good
understanding of what the problem is, namely...

The symbol lto-partition was failing on was a vtable symbol that was
in a comdat group but not marked as DECL_COMDAT and without the right
visibility flags for a comdat symbol.  The LTO data in the input .o
files is correct and lto1 is creating the symbol correctly when it's
read in, but the problem is that there are two optimization passes
being run before lto-partition that are trying to do conflicting
things with COMDATs.

The first is the ipa-visibility pass.  When this pass is run as part
of lto1, since it has the complete program or shared library (as in
this case) available to analyze, it tries to break up COMDATs and turn
them into ordinary local symbols.  But then the ipa-comdats pass,
which is also run as part of regular IPA optimizations at compile
time, tries to do the reverse and move local symbols only referenced
from a COMDAT into that same COMDAT, but in the process it is failing
to set all the flags needed by the LTO partitioner to correctly
process the symbol.  It's possible the failure to set the flags
properly is a bug in ipa-comdats, but looking at the bigger picture,
it makes no sense to be running this optimization pass at link time at
all.  It is a compile-time optimization intended to reduce code size
by letting the linker keep only one copy of these symbols that may be
redundantly defined in multiple objects.

So the patch I've come up with disables the ipa-comdats pass in the
same situations in which ipa-visibility localizes COMDAT symbols, to
remove the conflict between the two passes.  This fixes the customer
problem (in a GCC 10 build for arm-linux-gnueabi), and regression
tests OK on mainline x86_64-linux-gnu as well.  OK for mainline, and
to backport to older branches?  It looked to me like none of this
this code had been touched for years.

-Sandra
  

Comments

Richard Biener Dec. 7, 2021, 7:37 a.m. UTC | #1
On Tue, Dec 7, 2021 at 1:01 AM Sandra Loosemore <sandra@codesourcery.com> wrote:
>
> The attached patch fixes an ICE in lto1 at lto-partition.c:215 that
> was reported by a customer.  Unfortunately I have no test case for
> this; the customer's application is a big C++ shared library with lots
> of dependencies and proprietary code under NDA.  I did try reducing it
> with cvise but couldn't end up with anything small or robust enough to
> be suitable, and likewise my attempts to write a small testcase by
> hand were not successful in reproducing the bug.  OTOH, I did track
> this down in the debugger and I think I have a pretty good
> understanding of what the problem is, namely...

I found PR65995 which looks related (well, the ICE is at the same position).
Does the customer use the linker plugin?

> The symbol lto-partition was failing on was a vtable symbol that was
> in a comdat group but not marked as DECL_COMDAT and without the right
> visibility flags for a comdat symbol.  The LTO data in the input .o
> files is correct and lto1 is creating the symbol correctly when it's
> read in, but the problem is that there are two optimization passes
> being run before lto-partition that are trying to do conflicting
> things with COMDATs.
>
> The first is the ipa-visibility pass.  When this pass is run as part
> of lto1, since it has the complete program or shared library (as in
> this case) available to analyze, it tries to break up COMDATs and turn
> them into ordinary local symbols.  But then the ipa-comdats pass,
> which is also run as part of regular IPA optimizations at compile
> time, tries to do the reverse and move local symbols only referenced
> from a COMDAT into that same COMDAT, but in the process it is failing
> to set all the flags needed by the LTO partitioner to correctly
> process the symbol.  It's possible the failure to set the flags
> properly is a bug in ipa-comdats, but looking at the bigger picture,
> it makes no sense to be running this optimization pass at link time at
> all.  It is a compile-time optimization intended to reduce code size
> by letting the linker keep only one copy of these symbols that may be
> redundantly defined in multiple objects.
>
> So the patch I've come up with disables the ipa-comdats pass in the
> same situations in which ipa-visibility localizes COMDAT symbols, to
> remove the conflict between the two passes.  This fixes the customer
> problem (in a GCC 10 build for arm-linux-gnueabi), and regression
> tests OK on mainline x86_64-linux-gnu as well.  OK for mainline, and
> to backport to older branches?  It looked to me like none of this
> this code had been touched for years.

I hope Honza can review this.

Thanks,
Richard.

>
> -Sandra
  
Jan Hubicka Dec. 7, 2021, 8:07 a.m. UTC | #2
> The attached patch fixes an ICE in lto1 at lto-partition.c:215 that
> was reported by a customer.  Unfortunately I have no test case for
> this; the customer's application is a big C++ shared library with lots
> of dependencies and proprietary code under NDA.  I did try reducing it
> with cvise but couldn't end up with anything small or robust enough to

It often helps to use -flto-partition=max (or 1to1) to reduce such
problems.
> be suitable, and likewise my attempts to write a small testcase by
> hand were not successful in reproducing the bug.  OTOH, I did track
> this down in the debugger and I think I have a pretty good
> understanding of what the problem is, namely...
> 
> The symbol lto-partition was failing on was a vtable symbol that was
> in a comdat group but not marked as DECL_COMDAT and without the right
> visibility flags for a comdat symbol.  The LTO data in the input .o
> files is correct and lto1 is creating the symbol correctly when it's
> read in, but the problem is that there are two optimization passes
> being run before lto-partition that are trying to do conflicting
> things with COMDATs.
> 
> The first is the ipa-visibility pass.  When this pass is run as part
> of lto1, since it has the complete program or shared library (as in
> this case) available to analyze, it tries to break up COMDATs and turn
> them into ordinary local symbols.  But then the ipa-comdats pass,
> which is also run as part of regular IPA optimizations at compile
> time, tries to do the reverse and move local symbols only referenced
> from a COMDAT into that same COMDAT, but in the process it is failing
> to set all the flags needed by the LTO partitioner to correctly
> process the symbol.  It's possible the failure to set the flags
> properly is a bug in ipa-comdats, but looking at the bigger picture,
> it makes no sense to be running this optimization pass at link time at
> all.  It is a compile-time optimization intended to reduce code size
> by letting the linker keep only one copy of these symbols that may be
> redundantly defined in multiple objects.
> 
> So the patch I've come up with disables the ipa-comdats pass in the
> same situations in which ipa-visibility localizes COMDAT symbols, to
> remove the conflict between the two passes.  This fixes the customer
> problem (in a GCC 10 build for arm-linux-gnueabi), and regression
> tests OK on mainline x86_64-linux-gnu as well.  OK for mainline, and
> to backport to older branches?  It looked to me like none of this
> this code had been touched for years.

The passes are supposed to work together so disabling it with LTO is
just papering around the problem.  Perhaps we can try to debug it
together.

Ipa-visibility should use resolution info when it knows that the comdat
group can be dismantled (because we have resolution info and we know
given comdat group will prevail).

For comdat groups that did fail to prevail ipa-comdats should still do
the job of bringing static symbols that are only used to implement a
given comdat inside of the comdat group so in case it is optimized out
we do not end up with dead code.

I however wonder how this path is triggering for you with the resolution
info.  Would it be possible to know bit more detail? What symbol we ICE
on and what are the visibility flags of it?  Wha is the resolution info
of the symbols inside of the comdat?

Honza
  

Patch

commit 559b1911c825c8df273d6cc7c403a3d0c79d0295
Author: Sandra Loosemore <sandra@codesourcery.com>
Date:   Sun Dec 5 17:59:19 2021 -0800

    lto: Don't run ipa-comdats pass during LTO
    
    When the ipa-visibility pass is invoked during LTO, it attempts to
    localize comdat symbols and destroy their comdat group.  The
    ipa-comdats pass subsequently attempts to do the opposite and move
    symbols into comdats, but without marking them as DECL_COMDAT or
    restoring appropriate visibility flags.  This confuses the
    lto-partition pass and leads to ICEs.  Running the ipa-comdats pass at
    link time seems counterproductive in terms of optimization, so the
    simplest fix is not to do that.
    
    2021-12-05  Sandra Loosemore  <sandra@codesourcery.com>
    
    	gcc/
    	* ipa-comdats.c (pass_ipa_comdats::gate): Skip this pass if it
    	would un-do the comdat localization already performed at LTO
    	time in ipa-visibility.c.

diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c
index a9d2993..d178734 100644
--- a/gcc/ipa-comdats.c
+++ b/gcc/ipa-comdats.c
@@ -428,6 +428,13 @@  public:
 bool
 pass_ipa_comdats::gate (function *)
 {
+  if ((in_lto_p || flag_whole_program) && !flag_incremental_link)
+    /* This is the combination of flags cgraph_externally_visible_p in
+       ipa-visibility.c uses to enable localization of comdat symbols.
+       Do not attempt to undo that by trying to re-insert symbols into
+       comdats without their original visibility information, as it
+       causes assertion failures in lto-partition. */
+    return false;
   return HAVE_COMDAT_GROUP;
 }