options: Make -Ofast switch off -fsemantic-interposition

Message ID ri6czn5z8mw.fsf@suse.cz
State Committed
Commit 458d2c689963d8461d84670a3d8988cd6ecbfd81
Headers
Series options: Make -Ofast switch off -fsemantic-interposition |

Commit Message

Martin Jambor Nov. 12, 2021, 3:38 p.m. UTC
  Hi,

using -fno-semantic-interposition has been reported by various people
to bring about considerable speed up at the cost of strict compliance
to the ELF symbol interposition rules  See for example
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

As such I believe it should be implied by our -Ofast optimization
level, not only so that benchmarks that can benefit run faster, but
also so that people looking at -Ofast documentation for options that
could speed their programs find it.

I have verified that with the following patch IPA-CP sees
flag_semantic_interposition set to zero at Ofast and that info and pdf
manual builds fine with the documentation change.  I am bootstrapping
and testing it now in order to comply with submission criteria but I
don't think an Ofast change gets much tested.

Assuming it passes, is the patch OK?  (If it is, I will also add a note
about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
after I commit the patch.)

Thanks,

Martin


gcc/ChangeLog:

2021-11-12  Martin Jambor  <mjambor@suse.cz>

	* opts.c (default_options_table): Switch off
	flag_semantic_interposition at Ofast.
	* doc/invoke.texi (Optimize Options): Document that Ofast switches off
	-fsemantic-interposition.
---
 gcc/doc/invoke.texi | 1 +
 gcc/opts.c          | 1 +
 2 files changed, 2 insertions(+)
  

Comments

Jan Hubicka Nov. 12, 2021, 3:40 p.m. UTC | #1
> Hi,
> 
> using -fno-semantic-interposition has been reported by various people
> to bring about considerable speed up at the cost of strict compliance
> to the ELF symbol interposition rules  See for example
> https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> 
> As such I believe it should be implied by our -Ofast optimization
> level, not only so that benchmarks that can benefit run faster, but
> also so that people looking at -Ofast documentation for options that
> could speed their programs find it.
> 
> I have verified that with the following patch IPA-CP sees
> flag_semantic_interposition set to zero at Ofast and that info and pdf
> manual builds fine with the documentation change.  I am bootstrapping
> and testing it now in order to comply with submission criteria but I
> don't think an Ofast change gets much tested.
> 
> Assuming it passes, is the patch OK?  (If it is, I will also add a note
> about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> after I commit the patch.)
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-11-12  Martin Jambor  <mjambor@suse.cz>
> 
> 	* opts.c (default_options_table): Switch off
> 	flag_semantic_interposition at Ofast.
> 	* doc/invoke.texi (Optimize Options): Document that Ofast switches off
> 	-fsemantic-interposition.
OK,
thanks!
Honza
  
Martin Jambor Nov. 12, 2021, 10:14 p.m. UTC | #2
Hi,

On Fri, Nov 12 2021, Martin Jambor wrote:
> Hi,
>
> using -fno-semantic-interposition has been reported by various people
> to bring about considerable speed up at the cost of strict compliance
> to the ELF symbol interposition rules  See for example
> https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
>
> As such I believe it should be implied by our -Ofast optimization
> level, not only so that benchmarks that can benefit run faster, but
> also so that people looking at -Ofast documentation for options that
> could speed their programs find it.
>
> I have verified that with the following patch IPA-CP sees
> flag_semantic_interposition set to zero at Ofast and that info and pdf
> manual builds fine with the documentation change.  I am bootstrapping
> and testing it now in order to comply with submission criteria but I
> don't think an Ofast change gets much tested.
>
> Assuming it passes, is the patch OK?  (If it is, I will also add a note
> about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> after I commit the patch.)
>

Unfortunately, I was wrong, there are testcases which use the optimize
attribute to switch a function to Ofast and those ICE because
-fsemantic-interposition is not an optimization flag and only
optimization flags can change in an optimize attribute (AFAIK, I only
had a quick glance at the results).

I am not sure what is the right way to tackle this, whether to set the
flag at Ofast in some nonstandard way or make the flag an optimization
flag - probably affecting function definitions, having it affect
call-sites seems too fine-grained.  I will try to discuss this on IRC on
Monday (and hope such change is still doable early stage3).

Sorry for posting this a bit prematurely,

Martin

>
>
> gcc/ChangeLog:
>
> 2021-11-12  Martin Jambor  <mjambor@suse.cz>
>
> 	* opts.c (default_options_table): Switch off
> 	flag_semantic_interposition at Ofast.
> 	* doc/invoke.texi (Optimize Options): Document that Ofast switches off
> 	-fsemantic-interposition.
> ---
>  gcc/doc/invoke.texi | 1 +
>  gcc/opts.c          | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2ea23d07c4c..fd16c91aec8 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10551,6 +10551,7 @@ valid for all standard-compliant programs.
>  It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
>  and the Fortran-specific @option{-fstack-arrays}, unless
>  @option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}.
> +It turns off @option {-fsemantic-interposition}.
>  
>  @item -Og
>  @opindex Og
> diff --git a/gcc/opts.c b/gcc/opts.c
> index caed6255500..3da53d8f890 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -682,6 +682,7 @@ static const struct default_options default_options_table[] =
>      /* -Ofast adds optimizations to -O3.  */
>      { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
>      { OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },
> +    { OPT_LEVELS_FAST, OPT_fsemantic_interposition, NULL, 0 },
>  
>      { OPT_LEVELS_NONE, 0, NULL, 0 }
>    };
> -- 
> 2.33.0
  
Jan Hubicka Nov. 18, 2021, 5:31 p.m. UTC | #3
> Hi,
> 
> On Fri, Nov 12 2021, Martin Jambor wrote:
> > Hi,
> >
> > using -fno-semantic-interposition has been reported by various people
> > to bring about considerable speed up at the cost of strict compliance
> > to the ELF symbol interposition rules  See for example
> > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> >
> > As such I believe it should be implied by our -Ofast optimization
> > level, not only so that benchmarks that can benefit run faster, but
> > also so that people looking at -Ofast documentation for options that
> > could speed their programs find it.
> >
> > I have verified that with the following patch IPA-CP sees
> > flag_semantic_interposition set to zero at Ofast and that info and pdf
> > manual builds fine with the documentation change.  I am bootstrapping
> > and testing it now in order to comply with submission criteria but I
> > don't think an Ofast change gets much tested.
> >
> > Assuming it passes, is the patch OK?  (If it is, I will also add a note
> > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> > after I commit the patch.)
> >
> 
> Unfortunately, I was wrong, there are testcases which use the optimize
> attribute to switch a function to Ofast and those ICE because
> -fsemantic-interposition is not an optimization flag and only
> optimization flags can change in an optimize attribute (AFAIK, I only
> had a quick glance at the results).
> 
> I am not sure what is the right way to tackle this, whether to set the
> flag at Ofast in some nonstandard way or make the flag an optimization
> flag - probably affecting function definitions, having it affect
> call-sites seems too fine-grained.  I will try to discuss this on IRC on
> Monday (and hope such change is still doable early stage3).
> 
> Sorry for posting this a bit prematurely,

Hi,

This patch turns flag_semantic_interposition to optimization option so
it can be enabled with per-function granuality.  This is done by adding
the flag among visibility flags into the symbol table.  This fixes the
behaviour on the testcase I added to testsuite.

There are bugs where get_availability misbehaves on partitioned program.
We can also use the new flag to fix those, but I will do that
incrementally.

The -Ofast change should be safe now.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-11-18  Jan Hubicka  <hubicka@ucw.cz>

	* cgraph.c (cgraph_node::get_availability): Update call of
	decl_replaceable_p.
	(cgraph_node::verify_node): Verify that semantic_interposition flag
	is set correclty.
	* cgraph.h: (symtab_node): Add semantic_interposition flag.
	* cgraphclones.c (set_new_clone_decl_and_node_flags): Clear
	semantic_interposition flag.
	* cgraphunit.c (cgraph_node::finalize_function): Set
	semantic_interposition flag.
	(cgraph_node::add_new_function): Likewise.
	(varpool_node::finalize_decl): Likewise.
	(cgraph_node::create_wrapper): Likewise.
	* common.opt (fsemantic-interposition): Turn to optimization node.
	* lto-cgraph.c (lto_output_node): Stream semantic_interposition.
	(lto_output_varpool_node): Likewise.
	(input_overwrite_node): Likewise.
	(input_varpool_node): Likewise.
	* symtab.c (symtab_node::dump_base):
	* varasm.c (decl_replaceable_p): Add semantic_interposition_p
	parameter.
	* varasm.h (decl_replaceable_p): Update declaration.
	* varpool.c (varpool_node::ctor_useable_for_folding_p):
	Use semantic_interposition flag.
	(varpool_node::get_availability): Likewise.
	(varpool_node::create_alias): Copy semantic_interposition flag.

gcc/cp/ChangeLog:

2021-11-18  Jan Hubicka  <hubicka@ucw.cz>

	* decl.c (finish_function): Update use of decl_replaceable_p.

gcc/lto/ChangeLog:

2021-11-18  Jan Hubicka  <hubicka@ucw.cz>

	* lto-partition.c (promote_symbol): Clear semantic_interposition flag.

gcc/testsuite/ChangeLog:

2021-11-18  Jan Hubicka  <hubicka@ucw.cz>

	* gcc.dg/lto/semantic-interposition-1_0.c: New test.
	* gcc.dg/lto/semantic-interposition-1_1.c: New test.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 466b66d5ba5..8e7c12642ad 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2390,7 +2390,8 @@ cgraph_node::get_availability (symtab_node *ref)
      to code cp_cannot_inline_tree_fn and probably shall be shared and
      the inlinability hooks completely eliminated).  */
 
-  else if (decl_replaceable_p (decl) && !DECL_EXTERNAL (decl))
+  else if (decl_replaceable_p (decl, semantic_interposition)
+	   && !DECL_EXTERNAL (decl))
     avail = AVAIL_INTERPOSABLE;
   else avail = AVAIL_AVAILABLE;
 
@@ -3486,6 +3487,13 @@ cgraph_node::verify_node (void)
 	     "returns a pointer");
       error_found = true;
     }
+  if (definition && externally_visible
+      && semantic_interposition
+	 != opt_for_fn (decl, flag_semantic_interposition))
+    {
+      error ("semantic interposition mismatch");
+      error_found = true;
+    }
   for (e = indirect_calls; e; e = e->next_callee)
     {
       if (e->aux)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index e42e305cdb6..dafdfd289a0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -117,6 +117,7 @@ public:
       refuse_visibility_changes (false), externally_visible (false),
       no_reorder (false), force_output (false), forced_by_abi (false),
       unique_name (false), implicit_section (false), body_removed (false),
+      semantic_interposition (flag_semantic_interposition),
       used_from_other_partition (false), in_other_partition (false),
       address_taken (false), in_init_priority_hash (false),
       need_lto_streaming (false), offloadable (false), ifunc_resolver (false),
@@ -557,6 +558,8 @@ public:
   /* True when body and other characteristics have been removed by
      symtab_remove_unreachable_nodes. */
   unsigned body_removed : 1;
+  /* True when symbol should comply to -fsemantic-interposition flag.  */
+  unsigned semantic_interposition : 1;
 
   /*** WHOPR Partitioning flags.
        These flags are used at ltrans stage when only part of the callgraph is
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index ae91dccd31d..c997b82f3e1 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -172,6 +172,7 @@ set_new_clone_decl_and_node_flags (cgraph_node *new_node)
   new_node->externally_visible = 0;
   new_node->local = 1;
   new_node->lowered = true;
+  new_node->semantic_interposition = 0;
 }
 
 /* Duplicate thunk THUNK if necessary but make it to refer to NODE.
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 55cb0347149..1e58ffd65e8 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -452,6 +452,7 @@ cgraph_node::finalize_function (tree decl, bool no_collect)
   node->definition = true;
   notice_global_symbol (decl);
   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
+  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
   if (!flag_toplevel_reorder)
     node->no_reorder = true;
 
@@ -554,6 +555,8 @@ cgraph_node::add_new_function (tree fndecl, bool lowered)
 	node = cgraph_node::get_create (fndecl);
 	node->local = false;
 	node->definition = true;
+	node->semantic_interposition = opt_for_fn (fndecl,
+						   flag_semantic_interposition);
 	node->force_output = true;
 	if (TREE_PUBLIC (fndecl))
 	  node->externally_visible = true;
@@ -581,6 +584,8 @@ cgraph_node::add_new_function (tree fndecl, bool lowered)
 	if (lowered)
 	  node->lowered = true;
 	node->definition = true;
+	node->semantic_interposition = opt_for_fn (fndecl,
+						   flag_semantic_interposition);
 	node->analyze ();
 	push_cfun (DECL_STRUCT_FUNCTION (fndecl));
 	gimple_register_cfg_hooks ();
@@ -954,6 +959,7 @@ varpool_node::finalize_decl (tree decl)
   /* Set definition first before calling notice_global_symbol so that
      it is available to notice_global_symbol.  */
   node->definition = true;
+  node->semantic_interposition = flag_semantic_interposition;
   notice_global_symbol (decl);
   if (!flag_toplevel_reorder)
     node->no_reorder = true;
@@ -2576,6 +2582,7 @@ cgraph_node::create_wrapper (cgraph_node *target)
 
   /* Turn alias into thunk and expand it into GIMPLE representation.  */
   definition = true;
+  semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
 
   /* Create empty thunk, but be sure we did not keep former thunk around.
      In that case we would need to preserve the info.  */
diff --git a/gcc/common.opt b/gcc/common.opt
index de9b848eda5..db6010e4e20 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2539,7 +2539,7 @@ Common Var(flag_sel_sched_reschedule_pipelined) Init(0) Optimization
 Reschedule pipelined regions without pipelining.
 
 fsemantic-interposition
-Common Var(flag_semantic_interposition) Init(1)
+Common Var(flag_semantic_interposition) Init(1) Optimization
 Allow interposing function (or variables) by ones with different semantics (or initializer) respectively by dynamic linker.
 
 ; sched_stalled_insns means that insns can be moved prematurely from the queue
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 2ddf0e4a524..9f68d1a5590 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -17609,7 +17609,8 @@ finish_function (bool inline_p)
   if (!processing_template_decl
       && !cp_function_chain->can_throw
       && !flag_non_call_exceptions
-      && !decl_replaceable_p (fndecl))
+      && !decl_replaceable_p (fndecl,
+			      opt_for_fn (fndecl, flag_semantic_interposition)))
     TREE_NOTHROW (fndecl) = 1;
 
   /* This must come after expand_function_end because cleanups might
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 7c3e276a8ea..ef3c371c98f 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -516,6 +516,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
   bp_pack_value (&bp, node->forced_by_abi, 1);
   bp_pack_value (&bp, node->unique_name, 1);
   bp_pack_value (&bp, node->body_removed, 1);
+  bp_pack_value (&bp, node->semantic_interposition, 1);
   bp_pack_value (&bp, node->implicit_section, 1);
   bp_pack_value (&bp, node->address_taken, 1);
   bp_pack_value (&bp, tag == LTO_symtab_analyzed_node
@@ -604,6 +605,7 @@ lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
 		 node->body_removed
 		 || (!encode_initializer_p && !node->alias && node->definition),
 		 1);
+  bp_pack_value (&bp, node->semantic_interposition, 1);
   bp_pack_value (&bp, node->implicit_section, 1);
   bp_pack_value (&bp, node->writeonly, 1);
   bp_pack_value (&bp, node->definition && (encode_initializer_p || node->alias),
@@ -1167,6 +1169,7 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
   node->forced_by_abi = bp_unpack_value (bp, 1);
   node->unique_name = bp_unpack_value (bp, 1);
   node->body_removed = bp_unpack_value (bp, 1);
+  node->semantic_interposition = bp_unpack_value (bp, 1);
   node->implicit_section = bp_unpack_value (bp, 1);
   node->address_taken = bp_unpack_value (bp, 1);
   node->used_from_other_partition = bp_unpack_value (bp, 1);
@@ -1373,6 +1376,7 @@ input_varpool_node (struct lto_file_decl_data *file_data,
   node->forced_by_abi = bp_unpack_value (&bp, 1);
   node->unique_name = bp_unpack_value (&bp, 1);
   node->body_removed = bp_unpack_value (&bp, 1);
+  node->semantic_interposition = bp_unpack_value (&bp, 1);
   node->implicit_section = bp_unpack_value (&bp, 1);
   node->writeonly = bp_unpack_value (&bp, 1);
   node->definition = bp_unpack_value (&bp, 1);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index bee40218159..6d20555073e 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -1001,6 +1001,7 @@ promote_symbol (symtab_node *node)
      so it is prevailing.  This is important to keep binds_to_current_def_p
      to work across partitions.  */
   node->resolution = LDPR_PREVAILING_DEF_IRONLY;
+  node->semantic_interposition = false;
   DECL_VISIBILITY (node->decl) = VISIBILITY_HIDDEN;
   DECL_VISIBILITY_SPECIFIED (node->decl) = true;
   if (dump_file)
diff --git a/gcc/symtab.c b/gcc/symtab.c
index c7ea8ecef74..94b4e47f749 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -929,6 +929,8 @@ symtab_node::dump_base (FILE *f)
     fprintf (f, " forced_by_abi");
   if (externally_visible)
     fprintf (f, " externally_visible");
+  if (semantic_interposition)
+    fprintf (f, " semantic_interposition");
   if (no_reorder)
     fprintf (f, " no_reorder");
   if (resolution != LDPR_UNKNOWN)
diff --git a/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_0.c b/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_0.c
new file mode 100644
index 00000000000..db842749008
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_0.c
@@ -0,0 +1,13 @@
+/* { dg-lto-do link }  */
+/* { dg-require-effective-target shared } */
+/* { dg-extra-ld-options { -shared } } */
+/* { dg-lto-options {{-O2 -flto -fpic -fdump-ipa-inline-details --shared}} }  */ 
+extern int ret1();
+
+int
+test()
+{
+  return ret1();
+}
+/* { dg-final { scan-wpa-ipa-dump "Inlined 1 calls"  "inline"  } } */
+
diff --git a/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_1.c b/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_1.c
new file mode 100644
index 00000000000..9a741773002
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/semantic-interposition-1_1.c
@@ -0,0 +1,5 @@
+/* { dg-options "-O2 -flto -fpic -fno-semantic-interposition" }  */ 
+int ret1()
+{
+  return 1;
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 09316c62050..8c7aba2db61 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -7625,6 +7625,8 @@ decl_binds_to_current_def_p (const_tree decl)
    at link-time with an entirely different definition, provided that the
    replacement has the same type.  For example, functions declared
    with __attribute__((weak)) on most systems are replaceable.
+   If SEMANTIC_INTERPOSITION_P is false allow interposition only on
+   symbols explicitly declared weak.
 
    COMDAT functions are not replaceable, since all definitions of the
    function must be equivalent.  It is important that COMDAT functions
@@ -7632,12 +7634,12 @@ decl_binds_to_current_def_p (const_tree decl)
    instantiations is not penalized.  */
 
 bool
-decl_replaceable_p (tree decl)
+decl_replaceable_p (tree decl, bool semantic_interposition_p)
 {
   gcc_assert (DECL_P (decl));
   if (!TREE_PUBLIC (decl) || DECL_COMDAT (decl))
     return false;
-  if (!flag_semantic_interposition
+  if (!semantic_interposition_p
       && !DECL_WEAK (decl))
     return false;
   return !decl_binds_to_current_def_p (decl);
diff --git a/gcc/varasm.h b/gcc/varasm.h
index 442ca6a380a..4ab9dc51838 100644
--- a/gcc/varasm.h
+++ b/gcc/varasm.h
@@ -38,7 +38,7 @@ extern void mark_decl_referenced (tree);
 extern void notice_global_symbol (tree);
 extern void set_user_assembler_name (tree, const char *);
 extern void process_pending_assemble_externals (void);
-extern bool decl_replaceable_p (tree);
+extern bool decl_replaceable_p (tree, bool);
 extern bool decl_binds_to_current_def_p (const_tree);
 extern enum tls_model decl_default_tls_model (const_tree);
 
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 4830df5c320..fd0d53b9eb6 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -372,7 +372,8 @@ varpool_node::ctor_useable_for_folding_p (void)
    */
   if ((!DECL_INITIAL (real_node->decl)
        || (DECL_WEAK (decl) && !DECL_COMDAT (decl)))
-      && (DECL_EXTERNAL (decl) || decl_replaceable_p (decl)))
+      && ((DECL_EXTERNAL (decl) && !in_other_partition)
+	  || decl_replaceable_p (decl, semantic_interposition)))
     return false;
 
   /* Variables declared `const' with an initializer are considered
@@ -511,8 +512,8 @@ varpool_node::get_availability (symtab_node *ref)
   /* If the variable can be overwritten, return OVERWRITABLE.  Takes
      care of at least one notable extension - the COMDAT variables
      used to share template instantiations in C++.  */
-  if (decl_replaceable_p (decl)
-      || DECL_EXTERNAL (decl))
+  if (decl_replaceable_p (decl, semantic_interposition)
+      || (DECL_EXTERNAL (decl) && !in_other_partition))
     return AVAIL_INTERPOSABLE;
   return AVAIL_AVAILABLE;
 }
@@ -779,6 +780,7 @@ varpool_node::create_alias (tree alias, tree decl)
   alias_node = varpool_node::get_create (alias);
   alias_node->alias = true;
   alias_node->definition = true;
+  alias_node->semantic_interposition = flag_semantic_interposition;
   alias_node->alias_target = decl;
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (alias)) != NULL)
     alias_node->weakref = alias_node->transparent_alias = true;
  
Jan Hubicka Nov. 19, 2021, 11:51 a.m. UTC | #4
> > Hi,
> > 
> > On Fri, Nov 12 2021, Martin Jambor wrote:
> > > Hi,
> > >
> > > using -fno-semantic-interposition has been reported by various people
> > > to bring about considerable speed up at the cost of strict compliance
> > > to the ELF symbol interposition rules  See for example
> > > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> > >
> > > As such I believe it should be implied by our -Ofast optimization
> > > level, not only so that benchmarks that can benefit run faster, but
> > > also so that people looking at -Ofast documentation for options that
> > > could speed their programs find it.
> > >
> > > I have verified that with the following patch IPA-CP sees
> > > flag_semantic_interposition set to zero at Ofast and that info and pdf
> > > manual builds fine with the documentation change.  I am bootstrapping
> > > and testing it now in order to comply with submission criteria but I
> > > don't think an Ofast change gets much tested.
> > >
> > > Assuming it passes, is the patch OK?  (If it is, I will also add a note
> > > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> > > after I commit the patch.)
> > >
> > 
> > Unfortunately, I was wrong, there are testcases which use the optimize
> > attribute to switch a function to Ofast and those ICE because
> > -fsemantic-interposition is not an optimization flag and only
> > optimization flags can change in an optimize attribute (AFAIK, I only
> > had a quick glance at the results).
> > 
> > I am not sure what is the right way to tackle this, whether to set the
> > flag at Ofast in some nonstandard way or make the flag an optimization
> > flag - probably affecting function definitions, having it affect
> > call-sites seems too fine-grained.  I will try to discuss this on IRC on
> > Monday (and hope such change is still doable early stage3).
> > 
> > Sorry for posting this a bit prematurely,
> 
> Hi,
> 
> This patch turns flag_semantic_interposition to optimization option so
> it can be enabled with per-function granuality.  This is done by adding
> the flag among visibility flags into the symbol table.  This fixes the
> behaviour on the testcase I added to testsuite.
> 
> There are bugs where get_availability misbehaves on partitioned program.
> We can also use the new flag to fix those, but I will do that
> incrementally.
> 
> The -Ofast change should be safe now.

Also forgot to say it explicitly, the patch is OK, so please commit it.

Honza
  
Martin Jambor Nov. 19, 2021, 5:55 p.m. UTC | #5
Hi,

On Fri, Nov 19 2021, Jan Hubicka wrote:
>> > Hi,
>> > 
>> > On Fri, Nov 12 2021, Martin Jambor wrote:
>> > > Hi,
>> > >
>> > > using -fno-semantic-interposition has been reported by various people
>> > > to bring about considerable speed up at the cost of strict compliance
>> > > to the ELF symbol interposition rules  See for example
>> > > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
>> > >
>> > > As such I believe it should be implied by our -Ofast optimization
>> > > level, not only so that benchmarks that can benefit run faster, but
>> > > also so that people looking at -Ofast documentation for options that
>> > > could speed their programs find it.
>> > >
>> > > I have verified that with the following patch IPA-CP sees
>> > > flag_semantic_interposition set to zero at Ofast and that info and pdf
>> > > manual builds fine with the documentation change.  I am bootstrapping
>> > > and testing it now in order to comply with submission criteria but I
>> > > don't think an Ofast change gets much tested.
>> > >
>> > > Assuming it passes, is the patch OK?  (If it is, I will also add a note
>> > > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
>> > > after I commit the patch.)
>> > >
>> > 
>> > Unfortunately, I was wrong, there are testcases which use the optimize
>> > attribute to switch a function to Ofast and those ICE because
>> > -fsemantic-interposition is not an optimization flag and only
>> > optimization flags can change in an optimize attribute (AFAIK, I only
>> > had a quick glance at the results).
>> > 
>> > I am not sure what is the right way to tackle this, whether to set the
>> > flag at Ofast in some nonstandard way or make the flag an optimization
>> > flag - probably affecting function definitions, having it affect
>> > call-sites seems too fine-grained.  I will try to discuss this on IRC on
>> > Monday (and hope such change is still doable early stage3).
>> > 
>> > Sorry for posting this a bit prematurely,
>> 
>> Hi,
>> 
>> This patch turns flag_semantic_interposition to optimization option so
>> it can be enabled with per-function granuality.  This is done by adding
>> the flag among visibility flags into the symbol table.  This fixes the
>> behaviour on the testcase I added to testsuite.
>> 
>> There are bugs where get_availability misbehaves on partitioned program.
>> We can also use the new flag to fix those, but I will do that
>> incrementally.
>> 
>> The -Ofast change should be safe now.
>
> Also forgot to say it explicitly, the patch is OK, so please commit it.
>

Thanks a lot for the enabling patch.  I have committed mine after re-testing.

Martin
  
Fangrui Song May 11, 2022, 9:25 p.m. UTC | #6
On Fri, Nov 19, 2021 at 9:55 AM Martin Jambor <mjambor@suse.cz> wrote:
>
> Hi,
>
> On Fri, Nov 19 2021, Jan Hubicka wrote:
> >> > Hi,
> >> >
> >> > On Fri, Nov 12 2021, Martin Jambor wrote:
> >> > > Hi,
> >> > >
> >> > > using -fno-semantic-interposition has been reported by various people
> >> > > to bring about considerable speed up at the cost of strict compliance
> >> > > to the ELF symbol interposition rules  See for example
> >> > > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> >> > >
> >> > > As such I believe it should be implied by our -Ofast optimization
> >> > > level, not only so that benchmarks that can benefit run faster, but
> >> > > also so that people looking at -Ofast documentation for options that
> >> > > could speed their programs find it.
> >> > >
> >> > > I have verified that with the following patch IPA-CP sees
> >> > > flag_semantic_interposition set to zero at Ofast and that info and pdf
> >> > > manual builds fine with the documentation change.  I am bootstrapping
> >> > > and testing it now in order to comply with submission criteria but I
> >> > > don't think an Ofast change gets much tested.
> >> > >
> >> > > Assuming it passes, is the patch OK?  (If it is, I will also add a note
> >> > > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> >> > > after I commit the patch.)
> >> > >
> >> >
> >> > Unfortunately, I was wrong, there are testcases which use the optimize
> >> > attribute to switch a function to Ofast and those ICE because
> >> > -fsemantic-interposition is not an optimization flag and only
> >> > optimization flags can change in an optimize attribute (AFAIK, I only
> >> > had a quick glance at the results).
> >> >
> >> > I am not sure what is the right way to tackle this, whether to set the
> >> > flag at Ofast in some nonstandard way or make the flag an optimization
> >> > flag - probably affecting function definitions, having it affect
> >> > call-sites seems too fine-grained.  I will try to discuss this on IRC on
> >> > Monday (and hope such change is still doable early stage3).
> >> >
> >> > Sorry for posting this a bit prematurely,
> >>
> >> Hi,
> >>
> >> This patch turns flag_semantic_interposition to optimization option so
> >> it can be enabled with per-function granuality.  This is done by adding
> >> the flag among visibility flags into the symbol table.  This fixes the
> >> behaviour on the testcase I added to testsuite.
> >>
> >> There are bugs where get_availability misbehaves on partitioned program.
> >> We can also use the new flag to fix those, but I will do that
> >> incrementally.
> >>
> >> The -Ofast change should be safe now.
> >
> > Also forgot to say it explicitly, the patch is OK, so please commit it.
> >
>
> Thanks a lot for the enabling patch.  I have committed mine after re-testing.
>
> Martin

Perhaps reconsider https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937
(configure: Add --enable-default-semantic-interposition)?
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572018.html

I happen a write-up on
https://maskray.me/blog/2021-05-09-fno-semantic-interposition
  

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2ea23d07c4c..fd16c91aec8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10551,6 +10551,7 @@  valid for all standard-compliant programs.
 It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
 and the Fortran-specific @option{-fstack-arrays}, unless
 @option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}.
+It turns off @option {-fsemantic-interposition}.
 
 @item -Og
 @opindex Og
diff --git a/gcc/opts.c b/gcc/opts.c
index caed6255500..3da53d8f890 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -682,6 +682,7 @@  static const struct default_options default_options_table[] =
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
     { OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },
+    { OPT_LEVELS_FAST, OPT_fsemantic_interposition, NULL, 0 },
 
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };