Introduce -finstrument-functions-once

Message ID 4713782.GXAFRqVoOG@fomalhaut
State New
Headers
Series Introduce -finstrument-functions-once |

Commit Message

Eric Botcazou May 24, 2022, 10:48 a.m. UTC
  Hi,

some time ago we were requested to implement a -finstrument-functions-once
switch in the compiler, with the semantics that the profiling functions be
called only once per instrumented function.  The goal was to make it possible
to use it in (large) production binaries to do function-level coverage, so the
overhead must be minimum and, in particular, there is no protection against
data races so the "once" moniker is imprecise.

Tested on x86-64/Linux, OK for the mainline?


2022-05-24  Eric Botcazou  <ebotcazou@adacore.com>

	* common.opt (finstrument-functions): Set explicit value.
	(-finstrument-functions-once): New option.
	* doc/invoke.texi (Program Instrumentation Options): Document it.
	* gimplify.c (build_instrumentation_call): New static function.
	(gimplify_function_tree): Invoke it to emit the instrumentation calls
	if -finstrument-functions[-once] is specified.
  

Comments

Richard Biener June 2, 2022, 10:14 a.m. UTC | #1
On Tue, May 24, 2022 at 12:49 PM Eric Botcazou via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> some time ago we were requested to implement a -finstrument-functions-once
> switch in the compiler, with the semantics that the profiling functions be
> called only once per instrumented function.  The goal was to make it possible
> to use it in (large) production binaries to do function-level coverage, so the
> overhead must be minimum and, in particular, there is no protection against
> data races so the "once" moniker is imprecise.

So that also applies to

"... and the second profiling function is called before the exit
+corresponding to this first entry"

specifically "corresponding to this first entry"?   As if the second
entry exits first will that call the second profiling function or will
it really be the thread that called the first profiling function
(what happens when that thread terminates before calling the second
profiling function? (***)).  Consider re-wording this slightly.

+      /* If -finstrument-functions-once is specified, generate:
+
+          static volatile bool F.0 = true;
+          bool tmp_first;

is there any good reason to make F.0 volatile?  That doesn't prevent
races.  Any reason to make F.0 initialized to true rather than false
(bss init?)

(***) looking at the implementation the second profiling function
can end up being never called when the thread calling the first
profiling function does not exit the function.  So I wonder if
the "optimization"(?) not re-reading F.0 makes sense (it also
requires to keep the value of F.0 live across the whole function)

Otherwise looks OK to me.

Richard.

> Tested on x86-64/Linux, OK for the mainline?
>
>
> 2022-05-24  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * common.opt (finstrument-functions): Set explicit value.
>         (-finstrument-functions-once): New option.
>         * doc/invoke.texi (Program Instrumentation Options): Document it.
>         * gimplify.c (build_instrumentation_call): New static function.
>         (gimplify_function_tree): Invoke it to emit the instrumentation calls
>         if -finstrument-functions[-once] is specified.
>
> --
> Eric Botcazou
  
Eric Botcazou June 13, 2022, 9:46 a.m. UTC | #2
> So that also applies to
> 
> "... and the second profiling function is called before the exit
> +corresponding to this first entry"
> 
> specifically "corresponding to this first entry"?   As if the second
> entry exits first will that call the second profiling function or will
> it really be the thread that called the first profiling function
> (what happens when that thread terminates before calling the second
> profiling function? (***)).  Consider re-wording this slightly.

The calls are always paired, i.e. if a thread calls the first function, then 
it will call the second function; I can indeed state it explicitly in the doc.

> +      /* If -finstrument-functions-once is specified, generate:
> +
> +          static volatile bool F.0 = true;
> +          bool tmp_first;
> 
> is there any good reason to make F.0 volatile?  That doesn't prevent
> races.

No, it does not, but it guarantees a single read so the pairing.

> Any reason to make F.0 initialized to true rather than false (bss init?)

None, changed.

> (***) looking at the implementation the second profiling function
> can end up being never called when the thread calling the first
> profiling function does not exit the function.  So I wonder if
> the "optimization"(?) not re-reading F.0 makes sense (it also
> requires to keep the value of F.0 live across the whole function)

It's for the pairing.  The value should be spilled onto the stack if need be, 
so you'd get at most 2 loads like if you re-read the variable.

Revised patch attached.
  
Richard Biener June 13, 2022, 11:26 a.m. UTC | #3
On Mon, Jun 13, 2022 at 11:46 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > So that also applies to
> >
> > "... and the second profiling function is called before the exit
> > +corresponding to this first entry"
> >
> > specifically "corresponding to this first entry"?   As if the second
> > entry exits first will that call the second profiling function or will
> > it really be the thread that called the first profiling function
> > (what happens when that thread terminates before calling the second
> > profiling function? (***)).  Consider re-wording this slightly.
>
> The calls are always paired, i.e. if a thread calls the first function, then
> it will call the second function; I can indeed state it explicitly in the doc.
>
> > +      /* If -finstrument-functions-once is specified, generate:
> > +
> > +          static volatile bool F.0 = true;
> > +          bool tmp_first;
> >
> > is there any good reason to make F.0 volatile?  That doesn't prevent
> > races.
>
> No, it does not, but it guarantees a single read so the pairing.
>
> > Any reason to make F.0 initialized to true rather than false (bss init?)
>
> None, changed.
>
> > (***) looking at the implementation the second profiling function
> > can end up being never called when the thread calling the first
> > profiling function does not exit the function.  So I wonder if
> > the "optimization"(?) not re-reading F.0 makes sense (it also
> > requires to keep the value of F.0 live across the whole function)
>
> It's for the pairing.  The value should be spilled onto the stack if need be,
> so you'd get at most 2 loads like if you re-read the variable.
>
> Revised patch attached.

OK.

Thanks,
Richard.

>
> --
> Eric Botcazou
  

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a0dafc522d..c82df1778e6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1878,9 +1878,13 @@  EnumValue
 Enum(cf_protection_level) String(none) Value(CF_NONE)
 
 finstrument-functions
-Common Var(flag_instrument_function_entry_exit)
+Common Var(flag_instrument_function_entry_exit,1)
 Instrument function entry and exit with profiling calls.
 
+finstrument-functions-once
+Common Var(flag_instrument_function_entry_exit,2)
+Instrument function entry and exit with profiling calls invoked once.
+
 finstrument-functions-exclude-function-list=
 Common RejectNegative Joined
 -finstrument-functions-exclude-function-list=name,...	Do not instrument listed functions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e8e6d4e039b..eaea3a7cb93 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -617,7 +617,7 @@  Objective-C and Objective-C++ Dialects}.
 -fno-stack-limit  -fsplit-stack @gol
 -fvtable-verify=@r{[}std@r{|}preinit@r{|}none@r{]} @gol
 -fvtv-counts  -fvtv-debug @gol
--finstrument-functions @gol
+-finstrument-functions  -finstrument-functions-once @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}} @gol
 -fprofile-prefix-map=@var{old}=@var{new}
@@ -16365,6 +16365,20 @@  cannot safely be called (perhaps signal handlers, if the profiling
 routines generate output or allocate memory).
 @xref{Common Function Attributes}.
 
+@item -finstrument-functions-once
+@opindex -finstrument-functions-once
+This is similar to @option{-finstrument-functions}, but the profiling
+functions are called only once per instrumented function, i.e. the first
+profiling function is called after the first entry into the instrumented
+function and the second profiling function is called before the exit
+corresponding to this first entry.
+
+The definition of @code{once} for the purpose of this option is a little
+vague because the implementation is not protected against data races.
+As a result, the implementation only guarantees that the profiling
+functions are invoked at @emph{least} once per process and at @emph{most}
+once per thread.
+
 @item -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}
 @opindex finstrument-functions-exclude-file-list
 
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 260993be215..8ce15a2adad 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -16570,6 +16570,51 @@  flag_instrument_functions_exclude_p (tree fndecl)
   return false;
 }
 
+/* Build a call to the instrumentation function FNCODE and add it to SEQ.
+   If COND_VAR is not NULL, it is a boolean variable guarding the call to
+   the instrumentation function.  IF STMT is not NULL, it is a statement
+   to be executed just before the call to the instrumentation function.  */
+
+static void
+build_instrumentation_call (gimple_seq *seq, enum built_in_function fncode,
+			    tree cond_var, gimple *stmt)
+{
+  /* The instrumentation hooks aren't going to call the instrumented
+     function and the address they receive is expected to be matchable
+     against symbol addresses.  Make sure we don't create a trampoline,
+     in case the current function is nested.  */
+  tree this_fn_addr = build_fold_addr_expr (current_function_decl);
+  TREE_NO_TRAMPOLINE (this_fn_addr) = 1;
+
+  tree label_true, label_false;
+  if (cond_var)
+    {
+      label_true = create_artificial_label (UNKNOWN_LOCATION);
+      label_false = create_artificial_label (UNKNOWN_LOCATION);
+      gcond *cond = gimple_build_cond (EQ_EXPR, cond_var, boolean_true_node,
+				      label_true, label_false);
+      gimplify_seq_add_stmt (seq, cond);
+      gimplify_seq_add_stmt (seq, gimple_build_label (label_true));
+      gimplify_seq_add_stmt (seq, gimple_build_predict (PRED_COLD_LABEL,
+							NOT_TAKEN));
+    }
+
+  if (stmt)
+    gimplify_seq_add_stmt (seq, stmt);
+
+  tree x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
+  gcall *call = gimple_build_call (x, 1, integer_zero_node);
+  tree tmp_var = create_tmp_var (ptr_type_node, "return_addr");
+  gimple_call_set_lhs (call, tmp_var);
+  gimplify_seq_add_stmt (seq, call);
+  x = builtin_decl_implicit (fncode);
+  call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
+  gimplify_seq_add_stmt (seq, call);
+
+  if (cond_var)
+    gimplify_seq_add_stmt (seq, gimple_build_label (label_false));
+}
+
 /* Entry point to the gimplification pass.  FNDECL is the FUNCTION_DECL
    node for the function we want to gimplify.
 
@@ -16620,40 +16665,60 @@  gimplify_function_tree (tree fndecl)
 	   && DECL_DISREGARD_INLINE_LIMITS (fndecl))
       && !flag_instrument_functions_exclude_p (fndecl))
     {
-      tree x;
-      gbind *new_bind;
-      gimple *tf;
-      gimple_seq cleanup = NULL, body = NULL;
-      tree tmp_var, this_fn_addr;
-      gcall *call;
-
-      /* The instrumentation hooks aren't going to call the instrumented
-	 function and the address they receive is expected to be matchable
-	 against symbol addresses.  Make sure we don't create a trampoline,
-	 in case the current function is nested.  */
-      this_fn_addr = build_fold_addr_expr (current_function_decl);
-      TREE_NO_TRAMPOLINE (this_fn_addr) = 1;
-
-      x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
-      call = gimple_build_call (x, 1, integer_zero_node);
-      tmp_var = create_tmp_var (ptr_type_node, "return_addr");
-      gimple_call_set_lhs (call, tmp_var);
-      gimplify_seq_add_stmt (&cleanup, call);
-      x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_EXIT);
-      call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
-      gimplify_seq_add_stmt (&cleanup, call);
-      tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);
-
-      x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
-      call = gimple_build_call (x, 1, integer_zero_node);
-      tmp_var = create_tmp_var (ptr_type_node, "return_addr");
-      gimple_call_set_lhs (call, tmp_var);
-      gimplify_seq_add_stmt (&body, call);
-      x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_ENTER);
-      call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
-      gimplify_seq_add_stmt (&body, call);
+      tree cond_var = NULL_TREE;
+      gimple_seq body = NULL, cleanup = NULL;
+      gassign *assign = NULL;
+
+      /* If -finstrument-functions-once is specified, generate:
+
+	   static volatile bool F.0 = true;
+	   bool tmp_first;
+
+	   tmp_first = F.0;
+	   if (tmp_first)
+	     {
+	       F.0 = false;
+	       [call profiling enter function]
+	     }
+
+	 without specific protection for data races.  */
+      if (flag_instrument_function_entry_exit > 1)
+	{
+	  tree first_var
+	    = build_decl (DECL_SOURCE_LOCATION (current_function_decl),
+			  VAR_DECL,
+			  create_tmp_var_name ("F"),
+			  boolean_type_node);
+	  DECL_ARTIFICIAL (first_var) = 1;
+	  DECL_IGNORED_P (first_var) = 1;
+	  TREE_STATIC (first_var) = 1;
+	  TREE_THIS_VOLATILE (first_var) = 1;
+	  TREE_USED (first_var) = 1;
+	  DECL_INITIAL (first_var) = boolean_true_node;
+	  varpool_node::add (first_var);
+
+	  cond_var = create_tmp_var (boolean_type_node, "tmp_first");
+	  assign = gimple_build_assign (cond_var, first_var);
+	  gimplify_seq_add_stmt (&body, assign);
+
+	  assign = gimple_build_assign (first_var, boolean_false_node);
+	}
+
+      build_instrumentation_call (&body, BUILT_IN_PROFILE_FUNC_ENTER,
+				  cond_var, assign);
+
+      /* If -finstrument-functions-once is specified, generate:
+
+	   if (tmp_first)
+	     [call profiling exit function]
+
+	 without specific protection for data races.  */
+      build_instrumentation_call (&cleanup, BUILT_IN_PROFILE_FUNC_EXIT,
+				  cond_var, NULL);
+
+      gimple *tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);
       gimplify_seq_add_stmt (&body, tf);
-      new_bind = gimple_build_bind (NULL, body, NULL);
+      gbind *new_bind = gimple_build_bind (NULL, body, NULL);
 
       /* Replace the current function body with the body
          wrapped in the try/finally TF.  */