Add -fopt-builtin optimization option

Message ID 20211031101238.523164-1-keithp@keithp.com
State New
Headers
Series Add -fopt-builtin optimization option |

Commit Message

Keith Packard Oct. 31, 2021, 10:12 a.m. UTC
  This option (enabled by default) controls optimizations which convert
a sequence of operations into an equivalent sequence that includes
calls to builtin functions. Typical cases here are code which matches
memcpy, calloc, sincos.

The -ftree-loop-distribute-patterns flag only covers converting loops
into builtin calls, not numerous other places where knowledge of
builtin function semantics changes the generated code.

The goal is to allow built-in functions to be declared by the compiler
and used directly by the application, but to disable optimizations
which create new calls to them, and to allow this optimization
behavior to be changed for individual functions by decorating the
function definition like this:

	void
	attribute((optimize("no-opt-builtin")))
	sincos(double x, double *s, double *c)
	{
		*s = sin(x);
		*c = cos(x);
	}

This also avoids converting loops into library calls like this:

	void *
	attribute((optimize("no-opt-builtin")))
	memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
	{
		char *d = dst;
		const char *s = src;

		while (n--)
			*d++ = *s++;
		return dst;
	}

As well as disabling analysis of memory lifetimes around free as in
this example:

	void *
	attribute((optimize("no-opt-builtin")))
	erase_and_free(void *ptr)
	{
		memset(ptr, '\0', malloc_usable_size(ptr));
		free(ptr);
	}

Clang has a more sophisticated version of this mechanism which
can disable all builtins, or disable a specific builtin:

	double
	attribute((no_builtin("exp2")))
	exp2(double x)
	{
		return pow (2.0, x);
	}

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 gcc/builtins.c               | 6 ++++++
 gcc/common.opt               | 4 ++++
 gcc/gimple.c                 | 3 +++
 gcc/tree-loop-distribution.c | 2 ++
 4 files changed, 15 insertions(+)
  

Comments

Richard Biener Nov. 2, 2021, 1:06 p.m. UTC | #1
On Sun, Oct 31, 2021 at 11:13 AM Keith Packard via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This option (enabled by default) controls optimizations which convert
> a sequence of operations into an equivalent sequence that includes
> calls to builtin functions. Typical cases here are code which matches
> memcpy, calloc, sincos.
>
> The -ftree-loop-distribute-patterns flag only covers converting loops
> into builtin calls, not numerous other places where knowledge of
> builtin function semantics changes the generated code.
>
> The goal is to allow built-in functions to be declared by the compiler
> and used directly by the application, but to disable optimizations
> which create new calls to them, and to allow this optimization
> behavior to be changed for individual functions by decorating the
> function definition like this:
>
>         void
>         attribute((optimize("no-opt-builtin")))
>         sincos(double x, double *s, double *c)
>         {
>                 *s = sin(x);
>                 *c = cos(x);
>         }
>
> This also avoids converting loops into library calls like this:
>
>         void *
>         attribute((optimize("no-opt-builtin")))
>         memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
>         {
>                 char *d = dst;
>                 const char *s = src;
>
>                 while (n--)
>                         *d++ = *s++;
>                 return dst;
>         }
>
> As well as disabling analysis of memory lifetimes around free as in
> this example:
>
>         void *
>         attribute((optimize("no-opt-builtin")))
>         erase_and_free(void *ptr)
>         {
>                 memset(ptr, '\0', malloc_usable_size(ptr));
>                 free(ptr);
>         }
>
> Clang has a more sophisticated version of this mechanism which
> can disable all builtins, or disable a specific builtin:
>
>         double
>         attribute((no_builtin("exp2")))
>         exp2(double x)
>         {
>                 return pow (2.0, x);
>         }

I don't think it reliably works the way you implement it.  It's also having
more side-effects than what you document, in particular

  pow (2.0, x);

will now clobber and use global memory (besides errno).

I think you may want to instead change builtin_decl_implicit
to avoid code-generating a specific builtin.

Generally we'd also want sth like the clang attribute and _not_
use optimize("") for this or a global flag_*, so the behavior can
be more readily encoded in the IL.  In fact a flag on the call
statement could be added to denote the desired effect on it.

I also don't see the advantage compared to -fno-builtin[-foo].
Declaring the function should be something that's already done.

Richard.

> Signed-off-by: Keith Packard <keithp@keithp.com>
> ---
>  gcc/builtins.c               | 6 ++++++
>  gcc/common.opt               | 4 ++++
>  gcc/gimple.c                 | 3 +++
>  gcc/tree-loop-distribution.c | 2 ++
>  4 files changed, 15 insertions(+)
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 7d0f61fc98b..7aae57deab5 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn)
>    built_in_function fcodef64x = END_BUILTINS;
>    built_in_function fcodef128x = END_BUILTINS;
>
> +  if (flag_no_opt_builtin)
> +    return END_BUILTINS;
> +
>    switch (fn)
>      {
>  #define SEQ_OF_CASE_MATHFN                     \
> @@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn)
>    case CFN_BUILT_IN_##MATHFN##L_R:             \
>      return long_double_type_node;
>
> +  if (flag_no_opt_builtin)
> +    return NULL_TREE;
> +
>    switch (fn)
>      {
>      SEQ_OF_CASE_MATHFN
> diff --git a/gcc/common.opt b/gcc/common.opt
> index eeba1a727f2..d6111cc776a 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2142,6 +2142,10 @@ fomit-frame-pointer
>  Common Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
>
> +fopt-builtin
> +Common Var(flag_no_opt_builtin, 0) Optimization
> +Match code sequences equivalent to builtin functions
> +
>  fopt-info
>  Common Var(flag_opt_info) Optimization
>  Enable all optimization info dumps on stderr.
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 22dd6417d19..5b82b9409c0 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
>  {
>    gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
>
> +  if (flag_no_opt_builtin)
> +    return false;
> +
>    tree ret = gimple_call_lhs (stmt);
>    if (ret
>        && !useless_type_conversion_p (TREE_TYPE (ret),
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 583c01a42d8..43f22a3c7ce 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop,
>
>    /* Perform general partition disqualification for builtins.  */
>    if (volatiles_p
> +      || flag_no_opt_builtin
>        || !flag_tree_loop_distribute_patterns)
>      return has_reduction;
>
> @@ -3764,6 +3765,7 @@ loop_distribution::execute (function *fun)
>        /* Don't distribute multiple exit edges loop, or cold loop when
>           not doing pattern detection.  */
>        if (!single_exit (loop)
> +         || flag_no_opt_builtin
>           || (!flag_tree_loop_distribute_patterns
>               && !optimize_loop_for_speed_p (loop)))
>         continue;
> --
> 2.33.0
>
  
Keith Packard Nov. 2, 2021, 4:08 p.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:

> I don't think it reliably works the way you implement it.  It's also having
> more side-effects than what you document, in particular

Yeah, I made a 'minimal' patch that had the effect I needed, but it's
clearly in the wrong place as it disables the matching of builtins
against the incoming source code instead of the generation of new
builtin references from the tree.

> I think you may want to instead change builtin_decl_implicit
> to avoid code-generating a specific builtin.

Yup, I looked at that and there are numerous places which assume that
will work, so it will be a more complicated patch.

> Generally we'd also want sth like the clang attribute and _not_
> use optimize("") for this or a global flag_*, so the behavior can
> be more readily encoded in the IL.  In fact a flag on the call
> statement could be added to denote the desired effect on it.

Agreed, using the existing optimize attribute was a short-cut to
leverage the existing code handling that case. If we think providing
something that matches the clang attribute would be useful, it makes
sense to provide it using the same syntax.

> I also don't see the advantage compared to -fno-builtin[-foo].
> Declaring the function should be something that's already done.

The semantic of the clang option is not to completely disable access
to the given builtin function, but rather to stop the optimizer from
creating new builtin function references (either to a specific builtin,
or to all builtins).

If I could use "no-builtin" in a function attribute, I probably wouldn't
have bothered looking to implement the clang semantics, but -fno-builtin
isn't supported in this way. But, now that I think I understand the
behavior of attribute((no_builtin)) in clang, I think it has value
beyond what -fno-builtin performs as you can still gain access to
builtin functions when they are directly named.

I'll go implement changes in builtin_decl_implicit and all of the
affected call sites and see what that looks like.

Thanks much for your review!
  

Patch

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7d0f61fc98b..7aae57deab5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1922,6 +1922,9 @@  mathfn_built_in_2 (tree type, combined_fn fn)
   built_in_function fcodef64x = END_BUILTINS;
   built_in_function fcodef128x = END_BUILTINS;
 
+  if (flag_no_opt_builtin)
+    return END_BUILTINS;
+
   switch (fn)
     {
 #define SEQ_OF_CASE_MATHFN			\
@@ -2125,6 +2128,9 @@  mathfn_built_in_type (combined_fn fn)
   case CFN_BUILT_IN_##MATHFN##L_R:		\
     return long_double_type_node;
 
+  if (flag_no_opt_builtin)
+    return NULL_TREE;
+
   switch (fn)
     {
     SEQ_OF_CASE_MATHFN
diff --git a/gcc/common.opt b/gcc/common.opt
index eeba1a727f2..d6111cc776a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2142,6 +2142,10 @@  fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopt-builtin
+Common Var(flag_no_opt_builtin, 0) Optimization
+Match code sequences equivalent to builtin functions
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 22dd6417d19..5b82b9409c0 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2790,6 +2790,9 @@  gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
 {
   gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
 
+  if (flag_no_opt_builtin)
+    return false;
+
   tree ret = gimple_call_lhs (stmt);
   if (ret
       && !useless_type_conversion_p (TREE_TYPE (ret),
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 583c01a42d8..43f22a3c7ce 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1859,6 +1859,7 @@  loop_distribution::classify_partition (loop_p loop,
 
   /* Perform general partition disqualification for builtins.  */
   if (volatiles_p
+      || flag_no_opt_builtin
       || !flag_tree_loop_distribute_patterns)
     return has_reduction;
 
@@ -3764,6 +3765,7 @@  loop_distribution::execute (function *fun)
       /* Don't distribute multiple exit edges loop, or cold loop when
          not doing pattern detection.  */
       if (!single_exit (loop)
+	  || flag_no_opt_builtin
 	  || (!flag_tree_loop_distribute_patterns
 	      && !optimize_loop_for_speed_p (loop)))
 	continue;