c++, TLS: Support cross-tu static initialization for targets without alias support [PR106435].

Message ID 20221207153939.49157-1-iain@sandoe.co.uk
State New
Headers
Series c++, TLS: Support cross-tu static initialization for targets without alias support [PR106435]. |

Commit Message

Iain Sandoe Dec. 7, 2022, 3:39 p.m. UTC
  This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
 The basic patch is live in the homebrew macOS support and so has had quite
 wide coverage on non-trivial codebases.
 
 OK for master?
 Iain
 
 Since this actually fixes wrong code, I wonder if we should also consider
 back-porting.
 
 --- >8 ---

The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
false; current operation is maintained for targets with alias support and any
new support code should be DCEd in that case.

--

Currently, cross-tu static initialisation is not supported for targets without
alias support.

The patch adds support by building a shim function in place of the alias for
these targets; the shim simply calls the generic initialiser.  Although this is
slightly less efficient than the alias, in practice (for targets that allow
sibcalls) the penalty is a single jump when code is optimised.

From the perspective of a TU referencing an extern TLS variable, there is no
way to determine if it requires a guarded dynamic init.  So, in the referencing
TU, we build a weak reference to the potential init and check at runtime if the
init is present before calling it.  This strategy is fine for targets that have
ELF semantics, but fails at link time for Mach-O (which does not permit the
reference to be undefined in the static link).

The actual initialiser call is contained in a wrapper function, and to resolve
the Mach-O linker issue, in the TU that is referencing the var, we now generate
both the wrapper _and_ a weak definition of a dummy init function.  In the case
that there _is_ a dynamic init (in a different TU), that version will be non-weak
and will be override the weak dummy one.  In the case that we have a trivial
static init (so no init in any other TU) the weak-defined dummy init will be
called (a single return insn for optimised code).  We mitigate the call to
the dummy init by reworking the wrapper code-gen path to remove the test for
the weak reference function (as it will always be true) since the static linker
will now determine the function to be called.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

	PR c++/106435

gcc/c-family/ChangeLog:

	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
	without alias support.

gcc/cp/ChangeLog:

	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
	(handle_tls_init): Emit aliases for single init functions where the
	target supporst this, otherwise emit a stub function that calls the
	main tls init function.  (generate_tls_dummy_init): New.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/pr106435-b.cc: New file.
	* g++.dg/cpp0x/pr106435.C: New test.
	* g++.dg/cpp0x/pr106435.h: New file.
---
 gcc/c-family/c-opts.cc                   |  2 +-
 gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
 gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
 gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
 5 files changed, 142 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
  

Comments

Iain Sandoe Dec. 30, 2022, 9:07 a.m. UTC | #1
it seems other targets are interested in this too - I was asked on IRC if it was going to be fixed this cycle.

> On 7 Dec 2022, at 15:39, Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
> The basic patch is live in the homebrew macOS support and so has had quite
> wide coverage on non-trivial codebases.
> 
> OK for master?
> Iain
> 
> Since this actually fixes wrong code, I wonder if we should also consider
> back-porting.
> 
> --- >8 ---
> 
> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
> false; current operation is maintained for targets with alias support and any
> new support code should be DCEd in that case.
> 
> --
> 
> Currently, cross-tu static initialisation is not supported for targets without
> alias support.
> 
> The patch adds support by building a shim function in place of the alias for
> these targets; the shim simply calls the generic initialiser.  Although this is
> slightly less efficient than the alias, in practice (for targets that allow
> sibcalls) the penalty is a single jump when code is optimised.
> 
> From the perspective of a TU referencing an extern TLS variable, there is no
> way to determine if it requires a guarded dynamic init.  So, in the referencing
> TU, we build a weak reference to the potential init and check at runtime if the
> init is present before calling it.  This strategy is fine for targets that have
> ELF semantics, but fails at link time for Mach-O (which does not permit the
> reference to be undefined in the static link).
> 
> The actual initialiser call is contained in a wrapper function, and to resolve
> the Mach-O linker issue, in the TU that is referencing the var, we now generate
> both the wrapper _and_ a weak definition of a dummy init function.  In the case
> that there _is_ a dynamic init (in a different TU), that version will be non-weak
> and will be override the weak dummy one.  In the case that we have a trivial
> static init (so no init in any other TU) the weak-defined dummy init will be
> called (a single return insn for optimised code).  We mitigate the call to
> the dummy init by reworking the wrapper code-gen path to remove the test for
> the weak reference function (as it will always be true) since the static linker
> will now determine the function to be called.
> 
> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
> 
> 	PR c++/106435
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
> 	without alias support.
> 
> gcc/cp/ChangeLog:
> 
> 	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
> 	(handle_tls_init): Emit aliases for single init functions where the
> 	target supporst this, otherwise emit a stub function that calls the
> 	main tls init function.  (generate_tls_dummy_init): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/cpp0x/pr106435-b.cc: New file.
> 	* g++.dg/cpp0x/pr106435.C: New test.
> 	* g++.dg/cpp0x/pr106435.h: New file.
> ---
> gcc/c-family/c-opts.cc                   |  2 +-
> gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
> gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
> gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
> gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
> 5 files changed, 142 insertions(+), 13 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
> create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
> create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
> 
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index 70745aa4e7c..064645f980d 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -1070,7 +1070,7 @@ c_common_post_options (const char **pfilename)
> 
>   if (flag_extern_tls_init)
>     {
> -      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
> +      if (!SUPPORTS_WEAK)
> 	{
> 	  /* Lazy TLS initialization for a variable in another TU requires
> 	     alias and weak reference support.  */
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index f95529a5c9a..c6550c0c2fc 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -3672,9 +3672,8 @@ get_tls_init_fn (tree var)
>   if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>     return NULL_TREE;
> 
> -  /* If the variable is internal, or if we can't generate aliases,
> -     call the local init function directly.  */
> -  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
> +  /* If the variable is internal call the local init function directly.  */
> +  if (!TREE_PUBLIC (var))
>     return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
> 
>   tree sname = mangle_tls_init_fn (var);
> @@ -3811,8 +3810,12 @@ generate_tls_wrapper (tree fn)
>   if (tree init_fn = get_tls_init_fn (var))
>     {
>       tree if_stmt = NULL_TREE;
> -      /* If init_fn is a weakref, make sure it exists before calling.  */
> -      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
> +
> +      /* If init_fn is a weakref, make sure it exists before calling.
> +	 If the target does not support aliases, then we will have generated
> +	 a dummy weak function, so there is no need to test its existence.  */
> +      if (TARGET_SUPPORTS_ALIASES &&
> +	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
> 	{
> 	  if_stmt = begin_if_stmt ();
> 	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
> @@ -3837,6 +3840,25 @@ generate_tls_wrapper (tree fn)
>   expand_or_defer_fn (finish_function (/*inline_p=*/false));
> }
> 
> +/* A dummy init function to act as a weak placeholder for a (possibly non-
> +   existent) dynamic init.  */
> +static void
> +generate_tls_dummy_init (tree fn)
> +{
> +  tree var = DECL_BEFRIENDING_CLASSES (fn);
> +  tree init_fn = get_tls_init_fn (var);
> +  /* If have no init fn, or it is non-weak, then we do not need to make a
> +     dummy.  */
> +  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
> +    return;
> +  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
> +  tree body = begin_function_body ();
> +  declare_weak (init_fn);
> +  finish_return_stmt (NULL_TREE);
> +  finish_function_body (body);
> +  expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +}
> +
> /* Start a global constructor or destructor function.  */
> 
> static tree
> @@ -4626,22 +4648,24 @@ handle_tls_init (void)
>   finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
> 					  boolean_true_node,
> 					  tf_warning_or_error));
> +  auto_vec<tree> direct_calls;
>   for (; vars; vars = TREE_CHAIN (vars))
>     {
>       tree var = TREE_VALUE (vars);
>       tree init = TREE_PURPOSE (vars);
>       one_static_initialization_or_destruction (/*initp=*/true, var, init);
> 
> -      /* Output init aliases even with -fno-extern-tls-init.  */
> -      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
> +      /* Output inits even with -fno-extern-tls-init.
> +	 We save the list here and output either an alias or a stub function
> +	 below.  */
> +      if (TREE_PUBLIC (var))
> 	{
>           tree single_init_fn = get_tls_init_fn (var);
> 	  if (single_init_fn == NULL_TREE)
> 	    continue;
> -	  cgraph_node *alias
> -	    = cgraph_node::get_create (fn)->create_same_body_alias
> -		(single_init_fn, fn);
> -	  gcc_assert (alias != NULL);
> +	  if (single_init_fn == fn)
> +	    continue;
> +	  direct_calls.safe_push (single_init_fn);
> 	}
>     }
> 
> @@ -4649,6 +4673,30 @@ handle_tls_init (void)
>   finish_if_stmt (if_stmt);
>   finish_function_body (body);
>   expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +
> +  /* For each TLS var that we have an init function, we either emit an alias
> +     between that and the tls_init, or a stub function that just calls the
> +     tls_init.  */
> +  while (!direct_calls.is_empty())
> +    {
> +      tree single_init_fn = direct_calls.pop ();
> +      if (TARGET_SUPPORTS_ALIASES)
> +	{
> +	  cgraph_node *alias
> +	     = cgraph_node::get_create (fn)->create_same_body_alias
> +		(single_init_fn, fn);
> +	  gcc_assert (alias != NULL);
> +	}
> +      else
> +	{
> +	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
> +	  tree body = begin_function_body ();
> +	  tree r = build_call_expr (fn, 0);
> +	  finish_expr_stmt (r);
> +	  finish_function_body (body);
> +	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +	}
> +    }
> }
> 
> /* We're at the end of compilation, so generate any mangling aliases that
> @@ -5058,7 +5106,15 @@ c_parse_final_cleanups (void)
> 	    }
> 
> 	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
> -	    generate_tls_wrapper (decl);
> +	    {
> +	      generate_tls_wrapper (decl);
> +	      if (!TARGET_SUPPORTS_ALIASES)
> +	      /* The wrapper might have a weak reference to an init, we provide
> +		 a dummy function to satisfy that here.  The linker/dynamic
> +		 loader will override this with the actual init, if one is
> +		 required.  */
> +		generate_tls_dummy_init (decl);
> +	    }
> 
> 	  if (!DECL_SAVED_TREE (decl))
> 	    continue;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
> new file mode 100644
> index 00000000000..bd586286647
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
> @@ -0,0 +1,22 @@
> +// PR c++/106435
> +#include "pr106435.h"
> +
> +//#include <iostream>
> +
> +Foo::Foo() {
> +  ++num_calls;
> +//  std::cout << "Foo::Foo(this=" << this << ")\n";
> +}
> +
> +int Foo::func() {
> +//  std::cout << "Foo::func(this=" << this << ")\n";
> +  return num_calls;
> +}
> +
> +thread_local Foo Bar::foo;
> +thread_local Foo Bar::baz;
> +
> +thread_local Foo F = Foo{5};
> +
> +thread_local void* tl = (void*)&F;
> +thread_local void* tl1 = nullptr;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
> new file mode 100644
> index 00000000000..9071e032697
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
> @@ -0,0 +1,24 @@
> +// PR c++/106435
> +// { dg-do run { target c++11 } }
> +// { dg-additional-sources "pr106435-b.cc" }
> +
> +#include "pr106435.h"
> +
> +int num_calls = 0;
> +
> +thread_local Foo Bar::bat;
> +
> +int main() {
> +  int v = Bar::foo.func();
> +  if (v != 2)
> +    __builtin_abort ();
> +  v = Bar::bat.func();
> +  if (v != 3)
> +    __builtin_abort ();
> +  if (F.getX() != 5)
> +    __builtin_abort();
> +  if  (gt_tl () != (void*)&F)
> +    __builtin_abort();
> +  if  (gt_tl1 ())
> +    __builtin_abort();
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
> new file mode 100644
> index 00000000000..06d3e60484f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
> @@ -0,0 +1,27 @@
> +// PR c++/106435
> +#pragma once
> +
> +extern int num_calls;
> +struct Foo {
> +  Foo();
> +  Foo(int _x) : x(_x) {}
> +  int func();
> +  int getX () { return x; }
> +  int x;
> +};
> +
> +struct Bar {
> +  thread_local static Foo foo;
> +  thread_local static Foo baz;
> +  thread_local static Foo bat;
> +};
> +
> +extern thread_local void* tl;
> +
> +inline void* gt_tl () {return tl;}
> +
> +extern thread_local Foo F;
> +
> +extern thread_local void* tl1;
> +
> +inline void* gt_tl1 () {return tl1;}
> -- 
> 2.37.1 (Apple Git-137.1)
>
  
Jason Merrill Jan. 3, 2023, 10:22 p.m. UTC | #2
On 12/7/22 10:39, Iain Sandoe wrote:
>   This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
>   The basic patch is live in the homebrew macOS support and so has had quite
>   wide coverage on non-trivial codebases.
>   
>   OK for master?
>   Iain
>   
>   Since this actually fixes wrong code, I wonder if we should also consider
>   back-porting.
>   
>   --- >8 ---
> 
> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
> false; current operation is maintained for targets with alias support and any
> new support code should be DCEd in that case.
> 
> --
> 
> Currently, cross-tu static initialisation is not supported for targets without
> alias support.
> 
> The patch adds support by building a shim function in place of the alias for
> these targets; the shim simply calls the generic initialiser.  Although this is
> slightly less efficient than the alias, in practice (for targets that allow
> sibcalls) the penalty is a single jump when code is optimised.
> 
>  From the perspective of a TU referencing an extern TLS variable, there is no
> way to determine if it requires a guarded dynamic init.  So, in the referencing
> TU, we build a weak reference to the potential init and check at runtime if the
> init is present before calling it.  This strategy is fine for targets that have
> ELF semantics, but fails at link time for Mach-O (which does not permit the
> reference to be undefined in the static link).
> 
> The actual initialiser call is contained in a wrapper function, and to resolve
> the Mach-O linker issue, in the TU that is referencing the var, we now generate
> both the wrapper _and_ a weak definition of a dummy init function.  In the case
> that there _is_ a dynamic init (in a different TU), that version will be non-weak
> and will be override the weak dummy one.

IIUC, this isn't reliable in general; in specific, I believe that the 
glibc dynamic loader no longer prefers strong definitions to weak ones.

Perhaps on targets that don't allow weakrefs to be unbound, we should 
unconditionally emit the init function where the variable is defined, 
even if it does nothing, and unconditionally call it from the wrapper?

> In the case that we have a trivial
> static init (so no init in any other TU) the weak-defined dummy init will be
> called (a single return insn for optimised code).  We mitigate the call to
> the dummy init by reworking the wrapper code-gen path to remove the test for
> the weak reference function (as it will always be true) since the static linker
> will now determine the function to be called.
> 
> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
> 
> 	PR c++/106435
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
> 	without alias support.
> 
> gcc/cp/ChangeLog:
> 
> 	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
> 	(handle_tls_init): Emit aliases for single init functions where the
> 	target supporst this, otherwise emit a stub function that calls the
> 	main tls init function.  (generate_tls_dummy_init): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/cpp0x/pr106435-b.cc: New file.
> 	* g++.dg/cpp0x/pr106435.C: New test.
> 	* g++.dg/cpp0x/pr106435.h: New file.
> ---
>   gcc/c-family/c-opts.cc                   |  2 +-
>   gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
>   gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
>   gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
>   gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
>   5 files changed, 142 insertions(+), 13 deletions(-)
>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
> 
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index 70745aa4e7c..064645f980d 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -1070,7 +1070,7 @@ c_common_post_options (const char **pfilename)
>   
>     if (flag_extern_tls_init)
>       {
> -      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
> +      if (!SUPPORTS_WEAK)
>   	{
>   	  /* Lazy TLS initialization for a variable in another TU requires
>   	     alias and weak reference support.  */
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index f95529a5c9a..c6550c0c2fc 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -3672,9 +3672,8 @@ get_tls_init_fn (tree var)
>     if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>       return NULL_TREE;
>   
> -  /* If the variable is internal, or if we can't generate aliases,
> -     call the local init function directly.  */
> -  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
> +  /* If the variable is internal call the local init function directly.  */
> +  if (!TREE_PUBLIC (var))
>       return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
>   
>     tree sname = mangle_tls_init_fn (var);
> @@ -3811,8 +3810,12 @@ generate_tls_wrapper (tree fn)
>     if (tree init_fn = get_tls_init_fn (var))
>       {
>         tree if_stmt = NULL_TREE;
> -      /* If init_fn is a weakref, make sure it exists before calling.  */
> -      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
> +
> +      /* If init_fn is a weakref, make sure it exists before calling.
> +	 If the target does not support aliases, then we will have generated
> +	 a dummy weak function, so there is no need to test its existence.  */
> +      if (TARGET_SUPPORTS_ALIASES &&
> +	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>   	{
>   	  if_stmt = begin_if_stmt ();
>   	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
> @@ -3837,6 +3840,25 @@ generate_tls_wrapper (tree fn)
>     expand_or_defer_fn (finish_function (/*inline_p=*/false));
>   }
>   
> +/* A dummy init function to act as a weak placeholder for a (possibly non-
> +   existent) dynamic init.  */
> +static void
> +generate_tls_dummy_init (tree fn)
> +{
> +  tree var = DECL_BEFRIENDING_CLASSES (fn);
> +  tree init_fn = get_tls_init_fn (var);
> +  /* If have no init fn, or it is non-weak, then we do not need to make a
> +     dummy.  */
> +  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
> +    return;
> +  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
> +  tree body = begin_function_body ();
> +  declare_weak (init_fn);
> +  finish_return_stmt (NULL_TREE);
> +  finish_function_body (body);
> +  expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +}
> +
>   /* Start a global constructor or destructor function.  */
>   
>   static tree
> @@ -4626,22 +4648,24 @@ handle_tls_init (void)
>     finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
>   					  boolean_true_node,
>   					  tf_warning_or_error));
> +  auto_vec<tree> direct_calls;
>     for (; vars; vars = TREE_CHAIN (vars))
>       {
>         tree var = TREE_VALUE (vars);
>         tree init = TREE_PURPOSE (vars);
>         one_static_initialization_or_destruction (/*initp=*/true, var, init);
>   
> -      /* Output init aliases even with -fno-extern-tls-init.  */
> -      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
> +      /* Output inits even with -fno-extern-tls-init.
> +	 We save the list here and output either an alias or a stub function
> +	 below.  */
> +      if (TREE_PUBLIC (var))
>   	{
>             tree single_init_fn = get_tls_init_fn (var);
>   	  if (single_init_fn == NULL_TREE)
>   	    continue;
> -	  cgraph_node *alias
> -	    = cgraph_node::get_create (fn)->create_same_body_alias
> -		(single_init_fn, fn);
> -	  gcc_assert (alias != NULL);
> +	  if (single_init_fn == fn)
> +	    continue;
> +	  direct_calls.safe_push (single_init_fn);
>   	}
>       }
>   
> @@ -4649,6 +4673,30 @@ handle_tls_init (void)
>     finish_if_stmt (if_stmt);
>     finish_function_body (body);
>     expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +
> +  /* For each TLS var that we have an init function, we either emit an alias
> +     between that and the tls_init, or a stub function that just calls the
> +     tls_init.  */
> +  while (!direct_calls.is_empty())
> +    {
> +      tree single_init_fn = direct_calls.pop ();
> +      if (TARGET_SUPPORTS_ALIASES)
> +	{
> +	  cgraph_node *alias
> +	     = cgraph_node::get_create (fn)->create_same_body_alias
> +		(single_init_fn, fn);
> +	  gcc_assert (alias != NULL);
> +	}
> +      else
> +	{
> +	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
> +	  tree body = begin_function_body ();
> +	  tree r = build_call_expr (fn, 0);
> +	  finish_expr_stmt (r);
> +	  finish_function_body (body);
> +	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
> +	}
> +    }
>   }
>   
>   /* We're at the end of compilation, so generate any mangling aliases that
> @@ -5058,7 +5106,15 @@ c_parse_final_cleanups (void)
>   	    }
>   
>   	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
> -	    generate_tls_wrapper (decl);
> +	    {
> +	      generate_tls_wrapper (decl);
> +	      if (!TARGET_SUPPORTS_ALIASES)
> +	      /* The wrapper might have a weak reference to an init, we provide
> +		 a dummy function to satisfy that here.  The linker/dynamic
> +		 loader will override this with the actual init, if one is
> +		 required.  */
> +		generate_tls_dummy_init (decl);
> +	    }
>   
>   	  if (!DECL_SAVED_TREE (decl))
>   	    continue;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
> new file mode 100644
> index 00000000000..bd586286647
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
> @@ -0,0 +1,22 @@
> +// PR c++/106435
> +#include "pr106435.h"
> +
> +//#include <iostream>
> +
> +Foo::Foo() {
> +  ++num_calls;
> +//  std::cout << "Foo::Foo(this=" << this << ")\n";
> +}
> +
> +int Foo::func() {
> +//  std::cout << "Foo::func(this=" << this << ")\n";
> +  return num_calls;
> +}
> +
> +thread_local Foo Bar::foo;
> +thread_local Foo Bar::baz;
> +
> +thread_local Foo F = Foo{5};
> +
> +thread_local void* tl = (void*)&F;
> +thread_local void* tl1 = nullptr;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
> new file mode 100644
> index 00000000000..9071e032697
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
> @@ -0,0 +1,24 @@
> +// PR c++/106435
> +// { dg-do run { target c++11 } }
> +// { dg-additional-sources "pr106435-b.cc" }
> +
> +#include "pr106435.h"
> +
> +int num_calls = 0;
> +
> +thread_local Foo Bar::bat;
> +
> +int main() {
> +  int v = Bar::foo.func();
> +  if (v != 2)
> +    __builtin_abort ();
> +  v = Bar::bat.func();
> +  if (v != 3)
> +    __builtin_abort ();
> +  if (F.getX() != 5)
> +    __builtin_abort();
> +  if  (gt_tl () != (void*)&F)
> +    __builtin_abort();
> +  if  (gt_tl1 ())
> +    __builtin_abort();
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
> new file mode 100644
> index 00000000000..06d3e60484f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
> @@ -0,0 +1,27 @@
> +// PR c++/106435
> +#pragma once
> +
> +extern int num_calls;
> +struct Foo {
> +  Foo();
> +  Foo(int _x) : x(_x) {}
> +  int func();
> +  int getX () { return x; }
> +  int x;
> +};
> +
> +struct Bar {
> +  thread_local static Foo foo;
> +  thread_local static Foo baz;
> +  thread_local static Foo bat;
> +};
> +
> +extern thread_local void* tl;
> +
> +inline void* gt_tl () {return tl;}
> +
> +extern thread_local Foo F;
> +
> +extern thread_local void* tl1;
> +
> +inline void* gt_tl1 () {return tl1;}
  
Iain Sandoe Jan. 3, 2023, 11:17 p.m. UTC | #3
> On 3 Jan 2023, at 22:22, Jason Merrill <jason@redhat.com> wrote:
> 
> On 12/7/22 10:39, Iain Sandoe wrote:
>>  This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
>>  The basic patch is live in the homebrew macOS support and so has had quite
>>  wide coverage on non-trivial codebases.
>>    OK for master?
>>  Iain
>>    Since this actually fixes wrong code, I wonder if we should also consider
>>  back-porting.
>>    --- >8 ---
>> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
>> false; current operation is maintained for targets with alias support and any
>> new support code should be DCEd in that case.
>> --
>> Currently, cross-tu static initialisation is not supported for targets without
>> alias support.
>> The patch adds support by building a shim function in place of the alias for
>> these targets; the shim simply calls the generic initialiser.  Although this is
>> slightly less efficient than the alias, in practice (for targets that allow
>> sibcalls) the penalty is a single jump when code is optimised.
>> From the perspective of a TU referencing an extern TLS variable, there is no
>> way to determine if it requires a guarded dynamic init.  So, in the referencing
>> TU, we build a weak reference to the potential init and check at runtime if the
>> init is present before calling it.  This strategy is fine for targets that have
>> ELF semantics, but fails at link time for Mach-O (which does not permit the
>> reference to be undefined in the static link).
>> The actual initialiser call is contained in a wrapper function, and to resolve
>> the Mach-O linker issue, in the TU that is referencing the var, we now generate
>> both the wrapper _and_ a weak definition of a dummy init function.  In the case
>> that there _is_ a dynamic init (in a different TU), that version will be non-weak
>> and will be override the weak dummy one.
> 
> IIUC, this isn't reliable in general; in specific, I believe that the glibc dynamic loader no longer prefers strong definitions to weak ones.

Neither does Darwin’s dynamic loader, this implemenation works there because the static linker _will_ override the weak def with a strong one.  IIUC, binutils ld does this too.

If we need this to work between DSOs then that potentially presents a problem (for Darwin the DSO is identified so that the symbol will be found in the library that resolved it in the static link, [but that can be defeated by forcing “flat linking”]), I am not sure if glibc dynamic loader would do something similar (although this code path is not taken on ELF targets since they have the symbol aliases).

> Perhaps on targets that don't allow weakrefs to be unbound,

Darwin would allow it if we were able to tell the static linker that the symbol is permitted to be undefined - but since we don’t know the symbol’s name outside the FE, that is not going to fly.

> we should unconditionally emit the init function where the variable is defined, even if it does nothing, and unconditionally call it from the wrapper?

OK. that seems a safer option .. I will have to look at it when I have a chance.

thanks
Iain 
> 
>> In the case that we have a trivial
>> static init (so no init in any other TU) the weak-defined dummy init will be
>> called (a single return insn for optimised code).  We mitigate the call to
>> the dummy init by reworking the wrapper code-gen path to remove the test for
>> the weak reference function (as it will always be true) since the static linker
>> will now determine the function to be called.
>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>> 	PR c++/106435
>> gcc/c-family/ChangeLog:
>> 	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
>> 	without alias support.
>> gcc/cp/ChangeLog:
>> 	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
>> 	(handle_tls_init): Emit aliases for single init functions where the
>> 	target supporst this, otherwise emit a stub function that calls the
>> 	main tls init function.  (generate_tls_dummy_init): New.
>> gcc/testsuite/ChangeLog:
>> 	* g++.dg/cpp0x/pr106435-b.cc: New file.
>> 	* g++.dg/cpp0x/pr106435.C: New test.
>> 	* g++.dg/cpp0x/pr106435.h: New file.
>> ---
>>  gcc/c-family/c-opts.cc                   |  2 +-
>>  gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
>>  gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
>>  gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
>>  gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
>>  5 files changed, 142 insertions(+), 13 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
>> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
>> index 70745aa4e7c..064645f980d 100644
>> --- a/gcc/c-family/c-opts.cc
>> +++ b/gcc/c-family/c-opts.cc
>> @@ -1070,7 +1070,7 @@ c_common_post_options (const char **pfilename)
>>      if (flag_extern_tls_init)
>>      {
>> -      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
>> +      if (!SUPPORTS_WEAK)
>>  	{
>>  	  /* Lazy TLS initialization for a variable in another TU requires
>>  	     alias and weak reference support.  */
>> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
>> index f95529a5c9a..c6550c0c2fc 100644
>> --- a/gcc/cp/decl2.cc
>> +++ b/gcc/cp/decl2.cc
>> @@ -3672,9 +3672,8 @@ get_tls_init_fn (tree var)
>>    if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>>      return NULL_TREE;
>>  -  /* If the variable is internal, or if we can't generate aliases,
>> -     call the local init function directly.  */
>> -  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
>> +  /* If the variable is internal call the local init function directly.  */
>> +  if (!TREE_PUBLIC (var))
>>      return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
>>      tree sname = mangle_tls_init_fn (var);
>> @@ -3811,8 +3810,12 @@ generate_tls_wrapper (tree fn)
>>    if (tree init_fn = get_tls_init_fn (var))
>>      {
>>        tree if_stmt = NULL_TREE;
>> -      /* If init_fn is a weakref, make sure it exists before calling.  */
>> -      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>> +
>> +      /* If init_fn is a weakref, make sure it exists before calling.
>> +	 If the target does not support aliases, then we will have generated
>> +	 a dummy weak function, so there is no need to test its existence.  */
>> +      if (TARGET_SUPPORTS_ALIASES &&
>> +	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>  	{
>>  	  if_stmt = begin_if_stmt ();
>>  	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
>> @@ -3837,6 +3840,25 @@ generate_tls_wrapper (tree fn)
>>    expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>  }
>>  +/* A dummy init function to act as a weak placeholder for a (possibly non-
>> +   existent) dynamic init.  */
>> +static void
>> +generate_tls_dummy_init (tree fn)
>> +{
>> +  tree var = DECL_BEFRIENDING_CLASSES (fn);
>> +  tree init_fn = get_tls_init_fn (var);
>> +  /* If have no init fn, or it is non-weak, then we do not need to make a
>> +     dummy.  */
>> +  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>> +    return;
>> +  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
>> +  tree body = begin_function_body ();
>> +  declare_weak (init_fn);
>> +  finish_return_stmt (NULL_TREE);
>> +  finish_function_body (body);
>> +  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>> +}
>> +
>>  /* Start a global constructor or destructor function.  */
>>    static tree
>> @@ -4626,22 +4648,24 @@ handle_tls_init (void)
>>    finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
>>  					  boolean_true_node,
>>  					  tf_warning_or_error));
>> +  auto_vec<tree> direct_calls;
>>    for (; vars; vars = TREE_CHAIN (vars))
>>      {
>>        tree var = TREE_VALUE (vars);
>>        tree init = TREE_PURPOSE (vars);
>>        one_static_initialization_or_destruction (/*initp=*/true, var, init);
>>  -      /* Output init aliases even with -fno-extern-tls-init.  */
>> -      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
>> +      /* Output inits even with -fno-extern-tls-init.
>> +	 We save the list here and output either an alias or a stub function
>> +	 below.  */
>> +      if (TREE_PUBLIC (var))
>>  	{
>>            tree single_init_fn = get_tls_init_fn (var);
>>  	  if (single_init_fn == NULL_TREE)
>>  	    continue;
>> -	  cgraph_node *alias
>> -	    = cgraph_node::get_create (fn)->create_same_body_alias
>> -		(single_init_fn, fn);
>> -	  gcc_assert (alias != NULL);
>> +	  if (single_init_fn == fn)
>> +	    continue;
>> +	  direct_calls.safe_push (single_init_fn);
>>  	}
>>      }
>>  @@ -4649,6 +4673,30 @@ handle_tls_init (void)
>>    finish_if_stmt (if_stmt);
>>    finish_function_body (body);
>>    expand_or_defer_fn (finish_function (/*inline_p=*/false));
>> +
>> +  /* For each TLS var that we have an init function, we either emit an alias
>> +     between that and the tls_init, or a stub function that just calls the
>> +     tls_init.  */
>> +  while (!direct_calls.is_empty())
>> +    {
>> +      tree single_init_fn = direct_calls.pop ();
>> +      if (TARGET_SUPPORTS_ALIASES)
>> +	{
>> +	  cgraph_node *alias
>> +	     = cgraph_node::get_create (fn)->create_same_body_alias
>> +		(single_init_fn, fn);
>> +	  gcc_assert (alias != NULL);
>> +	}
>> +      else
>> +	{
>> +	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
>> +	  tree body = begin_function_body ();
>> +	  tree r = build_call_expr (fn, 0);
>> +	  finish_expr_stmt (r);
>> +	  finish_function_body (body);
>> +	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>> +	}
>> +    }
>>  }
>>    /* We're at the end of compilation, so generate any mangling aliases that
>> @@ -5058,7 +5106,15 @@ c_parse_final_cleanups (void)
>>  	    }
>>    	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
>> -	    generate_tls_wrapper (decl);
>> +	    {
>> +	      generate_tls_wrapper (decl);
>> +	      if (!TARGET_SUPPORTS_ALIASES)
>> +	      /* The wrapper might have a weak reference to an init, we provide
>> +		 a dummy function to satisfy that here.  The linker/dynamic
>> +		 loader will override this with the actual init, if one is
>> +		 required.  */
>> +		generate_tls_dummy_init (decl);
>> +	    }
>>    	  if (!DECL_SAVED_TREE (decl))
>>  	    continue;
>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>> new file mode 100644
>> index 00000000000..bd586286647
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>> @@ -0,0 +1,22 @@
>> +// PR c++/106435
>> +#include "pr106435.h"
>> +
>> +//#include <iostream>
>> +
>> +Foo::Foo() {
>> +  ++num_calls;
>> +//  std::cout << "Foo::Foo(this=" << this << ")\n";
>> +}
>> +
>> +int Foo::func() {
>> +//  std::cout << "Foo::func(this=" << this << ")\n";
>> +  return num_calls;
>> +}
>> +
>> +thread_local Foo Bar::foo;
>> +thread_local Foo Bar::baz;
>> +
>> +thread_local Foo F = Foo{5};
>> +
>> +thread_local void* tl = (void*)&F;
>> +thread_local void* tl1 = nullptr;
>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>> new file mode 100644
>> index 00000000000..9071e032697
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>> @@ -0,0 +1,24 @@
>> +// PR c++/106435
>> +// { dg-do run { target c++11 } }
>> +// { dg-additional-sources "pr106435-b.cc" }
>> +
>> +#include "pr106435.h"
>> +
>> +int num_calls = 0;
>> +
>> +thread_local Foo Bar::bat;
>> +
>> +int main() {
>> +  int v = Bar::foo.func();
>> +  if (v != 2)
>> +    __builtin_abort ();
>> +  v = Bar::bat.func();
>> +  if (v != 3)
>> +    __builtin_abort ();
>> +  if (F.getX() != 5)
>> +    __builtin_abort();
>> +  if  (gt_tl () != (void*)&F)
>> +    __builtin_abort();
>> +  if  (gt_tl1 ())
>> +    __builtin_abort();
>> +}
>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>> new file mode 100644
>> index 00000000000..06d3e60484f
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>> @@ -0,0 +1,27 @@
>> +// PR c++/106435
>> +#pragma once
>> +
>> +extern int num_calls;
>> +struct Foo {
>> +  Foo();
>> +  Foo(int _x) : x(_x) {}
>> +  int func();
>> +  int getX () { return x; }
>> +  int x;
>> +};
>> +
>> +struct Bar {
>> +  thread_local static Foo foo;
>> +  thread_local static Foo baz;
>> +  thread_local static Foo bat;
>> +};
>> +
>> +extern thread_local void* tl;
>> +
>> +inline void* gt_tl () {return tl;}
>> +
>> +extern thread_local Foo F;
>> +
>> +extern thread_local void* tl1;
>> +
>> +inline void* gt_tl1 () {return tl1;}
  
Jason Merrill Jan. 4, 2023, 3:03 p.m. UTC | #4
On 1/3/23 18:17, Iain Sandoe wrote:
> 
> 
>> On 3 Jan 2023, at 22:22, Jason Merrill <jason@redhat.com> wrote:
>>
>> On 12/7/22 10:39, Iain Sandoe wrote:
>>>   This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
>>>   The basic patch is live in the homebrew macOS support and so has had quite
>>>   wide coverage on non-trivial codebases.
>>>     OK for master?
>>>   Iain
>>>     Since this actually fixes wrong code, I wonder if we should also consider
>>>   back-porting.
>>>     --- >8 ---
>>> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
>>> false; current operation is maintained for targets with alias support and any
>>> new support code should be DCEd in that case.
>>> --
>>> Currently, cross-tu static initialisation is not supported for targets without
>>> alias support.
>>> The patch adds support by building a shim function in place of the alias for
>>> these targets; the shim simply calls the generic initialiser.  Although this is
>>> slightly less efficient than the alias, in practice (for targets that allow
>>> sibcalls) the penalty is a single jump when code is optimised.
>>>  From the perspective of a TU referencing an extern TLS variable, there is no
>>> way to determine if it requires a guarded dynamic init.  So, in the referencing
>>> TU, we build a weak reference to the potential init and check at runtime if the
>>> init is present before calling it.  This strategy is fine for targets that have
>>> ELF semantics, but fails at link time for Mach-O (which does not permit the
>>> reference to be undefined in the static link).
>>> The actual initialiser call is contained in a wrapper function, and to resolve
>>> the Mach-O linker issue, in the TU that is referencing the var, we now generate
>>> both the wrapper _and_ a weak definition of a dummy init function.  In the case
>>> that there _is_ a dynamic init (in a different TU), that version will be non-weak
>>> and will be override the weak dummy one.
>>
>> IIUC, this isn't reliable in general; in specific, I believe that the glibc dynamic loader no longer prefers strong definitions to weak ones.
> 
> Neither does Darwin’s dynamic loader, this implemenation works there because the static linker _will_ override the weak def with a strong one.  IIUC, binutils ld does this too.
> 
> If we need this to work between DSOs then that potentially presents a problem (for Darwin the DSO is identified so that the symbol will be found in the library that resolved it in the static link, [but that can be defeated by forcing “flat linking”]), I am not sure if glibc dynamic loader would do something similar (although this code path is not taken on ELF targets since they have the symbol aliases).
> 
>> Perhaps on targets that don't allow weakrefs to be unbound,
> 
> Darwin would allow it if we were able to tell the static linker that the symbol is permitted to be undefined - but since we don’t know the symbol’s name outside the FE, that is not going to fly.

Can you elaborate on this?

>> we should unconditionally emit the init function where the variable is defined, even if it does nothing, and unconditionally call it from the wrapper?
> 
> OK. that seems a safer option .. I will have to look at it when I have a chance.
> 
> thanks
> Iain
>>
>>> In the case that we have a trivial
>>> static init (so no init in any other TU) the weak-defined dummy init will be
>>> called (a single return insn for optimised code).  We mitigate the call to
>>> the dummy init by reworking the wrapper code-gen path to remove the test for
>>> the weak reference function (as it will always be true) since the static linker
>>> will now determine the function to be called.
>>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>>> 	PR c++/106435
>>> gcc/c-family/ChangeLog:
>>> 	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
>>> 	without alias support.
>>> gcc/cp/ChangeLog:
>>> 	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
>>> 	(handle_tls_init): Emit aliases for single init functions where the
>>> 	target supporst this, otherwise emit a stub function that calls the
>>> 	main tls init function.  (generate_tls_dummy_init): New.
>>> gcc/testsuite/ChangeLog:
>>> 	* g++.dg/cpp0x/pr106435-b.cc: New file.
>>> 	* g++.dg/cpp0x/pr106435.C: New test.
>>> 	* g++.dg/cpp0x/pr106435.h: New file.
>>> ---
>>>   gcc/c-family/c-opts.cc                   |  2 +-
>>>   gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
>>>   gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
>>>   gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
>>>   gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
>>>   5 files changed, 142 insertions(+), 13 deletions(-)
>>>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>>   create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
>>> index 70745aa4e7c..064645f980d 100644
>>> --- a/gcc/c-family/c-opts.cc
>>> +++ b/gcc/c-family/c-opts.cc
>>> @@ -1070,7 +1070,7 @@ c_common_post_options (const char **pfilename)
>>>       if (flag_extern_tls_init)
>>>       {
>>> -      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
>>> +      if (!SUPPORTS_WEAK)
>>>   	{
>>>   	  /* Lazy TLS initialization for a variable in another TU requires
>>>   	     alias and weak reference support.  */
>>> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
>>> index f95529a5c9a..c6550c0c2fc 100644
>>> --- a/gcc/cp/decl2.cc
>>> +++ b/gcc/cp/decl2.cc
>>> @@ -3672,9 +3672,8 @@ get_tls_init_fn (tree var)
>>>     if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>>>       return NULL_TREE;
>>>   -  /* If the variable is internal, or if we can't generate aliases,
>>> -     call the local init function directly.  */
>>> -  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
>>> +  /* If the variable is internal call the local init function directly.  */
>>> +  if (!TREE_PUBLIC (var))
>>>       return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
>>>       tree sname = mangle_tls_init_fn (var);
>>> @@ -3811,8 +3810,12 @@ generate_tls_wrapper (tree fn)
>>>     if (tree init_fn = get_tls_init_fn (var))
>>>       {
>>>         tree if_stmt = NULL_TREE;
>>> -      /* If init_fn is a weakref, make sure it exists before calling.  */
>>> -      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>> +
>>> +      /* If init_fn is a weakref, make sure it exists before calling.
>>> +	 If the target does not support aliases, then we will have generated
>>> +	 a dummy weak function, so there is no need to test its existence.  */
>>> +      if (TARGET_SUPPORTS_ALIASES &&
>>> +	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>>   	{
>>>   	  if_stmt = begin_if_stmt ();
>>>   	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
>>> @@ -3837,6 +3840,25 @@ generate_tls_wrapper (tree fn)
>>>     expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>>   }
>>>   +/* A dummy init function to act as a weak placeholder for a (possibly non-
>>> +   existent) dynamic init.  */
>>> +static void
>>> +generate_tls_dummy_init (tree fn)
>>> +{
>>> +  tree var = DECL_BEFRIENDING_CLASSES (fn);
>>> +  tree init_fn = get_tls_init_fn (var);
>>> +  /* If have no init fn, or it is non-weak, then we do not need to make a
>>> +     dummy.  */
>>> +  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>> +    return;
>>> +  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
>>> +  tree body = begin_function_body ();
>>> +  declare_weak (init_fn);
>>> +  finish_return_stmt (NULL_TREE);
>>> +  finish_function_body (body);
>>> +  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>> +}
>>> +
>>>   /* Start a global constructor or destructor function.  */
>>>     static tree
>>> @@ -4626,22 +4648,24 @@ handle_tls_init (void)
>>>     finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
>>>   					  boolean_true_node,
>>>   					  tf_warning_or_error));
>>> +  auto_vec<tree> direct_calls;
>>>     for (; vars; vars = TREE_CHAIN (vars))
>>>       {
>>>         tree var = TREE_VALUE (vars);
>>>         tree init = TREE_PURPOSE (vars);
>>>         one_static_initialization_or_destruction (/*initp=*/true, var, init);
>>>   -      /* Output init aliases even with -fno-extern-tls-init.  */
>>> -      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
>>> +      /* Output inits even with -fno-extern-tls-init.
>>> +	 We save the list here and output either an alias or a stub function
>>> +	 below.  */
>>> +      if (TREE_PUBLIC (var))
>>>   	{
>>>             tree single_init_fn = get_tls_init_fn (var);
>>>   	  if (single_init_fn == NULL_TREE)
>>>   	    continue;
>>> -	  cgraph_node *alias
>>> -	    = cgraph_node::get_create (fn)->create_same_body_alias
>>> -		(single_init_fn, fn);
>>> -	  gcc_assert (alias != NULL);
>>> +	  if (single_init_fn == fn)
>>> +	    continue;
>>> +	  direct_calls.safe_push (single_init_fn);
>>>   	}
>>>       }
>>>   @@ -4649,6 +4673,30 @@ handle_tls_init (void)
>>>     finish_if_stmt (if_stmt);
>>>     finish_function_body (body);
>>>     expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>> +
>>> +  /* For each TLS var that we have an init function, we either emit an alias
>>> +     between that and the tls_init, or a stub function that just calls the
>>> +     tls_init.  */
>>> +  while (!direct_calls.is_empty())
>>> +    {
>>> +      tree single_init_fn = direct_calls.pop ();
>>> +      if (TARGET_SUPPORTS_ALIASES)
>>> +	{
>>> +	  cgraph_node *alias
>>> +	     = cgraph_node::get_create (fn)->create_same_body_alias
>>> +		(single_init_fn, fn);
>>> +	  gcc_assert (alias != NULL);
>>> +	}
>>> +      else
>>> +	{
>>> +	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
>>> +	  tree body = begin_function_body ();
>>> +	  tree r = build_call_expr (fn, 0);
>>> +	  finish_expr_stmt (r);
>>> +	  finish_function_body (body);
>>> +	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>> +	}
>>> +    }
>>>   }
>>>     /* We're at the end of compilation, so generate any mangling aliases that
>>> @@ -5058,7 +5106,15 @@ c_parse_final_cleanups (void)
>>>   	    }
>>>     	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
>>> -	    generate_tls_wrapper (decl);
>>> +	    {
>>> +	      generate_tls_wrapper (decl);
>>> +	      if (!TARGET_SUPPORTS_ALIASES)
>>> +	      /* The wrapper might have a weak reference to an init, we provide
>>> +		 a dummy function to satisfy that here.  The linker/dynamic
>>> +		 loader will override this with the actual init, if one is
>>> +		 required.  */
>>> +		generate_tls_dummy_init (decl);
>>> +	    }
>>>     	  if (!DECL_SAVED_TREE (decl))
>>>   	    continue;
>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>> new file mode 100644
>>> index 00000000000..bd586286647
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>> @@ -0,0 +1,22 @@
>>> +// PR c++/106435
>>> +#include "pr106435.h"
>>> +
>>> +//#include <iostream>
>>> +
>>> +Foo::Foo() {
>>> +  ++num_calls;
>>> +//  std::cout << "Foo::Foo(this=" << this << ")\n";
>>> +}
>>> +
>>> +int Foo::func() {
>>> +//  std::cout << "Foo::func(this=" << this << ")\n";
>>> +  return num_calls;
>>> +}
>>> +
>>> +thread_local Foo Bar::foo;
>>> +thread_local Foo Bar::baz;
>>> +
>>> +thread_local Foo F = Foo{5};
>>> +
>>> +thread_local void* tl = (void*)&F;
>>> +thread_local void* tl1 = nullptr;
>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>> new file mode 100644
>>> index 00000000000..9071e032697
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>> @@ -0,0 +1,24 @@
>>> +// PR c++/106435
>>> +// { dg-do run { target c++11 } }
>>> +// { dg-additional-sources "pr106435-b.cc" }
>>> +
>>> +#include "pr106435.h"
>>> +
>>> +int num_calls = 0;
>>> +
>>> +thread_local Foo Bar::bat;
>>> +
>>> +int main() {
>>> +  int v = Bar::foo.func();
>>> +  if (v != 2)
>>> +    __builtin_abort ();
>>> +  v = Bar::bat.func();
>>> +  if (v != 3)
>>> +    __builtin_abort ();
>>> +  if (F.getX() != 5)
>>> +    __builtin_abort();
>>> +  if  (gt_tl () != (void*)&F)
>>> +    __builtin_abort();
>>> +  if  (gt_tl1 ())
>>> +    __builtin_abort();
>>> +}
>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>> new file mode 100644
>>> index 00000000000..06d3e60484f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>> @@ -0,0 +1,27 @@
>>> +// PR c++/106435
>>> +#pragma once
>>> +
>>> +extern int num_calls;
>>> +struct Foo {
>>> +  Foo();
>>> +  Foo(int _x) : x(_x) {}
>>> +  int func();
>>> +  int getX () { return x; }
>>> +  int x;
>>> +};
>>> +
>>> +struct Bar {
>>> +  thread_local static Foo foo;
>>> +  thread_local static Foo baz;
>>> +  thread_local static Foo bat;
>>> +};
>>> +
>>> +extern thread_local void* tl;
>>> +
>>> +inline void* gt_tl () {return tl;}
>>> +
>>> +extern thread_local Foo F;
>>> +
>>> +extern thread_local void* tl1;
>>> +
>>> +inline void* gt_tl1 () {return tl1;}
>
  
Iain Sandoe Jan. 4, 2023, 3:30 p.m. UTC | #5
> On 4 Jan 2023, at 15:03, Jason Merrill <jason@redhat.com> wrote:
> 
> On 1/3/23 18:17, Iain Sandoe wrote:
>>> On 3 Jan 2023, at 22:22, Jason Merrill <jason@redhat.com> wrote:
>>> 
>>> On 12/7/22 10:39, Iain Sandoe wrote:
>>>>  This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
>>>>  The basic patch is live in the homebrew macOS support and so has had quite
>>>>  wide coverage on non-trivial codebases.
>>>>    OK for master?
>>>>  Iain
>>>>    Since this actually fixes wrong code, I wonder if we should also consider
>>>>  back-porting.
>>>>    --- >8 ---
>>>> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
>>>> false; current operation is maintained for targets with alias support and any
>>>> new support code should be DCEd in that case.
>>>> --
>>>> Currently, cross-tu static initialisation is not supported for targets without
>>>> alias support.
>>>> The patch adds support by building a shim function in place of the alias for
>>>> these targets; the shim simply calls the generic initialiser.  Although this is
>>>> slightly less efficient than the alias, in practice (for targets that allow
>>>> sibcalls) the penalty is a single jump when code is optimised.
>>>> From the perspective of a TU referencing an extern TLS variable, there is no
>>>> way to determine if it requires a guarded dynamic init.  So, in the referencing
>>>> TU, we build a weak reference to the potential init and check at runtime if the
>>>> init is present before calling it.  This strategy is fine for targets that have
>>>> ELF semantics, but fails at link time for Mach-O (which does not permit the
>>>> reference to be undefined in the static link).
>>>> The actual initialiser call is contained in a wrapper function, and to resolve
>>>> the Mach-O linker issue, in the TU that is referencing the var, we now generate
>>>> both the wrapper _and_ a weak definition of a dummy init function.  In the case
>>>> that there _is_ a dynamic init (in a different TU), that version will be non-weak
>>>> and will be override the weak dummy one.
>>> 
>>> IIUC, this isn't reliable in general; in specific, I believe that the glibc dynamic loader no longer prefers strong definitions to weak ones.
>> Neither does Darwin’s dynamic loader, this implemenation works there because the static linker _will_ override the weak def with a strong one.  IIUC, binutils ld does this too.
>> If we need this to work between DSOs then that potentially presents a problem (for Darwin the DSO is identified so that the symbol will be found in the library that resolved it in the static link, [but that can be defeated by forcing “flat linking”]), I am not sure if glibc dynamic loader would do something similar (although this code path is not taken on ELF targets since they have the symbol aliases).
>>> Perhaps on targets that don't allow weakrefs to be unbound,
>> Darwin would allow it if we were able to tell the static linker that the symbol is permitted to be undefined - but since we don’t know the symbol’s name outside the FE, that is not going to fly.
> 
> Can you elaborate on this?

At runtime Mach-O is much the same as ELF w.r.t weak references, the difference comes at static link time when (by default) Darwin’s linker requires all symbols to have a definition.

Darwin’s static linker has three mechanisms for allowing a weak reference (in each case, at runtime, the symbol reference will be NULL if no definition is present - so ELF-like at that point):

 1. A definition is supplied on the link line (usually in a DSO) the DSO is defined as a weak library, which means it is permitted to be absent at runtime.  [the usage we are thinking of here is not what this facility was designed for, but it can work].

2. We put -Wl,-undefined,dynamic_lookup on the link line, that allows any symbol to be undefined - it is a massive sledgehammer and not at all recommended for general code (we have to use it for things like plugins that need to resolve many symbols at runtime from their host).  NOTE that it also seems to be incompatible with some modern fixups on arm64 (i.e. it looks like Apple do not intend to guarantee it will work in the future).

3. An individual symbol maybe be specified as “allowed to be undefined” by passing -Wl,-U,_symbol

It was the third case I was thinking of - but I cannot see how to obtain the symbols easily (If we can identify them, we could arrange to emit them into some special section and then fish them out using simple-object in collect2 and apply to the generated link line).  However, this does not seem like a phase3/4 kind of change (and I do not currently have much^W any spare time either).

——

The simpler approach I was using was to provide a dummy weak definition that is always available, but which will be overridden by the static linker if a non-weak definition is provided at that time (which works fine within a single DSO). 

thanks
Iain


> 
>>> we should unconditionally emit the init function where the variable is defined, even if it does nothing, and unconditionally call it from the wrapper?
>> OK. that seems a safer option .. I will have to look at it when I have a chance.
>> thanks
>> Iain
>>> 
>>>> In the case that we have a trivial
>>>> static init (so no init in any other TU) the weak-defined dummy init will be
>>>> called (a single return insn for optimised code).  We mitigate the call to
>>>> the dummy init by reworking the wrapper code-gen path to remove the test for
>>>> the weak reference function (as it will always be true) since the static linker
>>>> will now determine the function to be called.
>>>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>>>> 	PR c++/106435
>>>> gcc/c-family/ChangeLog:
>>>> 	* c-opts.cc (c_common_post_options): Allow fextern-tls-init for targets
>>>> 	without alias support.
>>>> gcc/cp/ChangeLog:
>>>> 	* decl2.cc (get_tls_init_fn): Allow targets without alias support.
>>>> 	(handle_tls_init): Emit aliases for single init functions where the
>>>> 	target supporst this, otherwise emit a stub function that calls the
>>>> 	main tls init function.  (generate_tls_dummy_init): New.
>>>> gcc/testsuite/ChangeLog:
>>>> 	* g++.dg/cpp0x/pr106435-b.cc: New file.
>>>> 	* g++.dg/cpp0x/pr106435.C: New test.
>>>> 	* g++.dg/cpp0x/pr106435.h: New file.
>>>> ---
>>>>  gcc/c-family/c-opts.cc                   |  2 +-
>>>>  gcc/cp/decl2.cc                          | 80 ++++++++++++++++++++----
>>>>  gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc | 22 +++++++
>>>>  gcc/testsuite/g++.dg/cpp0x/pr106435.C    | 24 +++++++
>>>>  gcc/testsuite/g++.dg/cpp0x/pr106435.h    | 27 ++++++++
>>>>  5 files changed, 142 insertions(+), 13 deletions(-)
>>>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>>>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>>> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
>>>> index 70745aa4e7c..064645f980d 100644
>>>> --- a/gcc/c-family/c-opts.cc
>>>> +++ b/gcc/c-family/c-opts.cc
>>>> @@ -1070,7 +1070,7 @@ c_common_post_options (const char **pfilename)
>>>>      if (flag_extern_tls_init)
>>>>      {
>>>> -      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
>>>> +      if (!SUPPORTS_WEAK)
>>>>  	{
>>>>  	  /* Lazy TLS initialization for a variable in another TU requires
>>>>  	     alias and weak reference support.  */
>>>> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
>>>> index f95529a5c9a..c6550c0c2fc 100644
>>>> --- a/gcc/cp/decl2.cc
>>>> +++ b/gcc/cp/decl2.cc
>>>> @@ -3672,9 +3672,8 @@ get_tls_init_fn (tree var)
>>>>    if (!flag_extern_tls_init && DECL_EXTERNAL (var))
>>>>      return NULL_TREE;
>>>>  -  /* If the variable is internal, or if we can't generate aliases,
>>>> -     call the local init function directly.  */
>>>> -  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
>>>> +  /* If the variable is internal call the local init function directly.  */
>>>> +  if (!TREE_PUBLIC (var))
>>>>      return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
>>>>      tree sname = mangle_tls_init_fn (var);
>>>> @@ -3811,8 +3810,12 @@ generate_tls_wrapper (tree fn)
>>>>    if (tree init_fn = get_tls_init_fn (var))
>>>>      {
>>>>        tree if_stmt = NULL_TREE;
>>>> -      /* If init_fn is a weakref, make sure it exists before calling.  */
>>>> -      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>>> +
>>>> +      /* If init_fn is a weakref, make sure it exists before calling.
>>>> +	 If the target does not support aliases, then we will have generated
>>>> +	 a dummy weak function, so there is no need to test its existence.  */
>>>> +      if (TARGET_SUPPORTS_ALIASES &&
>>>> +	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>>>  	{
>>>>  	  if_stmt = begin_if_stmt ();
>>>>  	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
>>>> @@ -3837,6 +3840,25 @@ generate_tls_wrapper (tree fn)
>>>>    expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>>>  }
>>>>  +/* A dummy init function to act as a weak placeholder for a (possibly non-
>>>> +   existent) dynamic init.  */
>>>> +static void
>>>> +generate_tls_dummy_init (tree fn)
>>>> +{
>>>> +  tree var = DECL_BEFRIENDING_CLASSES (fn);
>>>> +  tree init_fn = get_tls_init_fn (var);
>>>> +  /* If have no init fn, or it is non-weak, then we do not need to make a
>>>> +     dummy.  */
>>>> +  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
>>>> +    return;
>>>> +  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
>>>> +  tree body = begin_function_body ();
>>>> +  declare_weak (init_fn);
>>>> +  finish_return_stmt (NULL_TREE);
>>>> +  finish_function_body (body);
>>>> +  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>>> +}
>>>> +
>>>>  /* Start a global constructor or destructor function.  */
>>>>    static tree
>>>> @@ -4626,22 +4648,24 @@ handle_tls_init (void)
>>>>    finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
>>>>  					  boolean_true_node,
>>>>  					  tf_warning_or_error));
>>>> +  auto_vec<tree> direct_calls;
>>>>    for (; vars; vars = TREE_CHAIN (vars))
>>>>      {
>>>>        tree var = TREE_VALUE (vars);
>>>>        tree init = TREE_PURPOSE (vars);
>>>>        one_static_initialization_or_destruction (/*initp=*/true, var, init);
>>>>  -      /* Output init aliases even with -fno-extern-tls-init.  */
>>>> -      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
>>>> +      /* Output inits even with -fno-extern-tls-init.
>>>> +	 We save the list here and output either an alias or a stub function
>>>> +	 below.  */
>>>> +      if (TREE_PUBLIC (var))
>>>>  	{
>>>>            tree single_init_fn = get_tls_init_fn (var);
>>>>  	  if (single_init_fn == NULL_TREE)
>>>>  	    continue;
>>>> -	  cgraph_node *alias
>>>> -	    = cgraph_node::get_create (fn)->create_same_body_alias
>>>> -		(single_init_fn, fn);
>>>> -	  gcc_assert (alias != NULL);
>>>> +	  if (single_init_fn == fn)
>>>> +	    continue;
>>>> +	  direct_calls.safe_push (single_init_fn);
>>>>  	}
>>>>      }
>>>>  @@ -4649,6 +4673,30 @@ handle_tls_init (void)
>>>>    finish_if_stmt (if_stmt);
>>>>    finish_function_body (body);
>>>>    expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>>> +
>>>> +  /* For each TLS var that we have an init function, we either emit an alias
>>>> +     between that and the tls_init, or a stub function that just calls the
>>>> +     tls_init.  */
>>>> +  while (!direct_calls.is_empty())
>>>> +    {
>>>> +      tree single_init_fn = direct_calls.pop ();
>>>> +      if (TARGET_SUPPORTS_ALIASES)
>>>> +	{
>>>> +	  cgraph_node *alias
>>>> +	     = cgraph_node::get_create (fn)->create_same_body_alias
>>>> +		(single_init_fn, fn);
>>>> +	  gcc_assert (alias != NULL);
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
>>>> +	  tree body = begin_function_body ();
>>>> +	  tree r = build_call_expr (fn, 0);
>>>> +	  finish_expr_stmt (r);
>>>> +	  finish_function_body (body);
>>>> +	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
>>>> +	}
>>>> +    }
>>>>  }
>>>>    /* We're at the end of compilation, so generate any mangling aliases that
>>>> @@ -5058,7 +5106,15 @@ c_parse_final_cleanups (void)
>>>>  	    }
>>>>    	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
>>>> -	    generate_tls_wrapper (decl);
>>>> +	    {
>>>> +	      generate_tls_wrapper (decl);
>>>> +	      if (!TARGET_SUPPORTS_ALIASES)
>>>> +	      /* The wrapper might have a weak reference to an init, we provide
>>>> +		 a dummy function to satisfy that here.  The linker/dynamic
>>>> +		 loader will override this with the actual init, if one is
>>>> +		 required.  */
>>>> +		generate_tls_dummy_init (decl);
>>>> +	    }
>>>>    	  if (!DECL_SAVED_TREE (decl))
>>>>  	    continue;
>>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>>> new file mode 100644
>>>> index 00000000000..bd586286647
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
>>>> @@ -0,0 +1,22 @@
>>>> +// PR c++/106435
>>>> +#include "pr106435.h"
>>>> +
>>>> +//#include <iostream>
>>>> +
>>>> +Foo::Foo() {
>>>> +  ++num_calls;
>>>> +//  std::cout << "Foo::Foo(this=" << this << ")\n";
>>>> +}
>>>> +
>>>> +int Foo::func() {
>>>> +//  std::cout << "Foo::func(this=" << this << ")\n";
>>>> +  return num_calls;
>>>> +}
>>>> +
>>>> +thread_local Foo Bar::foo;
>>>> +thread_local Foo Bar::baz;
>>>> +
>>>> +thread_local Foo F = Foo{5};
>>>> +
>>>> +thread_local void* tl = (void*)&F;
>>>> +thread_local void* tl1 = nullptr;
>>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>>> new file mode 100644
>>>> index 00000000000..9071e032697
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
>>>> @@ -0,0 +1,24 @@
>>>> +// PR c++/106435
>>>> +// { dg-do run { target c++11 } }
>>>> +// { dg-additional-sources "pr106435-b.cc" }
>>>> +
>>>> +#include "pr106435.h"
>>>> +
>>>> +int num_calls = 0;
>>>> +
>>>> +thread_local Foo Bar::bat;
>>>> +
>>>> +int main() {
>>>> +  int v = Bar::foo.func();
>>>> +  if (v != 2)
>>>> +    __builtin_abort ();
>>>> +  v = Bar::bat.func();
>>>> +  if (v != 3)
>>>> +    __builtin_abort ();
>>>> +  if (F.getX() != 5)
>>>> +    __builtin_abort();
>>>> +  if  (gt_tl () != (void*)&F)
>>>> +    __builtin_abort();
>>>> +  if  (gt_tl1 ())
>>>> +    __builtin_abort();
>>>> +}
>>>> diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>>> new file mode 100644
>>>> index 00000000000..06d3e60484f
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
>>>> @@ -0,0 +1,27 @@
>>>> +// PR c++/106435
>>>> +#pragma once
>>>> +
>>>> +extern int num_calls;
>>>> +struct Foo {
>>>> +  Foo();
>>>> +  Foo(int _x) : x(_x) {}
>>>> +  int func();
>>>> +  int getX () { return x; }
>>>> +  int x;
>>>> +};
>>>> +
>>>> +struct Bar {
>>>> +  thread_local static Foo foo;
>>>> +  thread_local static Foo baz;
>>>> +  thread_local static Foo bat;
>>>> +};
>>>> +
>>>> +extern thread_local void* tl;
>>>> +
>>>> +inline void* gt_tl () {return tl;}
>>>> +
>>>> +extern thread_local Foo F;
>>>> +
>>>> +extern thread_local void* tl1;
>>>> +
>>>> +inline void* gt_tl1 () {return tl1;}
  
Jason Merrill Jan. 4, 2023, 3:49 p.m. UTC | #6
On 1/4/23 10:30, Iain Sandoe wrote:
> 
> 
>> On 4 Jan 2023, at 15:03, Jason Merrill <jason@redhat.com> wrote:
>>
>> On 1/3/23 18:17, Iain Sandoe wrote:
>>>> On 3 Jan 2023, at 22:22, Jason Merrill <jason@redhat.com> wrote:
>>>>
>>>> On 12/7/22 10:39, Iain Sandoe wrote:
>>>>>   This has been tested on x86_64 and arm64 Darwin and on x86_64 linux gnu.
>>>>>   The basic patch is live in the homebrew macOS support and so has had quite
>>>>>   wide coverage on non-trivial codebases.
>>>>>     OK for master?
>>>>>   Iain
>>>>>     Since this actually fixes wrong code, I wonder if we should also consider
>>>>>   back-porting.
>>>>>     --- >8 ---
>>>>> The description below relates to the code path when TARGET_SUPPORTS_ALIASES is
>>>>> false; current operation is maintained for targets with alias support and any
>>>>> new support code should be DCEd in that case.
>>>>> --
>>>>> Currently, cross-tu static initialisation is not supported for targets without
>>>>> alias support.
>>>>> The patch adds support by building a shim function in place of the alias for
>>>>> these targets; the shim simply calls the generic initialiser.  Although this is
>>>>> slightly less efficient than the alias, in practice (for targets that allow
>>>>> sibcalls) the penalty is a single jump when code is optimised.
>>>>>  From the perspective of a TU referencing an extern TLS variable, there is no
>>>>> way to determine if it requires a guarded dynamic init.  So, in the referencing
>>>>> TU, we build a weak reference to the potential init and check at runtime if the
>>>>> init is present before calling it.  This strategy is fine for targets that have
>>>>> ELF semantics, but fails at link time for Mach-O (which does not permit the
>>>>> reference to be undefined in the static link).
>>>>> The actual initialiser call is contained in a wrapper function, and to resolve
>>>>> the Mach-O linker issue, in the TU that is referencing the var, we now generate
>>>>> both the wrapper _and_ a weak definition of a dummy init function.  In the case
>>>>> that there _is_ a dynamic init (in a different TU), that version will be non-weak
>>>>> and will be override the weak dummy one.
>>>>
>>>> IIUC, this isn't reliable in general; in specific, I believe that the glibc dynamic loader no longer prefers strong definitions to weak ones.
>>> Neither does Darwin’s dynamic loader, this implemenation works there because the static linker _will_ override the weak def with a strong one.  IIUC, binutils ld does this too.
>>> If we need this to work between DSOs then that potentially presents a problem (for Darwin the DSO is identified so that the symbol will be found in the library that resolved it in the static link, [but that can be defeated by forcing “flat linking”]), I am not sure if glibc dynamic loader would do something similar (although this code path is not taken on ELF targets since they have the symbol aliases).
>>>> Perhaps on targets that don't allow weakrefs to be unbound,
>>> Darwin would allow it if we were able to tell the static linker that the symbol is permitted to be undefined - but since we don’t know the symbol’s name outside the FE, that is not going to fly.
>>
>> Can you elaborate on this?
> 
> At runtime Mach-O is much the same as ELF w.r.t weak references, the difference comes at static link time when (by default) Darwin’s linker requires all symbols to have a definition.
> 
> Darwin’s static linker has three mechanisms for allowing a weak reference (in each case, at runtime, the symbol reference will be NULL if no definition is present - so ELF-like at that point):
> 
>   1. A definition is supplied on the link line (usually in a DSO) the DSO is defined as a weak library, which means it is permitted to be absent at runtime.  [the usage we are thinking of here is not what this facility was designed for, but it can work].
> 
> 2. We put -Wl,-undefined,dynamic_lookup on the link line, that allows any symbol to be undefined - it is a massive sledgehammer and not at all recommended for general code (we have to use it for things like plugins that need to resolve many symbols at runtime from their host).  NOTE that it also seems to be incompatible with some modern fixups on arm64 (i.e. it looks like Apple do not intend to guarantee it will work in the future).
> 
> 3. An individual symbol maybe be specified as “allowed to be undefined” by passing -Wl,-U,_symbol
> 
> It was the third case I was thinking of - but I cannot see how to obtain the symbols easily (If we can identify them, we could arrange to emit them into some special section and then fish them out using simple-object in collect2 and apply to the generated link line).  However, this does not seem like a phase3/4 kind of change (and I do not currently have much^W any spare time either).

Aha, thanks.  We shouldn't need to build a list in a special section: 
collect2 could look for _ZTH* symbol references and add -U options for them.

Jason
  

Patch

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 70745aa4e7c..064645f980d 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1070,7 +1070,7 @@  c_common_post_options (const char **pfilename)
 
   if (flag_extern_tls_init)
     {
-      if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
+      if (!SUPPORTS_WEAK)
 	{
 	  /* Lazy TLS initialization for a variable in another TU requires
 	     alias and weak reference support.  */
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index f95529a5c9a..c6550c0c2fc 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -3672,9 +3672,8 @@  get_tls_init_fn (tree var)
   if (!flag_extern_tls_init && DECL_EXTERNAL (var))
     return NULL_TREE;
 
-  /* If the variable is internal, or if we can't generate aliases,
-     call the local init function directly.  */
-  if (!TREE_PUBLIC (var) || !TARGET_SUPPORTS_ALIASES)
+  /* If the variable is internal call the local init function directly.  */
+  if (!TREE_PUBLIC (var))
     return get_local_tls_init_fn (DECL_SOURCE_LOCATION (var));
 
   tree sname = mangle_tls_init_fn (var);
@@ -3811,8 +3810,12 @@  generate_tls_wrapper (tree fn)
   if (tree init_fn = get_tls_init_fn (var))
     {
       tree if_stmt = NULL_TREE;
-      /* If init_fn is a weakref, make sure it exists before calling.  */
-      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
+
+      /* If init_fn is a weakref, make sure it exists before calling.
+	 If the target does not support aliases, then we will have generated
+	 a dummy weak function, so there is no need to test its existence.  */
+      if (TARGET_SUPPORTS_ALIASES &&
+	  lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
 	{
 	  if_stmt = begin_if_stmt ();
 	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
@@ -3837,6 +3840,25 @@  generate_tls_wrapper (tree fn)
   expand_or_defer_fn (finish_function (/*inline_p=*/false));
 }
 
+/* A dummy init function to act as a weak placeholder for a (possibly non-
+   existent) dynamic init.  */
+static void
+generate_tls_dummy_init (tree fn)
+{
+  tree var = DECL_BEFRIENDING_CLASSES (fn);
+  tree init_fn = get_tls_init_fn (var);
+  /* If have no init fn, or it is non-weak, then we do not need to make a
+     dummy.  */
+  if (!init_fn || !lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
+    return;
+  start_preparsed_function (init_fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
+  tree body = begin_function_body ();
+  declare_weak (init_fn);
+  finish_return_stmt (NULL_TREE);
+  finish_function_body (body);
+  expand_or_defer_fn (finish_function (/*inline_p=*/false));
+}
+
 /* Start a global constructor or destructor function.  */
 
 static tree
@@ -4626,22 +4648,24 @@  handle_tls_init (void)
   finish_expr_stmt (cp_build_modify_expr (loc, guard, NOP_EXPR,
 					  boolean_true_node,
 					  tf_warning_or_error));
+  auto_vec<tree> direct_calls;
   for (; vars; vars = TREE_CHAIN (vars))
     {
       tree var = TREE_VALUE (vars);
       tree init = TREE_PURPOSE (vars);
       one_static_initialization_or_destruction (/*initp=*/true, var, init);
 
-      /* Output init aliases even with -fno-extern-tls-init.  */
-      if (TARGET_SUPPORTS_ALIASES && TREE_PUBLIC (var))
+      /* Output inits even with -fno-extern-tls-init.
+	 We save the list here and output either an alias or a stub function
+	 below.  */
+      if (TREE_PUBLIC (var))
 	{
           tree single_init_fn = get_tls_init_fn (var);
 	  if (single_init_fn == NULL_TREE)
 	    continue;
-	  cgraph_node *alias
-	    = cgraph_node::get_create (fn)->create_same_body_alias
-		(single_init_fn, fn);
-	  gcc_assert (alias != NULL);
+	  if (single_init_fn == fn)
+	    continue;
+	  direct_calls.safe_push (single_init_fn);
 	}
     }
 
@@ -4649,6 +4673,30 @@  handle_tls_init (void)
   finish_if_stmt (if_stmt);
   finish_function_body (body);
   expand_or_defer_fn (finish_function (/*inline_p=*/false));
+
+  /* For each TLS var that we have an init function, we either emit an alias
+     between that and the tls_init, or a stub function that just calls the
+     tls_init.  */
+  while (!direct_calls.is_empty())
+    {
+      tree single_init_fn = direct_calls.pop ();
+      if (TARGET_SUPPORTS_ALIASES)
+	{
+	  cgraph_node *alias
+	     = cgraph_node::get_create (fn)->create_same_body_alias
+		(single_init_fn, fn);
+	  gcc_assert (alias != NULL);
+	}
+      else
+	{
+	  start_preparsed_function (single_init_fn, NULL_TREE, SF_PRE_PARSED);
+	  tree body = begin_function_body ();
+	  tree r = build_call_expr (fn, 0);
+	  finish_expr_stmt (r);
+	  finish_function_body (body);
+	  expand_or_defer_fn (finish_function (/*inline_p=*/false));
+	}
+    }
 }
 
 /* We're at the end of compilation, so generate any mangling aliases that
@@ -5058,7 +5106,15 @@  c_parse_final_cleanups (void)
 	    }
 
 	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
-	    generate_tls_wrapper (decl);
+	    {
+	      generate_tls_wrapper (decl);
+	      if (!TARGET_SUPPORTS_ALIASES)
+	      /* The wrapper might have a weak reference to an init, we provide
+		 a dummy function to satisfy that here.  The linker/dynamic
+		 loader will override this with the actual init, if one is
+		 required.  */
+		generate_tls_dummy_init (decl);
+	    }
 
 	  if (!DECL_SAVED_TREE (decl))
 	    continue;
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
new file mode 100644
index 00000000000..bd586286647
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr106435-b.cc
@@ -0,0 +1,22 @@ 
+// PR c++/106435
+#include "pr106435.h"
+
+//#include <iostream>
+
+Foo::Foo() {
+  ++num_calls;
+//  std::cout << "Foo::Foo(this=" << this << ")\n";
+}
+
+int Foo::func() {
+//  std::cout << "Foo::func(this=" << this << ")\n";
+  return num_calls;
+}
+
+thread_local Foo Bar::foo;
+thread_local Foo Bar::baz;
+
+thread_local Foo F = Foo{5};
+
+thread_local void* tl = (void*)&F;
+thread_local void* tl1 = nullptr;
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.C b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
new file mode 100644
index 00000000000..9071e032697
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.C
@@ -0,0 +1,24 @@ 
+// PR c++/106435
+// { dg-do run { target c++11 } }
+// { dg-additional-sources "pr106435-b.cc" }
+
+#include "pr106435.h"
+
+int num_calls = 0;
+
+thread_local Foo Bar::bat;
+
+int main() {
+  int v = Bar::foo.func();
+  if (v != 2)
+    __builtin_abort ();
+  v = Bar::bat.func();
+  if (v != 3)
+    __builtin_abort ();
+  if (F.getX() != 5)
+    __builtin_abort();
+  if  (gt_tl () != (void*)&F)
+    __builtin_abort();
+  if  (gt_tl1 ())
+    __builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr106435.h b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
new file mode 100644
index 00000000000..06d3e60484f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr106435.h
@@ -0,0 +1,27 @@ 
+// PR c++/106435
+#pragma once
+
+extern int num_calls;
+struct Foo {
+  Foo();
+  Foo(int _x) : x(_x) {}
+  int func();
+  int getX () { return x; }
+  int x;
+};
+
+struct Bar {
+  thread_local static Foo foo;
+  thread_local static Foo baz;
+  thread_local static Foo bat;
+};
+
+extern thread_local void* tl;
+
+inline void* gt_tl () {return tl;}
+
+extern thread_local Foo F;
+
+extern thread_local void* tl1;
+
+inline void* gt_tl1 () {return tl1;}