From patchwork Sat Jan 15 14:08:38 2022
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 50063
Date: Sat, 15 Jan 2022 15:08:38 +0100
From: Jakub Jelinek
To: Richard Biener, Jason Merrill
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] c++, dyninit, v2: Optimize C++ dynamic initialization by constants into DECL_INITIAL adjustment [PR102876]
Message-ID: <20220115140838.GX2646553@tucnak>
References: <20211104094250.GR304296@tucnak> <55rn2n7-6r3o-q81o-2qsr-snnp178sn840@fhfr.qr>
In-Reply-To: <55rn2n7-6r3o-q81o-2qsr-snnp178sn840@fhfr.qr>

Hi!

Sorry for the delay in response.

On Thu, Dec 02, 2021 at 02:35:25PM +0100, Richard Biener wrote:
> So with
>
> +/* Mark start and end of dynamic initialization of a variable.  */
> +DEF_INTERNAL_FN (DYNAMIC_INIT_START, ECF_LEAF | ECF_NOTHROW, ". r ")
> +DEF_INTERNAL_FN (DYNAMIC_INIT_END, ECF_LEAF | ECF_NOTHROW, ". r ")
>
> there's nothing preventing code motion of unrelated stmts into
> the block, but that should be harmless.  What it also does
> is make 'e' aliased (because its address is now taken), probably
> relevant only for IPA/LTO or for statics.
>
> The setup does not prevent CSEing the inits with uses from another
> initializer - probably OK as well (if not then .DYNAMIC_INIT_END
> should also be considered writing to 'e').
>
> ". r " also means it clobbers and uses all global memory, I think
> we'd like to have it const + looping-pure-or-const.  ".cr " would
> possibly achieve this, not sure about the looping part.

https://eel.is/c++draft/basic.start.static#3, in
https://eel.is/c++draft/basic.start.static#3.1, says that the optimization
I'd like to do is only allowed if "the dynamic version of the
initialization does not change the value of any other object of static or
thread storage duration prior to its initialization".  So we at least need
to ensure that neither stores from other dynamic initializations nor
stores from this dynamic initialization are moved across the
.DYNAMIC_INIT_{START,END} calls; i.e. if a dynamic initialization of a
also stores to b, we must not move the store to b somewhere else and
optimize a's initialization anyway, and we must not move a store to some
unrelated var into the block (and then punt because of it).

> > Currently the optimization is only able to optimize cases where the whole
> > variable is stored in a single store (typically scalar variables), or
> > uses the native_{encode,interpret}* infrastructure to create or update
> > the CONSTRUCTOR.  This means that except for the first category, we can't
> > right now handle unions or anything that needs relocations (vars containing
> > pointers to other vars or references).
> > I think it would be nice to incrementally add before the native_* fallback
> > some attempt to just create or update a CONSTRUCTOR if possible.  If we only
> > see var.a.b.c.d[10].e = const; style of stores, this shouldn't be that hard,
> > as the whole access path is recorded there and we'd just need to decide what
> > to do with unions if two or more union members are accessed.  And do a deep
> > copy of the CONSTRUCTOR and try to efficiently update the copy afterwards
> > (the CONSTRUCTORs should be sorted on increasing offsets of the
> > members/elements, so doing an ordered vec insertion might not be the best
> > idea).  But MEM_REFs complicate this; parts or all of the access path
> > are lost.  For non-unions in most cases we could try to guess which field
> > it is (do we have some existing function to do that?  I vaguely remember
> > we've been doing that in some cases in the past in some folding but stopped
> > doing so) but with unions it will be harder or impossible.
>
> I suppose we could, at least for non-overlapping inits, create a
> new aggregate type on the fly to be able to compose the CTOR and then
> view-convert it to the decl's type.  Would need to check that
> a CTOR wrapped in a V_C_E is handled OK by varasm, of course.
>
> An alternative way of recording the initializer (maybe just emit
> it right away into asm?) would be another possibility.

It would be nice to come up with something that works for those, but
perhaps that can be improved incrementally (so for sure GCC 13+ only)?

> I also note that loops are quite common in some initializers so more
> aggressively unrolling those for initializations might be a good
> idea as well.

Sure.

> > As the middle-end can't easily differentiate between const variables without
> > and with mutable members, both of those will have TREE_READONLY on the
> > var decl clear (because of dynamic initialization) and TYPE_READONLY set
> > on the type, the patch remembers this in an extra argument to
> > .DYNAMIC_INIT_START (true if it is ok to set TREE_READONLY on the var decl
> > back if the var dynamic initialization could be optimized into DECL_INITIAL).
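To illustrate why the extra .DYNAMIC_INIT_START argument has to distinguish const types with and without mutable members, here is a small compilable sketch (the `Counter` type and its members are hypothetical, not from the patch): a const-qualified object whose type has a mutable member can legally be written through after initialization, so even when its dynamic initializer folds to a constant it must not become TREE_READONLY or be placed into .rodata.

```cpp
#include <cassert>

// Hypothetical example: TYPE_READONLY is set on const Counter, but
// cp_has_mutable_p is true, so the variable may still be mutated
// after its (foldable) dynamic initialization finishes.
struct Counter {
  int limit;
  mutable int hits;               // written through const access paths
  void touch () const { ++hits; } // legal write into a const object
};

const Counter counter = { 10, 0 };
```

If `counter` were moved to a read-only section, `counter.touch ()` would fault at run time even though the program is well-formed.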
> > Thinking more about it, I'm not sure about const vars without mutable
> > members with non-trivial destructors: do we register their dtors dynamically
> > through __cxa_atexit in the ctors (that would mean the optimization
> > currently punts on them), or not (in which case we could put them into .rodata
> > even when the dtor will perhaps want to write to them)?
>
> I think anything like this asks for doing the whole thing at the IPA level
> to see which functions are "initialization" and thus need not be considered
> writing when the initializer is made static.

I don't see how we could do it at the IPA level, because for that we need
inlining to happen first and then at least some minimal cleanup passes
(forwprop, ccp, fre, dce, cunrolli, etc.) to run, and I really don't see
how we could do those on the constructor function still during IPA.

> That said, do we want to record the fact that we guarded init
> with .DYNAMIC_INIT_* on the varpool node?  I think we want a flag

I don't have a need for it right now; if we come up with some reason, it
can surely be added.

> in struct function or in the cgraph node to tell whether there's
> a .DYNAMIC_INIT_* in it to avoid the whole function walk of
> pass_dyninit::execute which for most functions will be a noop.

Good idea, implemented.

> Is there any reason you run the pass before pass_store_merging?  It

The ifns prevent some optimizations, so it is a hairy decision between
doing it too early after IPA (then we might optimize fewer dynamic
initializations, because some of them might not yet have been optimized
into constant stores) and doing it too late (then we perhaps optimize
slightly more dynamic initializations into constants, but for dynamic
inits that really can't be optimized we perhaps prevent some other
optimizations - e.g., as you wrote, the vars have their address taken
because of the ifns, we prevent some code motion across the calls, etc.).
E.g. if we have several adjacent dynamic initializations guarded by the
same condition, we want to use just one condition for all of them, etc.

> I also see you gate .DYNAMIC_INIT_* creation on 'optimize' but only
> schedule the pass in the O1+ pipeline, missing out -Og.  I suppose
> for -Og not creating .DYNAMIC_INIT_* would be reasonable.

The updated patch doesn't do this optimization for -Og.

> > +	 Verify if there are only stores of constants to the corresponding
> > +	 variable or parts of that variable and if so, try to reconstruct
> > +	 a static initializer from the static initializer if any and
> > +	 the constant stores into the variable.  This is permitted by
> > +	 [basic.start.static]/3.  */
> > +      if (is_gimple_call (stmt))
> > +	{
> > +	  if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_START))
>
> this overload already tests is_gimple_call

I know, but one predicate guards two gimple_call_internal_p calls and one
gimple_call_fndecl case together, so it seems that for the common case of
a stmt other than a call it is better to check just once.

> > +	    {
> > +	      ifns.safe_push (stmt);
> > +	      if (cur)
> > +		*cur = NULL;
> > +	      tree arg = gimple_call_arg (stmt, 0);
> > +	      gcc_assert (TREE_CODE (arg) == ADDR_EXPR
> > +			  && DECL_P (TREE_OPERAND (arg, 0)));
> > +	      tree var = TREE_OPERAND (arg, 0);
> > +	      gcc_checking_assert (is_global_var (var));
> > +	      varpool_node *node = varpool_node::get (var);
> > +	      if (node == NULL
> > +		  || node->in_other_partition
> > +		  || TREE_ASM_WRITTEN (var)
> > +		  || DECL_SIZE_UNIT (var) == NULL_TREE
> > +		  || !tree_fits_uhwi_p (DECL_SIZE_UNIT (var))
> > +		  || tree_to_uhwi (DECL_SIZE_UNIT (var)) > 1024
>
> this should maybe be a --param?

Added.

> Did you do any statistics on things other than GCC as to which of the
> various checks prevent eliding the dynamic initialization and which of
> those we could mitigate in the future?

No, but I can do that later.
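The distinction the checks above draw can be seen in a compilable sketch modeled on the semantics of the patch's testcase (in the real test the opaque function lives in another TU so it cannot be inlined; here `foo` is defined locally, purely so the snippet links and runs):

```cpp
#include <cassert>

// Sketch only: which dynamic initializations the pass could fold.
int b;                        // no dynamic initializer, zero-initialized
int baz () { return 1; }      // trivially inlinable, stores only a constant
int qux () { return b = 2; }  // inlinable, but also stores to another global

// In the real scenario foo would be declared extern and defined elsewhere,
// making its call opaque to the optimization.
int foo (int &r) { return r + 3; }

int a = foo (b);  // stays dynamic: the call isn't folded to a constant
int x = baz ();   // foldable: x could simply get DECL_INITIAL = 1
int d = qux ();   // punt: the initializer changes another object (b),
                  // which [basic.start.static]/3 forbids optimizing away
```

Within this TU the dynamic initializations run in declaration order, so `a` sees `b == 0`, and `qux ()` later leaves `b == 2`.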
> > +	      if (map == NULL)
> > +		map = new hash_map<tree, gimple *> (61);
> > +	      bool existed_p;
> > +	      cur = &map->get_or_insert (var, &existed_p);
> > +	      if (existed_p)
> > +		{
> > +		  /* Punt if we see more than one .DYNAMIC_INIT_START
> > +		     internal call for the same variable.  */
>
> how can this happen?

The code in between .DYNAMIC_INIT_{START,END} can have many bbs; I'm
worried about jump threading duplicating one of the calls and not the
other, or even duplicating both of them, etc.  There can be just one
constant initialization, so if something duplicated it, we need to punt.

> > +	  /* Punt if we see any artificial
> > +	     __static_initialization_and_destruction_* calls, e.g. if
> > +	     it would be partially inlined, because we wouldn't then see
> > +	     all .DYNAMIC_INIT_* calls.  */
> > +	  tree fndecl = gimple_call_fndecl (stmt);
> > +	  if (fndecl
> > +	      && DECL_ARTIFICIAL (fndecl)
> > +	      && DECL_NAME (fndecl)
> > +	      && startswith (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> > +			     "__static_initialization_and_destruction_"))
> > +	    ssdf_calls = true;
>
> Ugh, that looks unreliable - but how's that a problem if we saw
> both START/END ifns for a decl?

I'm worried about fnsplit there etc.; I'd like to be sure I see all the
calls for the TU at once.  I've replaced the DECL_NAME + startswith test
with DECL_STRUCT_FUNCTION (fndecl)->has_dynamic_init now.

> > +		  /* Remove now all the stores for the dynamic
> > +		     initialization.  */
> > +		  unlink_stmt_vdef (stmt);
> > +		  gsi_remove (&gsi, true);
> > +		  if (gimple_vdef (stmt))
> > +		    release_ssa_name (gimple_vdef (stmt));
>
> release_defs () should do the trick
>
> > +		}
> > +	      while (1);
> > +	    }
> > +	}
> > +      delete map;
> > +      for (gimple *g : ifns)
> > +	{
> > +	  gimple_stmt_iterator gsi = gsi_for_stmt (g);
> > +	  unlink_stmt_vdef (g);
> > +	  gsi_remove (&gsi, true);
> > +	  if (gimple_vdef (g))
> > +	    release_ssa_name (gimple_vdef (g));
>
> likewise.

Both changed.

Here is an updated patch; it passes the test, and I'll do a full
bootstrap/regtest tonight.
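For reference before the patch itself, this is the headline transformation, as a compilable sketch using the struct types from the g++.dg/opt/init3.C testcase below: every store done by S's constructor is a constant, so the dynamic initialization of `e` folds into a static initializer, and because T has no mutable members, `e` can additionally be made read-only.

```cpp
#include <cassert>

// From the testcase: the ctor stores only constants (d ends up as 6).
struct S { S () : a (1), b (2), c (3), d (4) { d += 2; } int a, b, c, d; };
struct T { int e; S f; int g; };

// The pass can replace the runtime constructor calls with
// DECL_INITIAL {.e=5, .f={.a=1, .b=2, .c=3, .d=6}, .g=6}
// and set TREE_READONLY on e.
const T e = { 5, S (), 6 };
```

The observable values are identical either way; the difference is that the optimized variable needs no runtime initialization code and can live in .rodata.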
2022-01-15 Jakub Jelinek PR c++/102876 gcc/ * internal-fn.def (DYNAMIC_INIT_START, DYNAMIC_INIT_END): New internal functions. * internal-fn.c (expand_DYNAMIC_INIT_START, expand_DYNAMIC_INIT_END): New functions. * tree-pass.h (make_pass_dyninit): Declare. * passes.def (pass_dyninit): Add after dce4. * function.h (struct function): Add has_dynamic_init bitfield. * tree-inline.c (initialize_cfun): Copy over has_dynamic_init. (expand_call_inline): Or in has_dynamic_init from inlined fn into caller. * params.opt (--param=dynamic-init-max-size=): New param. * gimple-ssa-store-merging.c (pass_data_dyninit): New variable. (class pass_dyninit): New type. (pass_dyninit::execute): New method. (make_pass_dyninit): New function. gcc/cp/ * decl2.c (one_static_initialization_or_destruction): Emit .DYNAMIC_INIT_START and .DYNAMIC_INIT_END internal calls around dynamic initialization of variables that don't need a guard. gcc/testsuite/ * g++.dg/opt/init3.C: New test. Jakub --- gcc/internal-fn.def.jj 2022-01-15 13:56:45.396986086 +0100 +++ gcc/internal-fn.def 2022-01-15 13:58:25.431567672 +0100 @@ -373,6 +373,10 @@ DEF_INTERNAL_FN (PHI, 0, NULL) automatic variable. */ DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) +/* Mark start and end of dynamic initialization of a variable. */ +DEF_INTERNAL_FN (DYNAMIC_INIT_START, ECF_LEAF | ECF_NOTHROW, ".cr ") +DEF_INTERNAL_FN (DYNAMIC_INIT_END, ECF_LEAF | ECF_NOTHROW, ".cr ") + /* DIM_SIZE and DIM_POS return the size of a particular compute dimension and the executing thread's position within that dimension. 
DIM_POS is pure (and not const) so that it isn't --- gcc/internal-fn.c.jj 2022-01-15 13:56:45.396986086 +0100 +++ gcc/internal-fn.c 2022-01-15 13:58:05.888844773 +0100 @@ -3558,6 +3558,16 @@ expand_CO_ACTOR (internal_fn, gcall *) gcc_unreachable (); } +static void +expand_DYNAMIC_INIT_START (internal_fn, gcall *) +{ +} + +static void +expand_DYNAMIC_INIT_END (internal_fn, gcall *) +{ +} + /* Expand a call to FN using the operands in STMT. FN has a single output operand and NARGS input operands. */ --- gcc/tree-pass.h.jj 2022-01-12 23:45:44.741771603 +0100 +++ gcc/tree-pass.h 2022-01-15 13:58:05.888844773 +0100 @@ -446,6 +446,7 @@ extern gimple_opt_pass *make_pass_cse_re extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt); extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt); extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_dyninit (gcc::context *ctxt); extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt); extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt); extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt); --- gcc/passes.def.jj 2022-01-11 23:11:22.802284364 +0100 +++ gcc/passes.def 2022-01-15 13:58:05.888844773 +0100 @@ -262,6 +262,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_tsan); NEXT_PASS (pass_dse); NEXT_PASS (pass_dce); + NEXT_PASS (pass_dyninit); /* Pass group that runs when 1) enabled, 2) there are loops in the function. Make sure to run pass_fix_loops before to discover/remove loops before running the gate function --- gcc/function.h.jj 2022-01-11 23:11:22.517288369 +0100 +++ gcc/function.h 2022-01-15 14:18:40.067344670 +0100 @@ -438,6 +438,10 @@ struct GTY(()) function { /* Set if there are any OMP_TARGET regions in the function. */ unsigned int has_omp_target : 1; + + /* Set if there are any .DYNAMIC_INIT_{START,END} calls in the + function. 
*/ + unsigned int has_dynamic_init : 1; }; /* Add the decl D to the local_decls list of FUN. */ --- gcc/tree-inline.c.jj 2022-01-13 15:59:44.471730408 +0100 +++ gcc/tree-inline.c 2022-01-15 14:22:27.544118959 +0100 @@ -2829,6 +2829,7 @@ initialize_cfun (tree new_fndecl, tree c cfun->can_delete_dead_exceptions = src_cfun->can_delete_dead_exceptions; cfun->returns_struct = src_cfun->returns_struct; cfun->returns_pcc_struct = src_cfun->returns_pcc_struct; + cfun->has_dynamic_init = src_cfun->has_dynamic_init; init_empty_tree_cfg (); @@ -5003,6 +5004,7 @@ expand_call_inline (basic_block bb, gimp dst_cfun->calls_eh_return |= id->src_cfun->calls_eh_return; id->dst_node->calls_declare_variant_alt |= id->src_node->calls_declare_variant_alt; + dst_cfun->has_dynamic_init |= id->src_cfun->has_dynamic_init; gcc_assert (!id->src_cfun->after_inlining); --- gcc/params.opt.jj 2022-01-11 23:11:22.801284378 +0100 +++ gcc/params.opt 2022-01-15 14:32:52.470290716 +0100 @@ -1189,4 +1189,8 @@ Enum(vrp_mode) String(vrp) Value(VRP_MOD EnumValue Enum(vrp_mode) String(ranger) Value(VRP_MODE_RANGER) +-param=dynamic-init-max-size= +Common Joined UInteger Var(param_dynamic_init_max_size) Init(1024) Param Optimization +Maximum size of a dynamically initialized namespace scope C++ variable for dynamic into constant initialization optimization. + ; This comment is to ensure we retain the blank line above. --- gcc/gimple-ssa-store-merging.c.jj 2022-01-11 23:11:22.559287779 +0100 +++ gcc/gimple-ssa-store-merging.c 2022-01-15 14:36:42.867082255 +0100 @@ -170,6 +170,8 @@ #include "optabs-tree.h" #include "dbgcnt.h" #include "selftest.h" +#include "cgraph.h" +#include "varasm.h" /* The maximum size (in bits) of the stores this pass should generate. */ #define MAX_STORE_BITSIZE (BITS_PER_WORD) @@ -5496,6 +5498,334 @@ pass_store_merging::execute (function *f return 0; } +/* Pass to optimize C++ dynamic initialization. 
*/
+
+const pass_data pass_data_dyninit = {
+  GIMPLE_PASS, /* type */
+  "dyninit", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_GIMPLE_STORE_MERGING, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_dyninit : public gimple_opt_pass
+{
+public:
+  pass_dyninit (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_dyninit, ctxt)
+  {
+  }
+
+  virtual bool
+  gate (function *fun)
+  {
+    return (DECL_ARTIFICIAL (fun->decl)
+	    && DECL_STATIC_CONSTRUCTOR (fun->decl)
+	    && optimize
+	    && !optimize_debug
+	    && fun->has_dynamic_init);
+  }
+
+  virtual unsigned int execute (function *);
+}; // class pass_dyninit
+
+unsigned int
+pass_dyninit::execute (function *fun)
+{
+  basic_block bb;
+  auto_vec<gimple *> ifns;
+  hash_map<tree, gimple *> *map = NULL;
+  auto_vec<tree> vars;
+  gimple **cur = NULL;
+  bool ssdf_calls = false;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  /* The C++ FE can wrap dynamic initialization of certain
+	     variables with a pair of internal function calls, like:
+	     .DYNAMIC_INIT_START (&b, 0);
+	     b = 1;
+	     .DYNAMIC_INIT_END (&b);
+
+	     or
+	     .DYNAMIC_INIT_START (&e, 1);
+	     # DEBUG this => &e.f
+	     MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
+	     MEM[(struct S *)&e + 4B].a = 1;
+	     MEM[(struct S *)&e + 4B].b = 2;
+	     MEM[(struct S *)&e + 4B].c = 3;
+	     # DEBUG BEGIN_STMT
+	     MEM[(struct S *)&e + 4B].d = 6;
+	     # DEBUG this => NULL
+	     .DYNAMIC_INIT_END (&e);
+
+	     Verify if there are only stores of constants to the corresponding
+	     variable or parts of that variable and if so, try to reconstruct
+	     a static initializer from the static initializer if any and
+	     the constant stores into the variable.  This is permitted by
+	     [basic.start.static]/3.
*/ + if (is_gimple_call (stmt)) + { + if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_START)) + { + ifns.safe_push (stmt); + if (cur) + *cur = NULL; + tree arg = gimple_call_arg (stmt, 0); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR + && DECL_P (TREE_OPERAND (arg, 0))); + tree var = TREE_OPERAND (arg, 0); + gcc_checking_assert (is_global_var (var)); + varpool_node *node = varpool_node::get (var); + if (node == NULL + || node->in_other_partition + || TREE_ASM_WRITTEN (var) + || DECL_SIZE_UNIT (var) == NULL_TREE + || !tree_fits_uhwi_p (DECL_SIZE_UNIT (var)) + || (tree_to_uhwi (DECL_SIZE_UNIT (var)) + > (unsigned) param_dynamic_init_max_size) + || TYPE_SIZE_UNIT (TREE_TYPE (var)) == NULL_TREE + || !tree_int_cst_equal (TYPE_SIZE_UNIT (TREE_TYPE (var)), + DECL_SIZE_UNIT (var))) + continue; + if (map == NULL) + map = new hash_map (61); + bool existed_p; + cur = &map->get_or_insert (var, &existed_p); + if (existed_p) + { + /* Punt if we see more than one .DYNAMIC_INIT_START + internal call for the same variable. */ + *cur = NULL; + cur = NULL; + } + else + { + *cur = stmt; + vars.safe_push (var); + } + continue; + } + else if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_END)) + { + ifns.safe_push (stmt); + tree arg = gimple_call_arg (stmt, 0); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR + && DECL_P (TREE_OPERAND (arg, 0))); + tree var = TREE_OPERAND (arg, 0); + gcc_checking_assert (is_global_var (var)); + if (cur) + { + /* Punt if .DYNAMIC_INIT_END call argument doesn't + pair with .DYNAMIC_INIT_START. */ + if (vars.last () != var) + *cur = NULL; + cur = NULL; + } + continue; + } + + /* Punt if we see any artificial + __static_initialization_and_destruction_* calls, e.g. if + it would be partially inlined, because we wouldn't then see + all .DYNAMIC_INIT_* calls. 
*/ + tree fndecl = gimple_call_fndecl (stmt); + if (fndecl + && DECL_ARTIFICIAL (fndecl) + && DECL_STRUCT_FUNCTION (fndecl) + && DECL_STRUCT_FUNCTION (fndecl)->has_dynamic_init) + ssdf_calls = true; + } + if (cur) + { + if (store_valid_for_store_merging_p (stmt)) + { + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + poly_int64 bitsize, bitpos; + HOST_WIDE_INT ibitsize, ibitpos; + machine_mode mode; + int unsignedp, reversep, volatilep = 0; + tree offset; + tree var = vars.last (); + if (rhs_valid_for_store_merging_p (rhs) + && get_inner_reference (lhs, &bitsize, &bitpos, &offset, + &mode, &unsignedp, &reversep, + &volatilep) == var + && !reversep + && !volatilep + && (offset == NULL_TREE || integer_zerop (offset)) + && bitsize.is_constant (&ibitsize) + && bitpos.is_constant (&ibitpos) + && ibitpos >= 0 + && ibitsize <= tree_to_shwi (DECL_SIZE (var)) + && ibitsize + ibitpos <= tree_to_shwi (DECL_SIZE (var))) + continue; + } + *cur = NULL; + cur = NULL; + } + } + if (cur) + { + *cur = NULL; + cur = NULL; + } + } + if (map && !ssdf_calls) + { + for (tree var : vars) + { + gimple *g = *map->get (var); + if (g == NULL) + continue; + varpool_node *node = varpool_node::get (var); + node->get_constructor (); + tree init = DECL_INITIAL (var); + if (init == NULL) + init = build_zero_cst (TREE_TYPE (var)); + gimple_stmt_iterator gsi = gsi_for_stmt (g); + unsigned char *buf = NULL; + unsigned int buf_size = tree_to_uhwi (DECL_SIZE_UNIT (var)); + bool buf_valid = false; + do + { + gsi_next (&gsi); + gimple *stmt = gsi_stmt (gsi); + if (is_gimple_debug (stmt)) + continue; + if (is_gimple_call (stmt)) + break; + if (gimple_clobber_p (stmt)) + continue; + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + if (lhs == var) + { + /* Simple assignment to the whole variable. + rhs is the initializer. 
*/ + buf_valid = false; + init = rhs; + continue; + } + poly_int64 bitsize, bitpos; + machine_mode mode; + int unsignedp, reversep, volatilep = 0; + tree offset; + get_inner_reference (lhs, &bitsize, &bitpos, &offset, + &mode, &unsignedp, &reversep, &volatilep); + HOST_WIDE_INT ibitsize = bitsize.to_constant (); + HOST_WIDE_INT ibitpos = bitpos.to_constant (); + if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN + || CHAR_BIT != 8 + || BITS_PER_UNIT != 8) + { + g = NULL; + break; + } + if (!buf_valid) + { + if (buf == NULL) + buf = XNEWVEC (unsigned char, buf_size * 2); + memset (buf, 0, buf_size); + if (native_encode_initializer (init, buf, buf_size) + != (int) buf_size) + { + g = NULL; + break; + } + buf_valid = true; + } + /* Otherwise go through byte representation. */ + if (!encode_tree_to_bitpos (rhs, buf, ibitsize, + ibitpos, buf_size)) + { + g = NULL; + break; + } + } + while (1); + if (g == NULL) + { + XDELETE (buf); + continue; + } + if (buf_valid) + { + init = native_interpret_aggregate (TREE_TYPE (var), buf, 0, + buf_size); + if (init) + { + /* Verify the dynamic initialization doesn't e.g. set + some padding bits to non-zero by trying to encode + it again and comparing. 
*/ + memset (buf + buf_size, 0, buf_size); + if (native_encode_initializer (init, buf + buf_size, + buf_size) != (int) buf_size + || memcmp (buf, buf + buf_size, buf_size) != 0) + init = NULL_TREE; + } + } + XDELETE (buf); + if (!init || !initializer_constant_valid_p (init, TREE_TYPE (var))) + continue; + if (integer_nonzerop (gimple_call_arg (g, 1))) + TREE_READONLY (var) = 1; + if (dump_file) + { + fprintf (dump_file, "dynamic initialization of "); + print_generic_stmt (dump_file, var, TDF_SLIM); + fprintf (dump_file, " optimized into: "); + print_generic_stmt (dump_file, init, TDF_SLIM); + if (TREE_READONLY (var)) + fprintf (dump_file, " and making it read-only\n"); + fprintf (dump_file, "\n"); + } + if (initializer_zerop (init)) + DECL_INITIAL (var) = NULL_TREE; + else + DECL_INITIAL (var) = init; + gsi = gsi_for_stmt (g); + gsi_next (&gsi); + do + { + gimple *stmt = gsi_stmt (gsi); + if (is_gimple_debug (stmt)) + { + gsi_next (&gsi); + continue; + } + if (is_gimple_call (stmt)) + break; + /* Remove now all the stores for the dynamic initialization. */ + unlink_stmt_vdef (stmt); + gsi_remove (&gsi, true); + release_defs (stmt); + } + while (1); + } + } + delete map; + for (gimple *g : ifns) + { + gimple_stmt_iterator gsi = gsi_for_stmt (g); + unlink_stmt_vdef (g); + gsi_remove (&gsi, true); + release_defs (g); + } + return 0; +} } // anon namespace /* Construct and return a store merging pass object. */ @@ -5506,6 +5836,14 @@ make_pass_store_merging (gcc::context *c return new pass_store_merging (ctxt); } +/* Construct and return a dyninit pass object. 
*/ + +gimple_opt_pass * +make_pass_dyninit (gcc::context *ctxt) +{ + return new pass_dyninit (ctxt); +} + #if CHECKING_P namespace selftest { --- gcc/cp/decl2.c.jj 2022-01-11 23:11:22.099294243 +0100 +++ gcc/cp/decl2.c 2022-01-15 14:17:25.873396764 +0100 @@ -4221,13 +4221,37 @@ one_static_initialization_or_destruction { if (init) { + bool sanitize = sanitize_flags_p (SANITIZE_ADDRESS, decl); + if (optimize && !optimize_debug && guard == NULL_TREE && !sanitize) + { + tree t = build_fold_addr_expr (decl); + tree type = TREE_TYPE (decl); + tree is_const + = constant_boolean_node (TYPE_READONLY (type) + && !cp_has_mutable_p (type), + boolean_type_node); + t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl), + IFN_DYNAMIC_INIT_START, + void_type_node, 2, t, + is_const); + cfun->has_dynamic_init = true; + finish_expr_stmt (t); + } finish_expr_stmt (init); - if (sanitize_flags_p (SANITIZE_ADDRESS, decl)) + if (sanitize) { varpool_node *vnode = varpool_node::get (decl); if (vnode) vnode->dynamically_initialized = 1; } + else if (optimize && !optimize_debug && guard == NULL_TREE) + { + tree t = build_fold_addr_expr (decl); + t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl), + IFN_DYNAMIC_INIT_END, + void_type_node, 1, t); + finish_expr_stmt (t); + } } /* If we're using __cxa_atexit, register a function that calls the --- gcc/testsuite/g++.dg/opt/init3.C.jj 2022-01-15 13:58:05.933844134 +0100 +++ gcc/testsuite/g++.dg/opt/init3.C 2022-01-15 13:58:05.933844134 +0100 @@ -0,0 +1,31 @@ +// PR c++/102876 +// { dg-do compile } +// { dg-options "-O2 -fdump-tree-dyninit" } +// { dg-final { scan-tree-dump "dynamic initialization of b\[\n\r]* optimized into: 1" "dyninit" } } +// { dg-final { scan-tree-dump "dynamic initialization of e\[\n\r]* optimized into: {.e=5, .f={.a=1, .b=2, .c=3, .d=6}, .g=6}\[\n\r]* and making it read-only" "dyninit" } } +// { dg-final { scan-tree-dump "dynamic initialization of f\[\n\r]* optimized into: {.e=7, .f={.a=1, .b=2, .c=3, 
.d=6}, .g=1}" "dyninit" } } +// { dg-final { scan-tree-dump "dynamic initialization of h\[\n\r]* optimized into: {.h=8, .i={.a=1, .b=2, .c=3, .d=6}, .j=9}" "dyninit" } } +// { dg-final { scan-tree-dump-times "dynamic initialization of " 4 "dyninit" } } +// { dg-final { scan-tree-dump-times "and making it read-only" 1 "dyninit" } } + +struct S { S () : a(1), b(2), c(3), d(4) { d += 2; } int a, b, c, d; }; +struct T { int e; S f; int g; }; +struct U { int h; mutable S i; int j; }; +extern int b; +int foo (int &); +int bar (int &); +int baz () { return 1; } +int qux () { return b = 2; } +// Dynamic initialization of a shouldn't be optimized, foo can't be inlined. +int a = foo (b); +int b = baz (); +// Likewise for c. +int c = bar (b); +// While qux is inlined, the dynamic initialization modifies another +// variable, so punt for d as well. +int d = qux (); +const T e = { 5, S (), 6 }; +T f = { 7, S (), baz () }; +const T &g = e; +const U h = { 8, S (), 9 }; +const U &i = h;