From patchwork Thu Nov 4 09:42:50 2021
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 47040
Date: Thu, 4 Nov 2021 10:42:50 +0100
From: Jakub Jelinek
To: Jason Merrill, Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] c++, dyninit: Optimize C++ dynamic initialization by constants into DECL_INITIAL adjustment [PR102876]
Message-ID: <20211104094250.GR304296@tucnak>

Hi!
When users don't use constexpr everywhere in the initialization of namespace-scope non-comdat variables and the initializers aren't constant when the FE is looking at them, the FE performs dynamic initialization of those variables.  But after inlining and some constant propagation, we often end up with just storing constants into those variables in the _GLOBAL__sub_I_* constructor.
C++ gives us permission to change some of that dynamic initialization back into static initialization - https://eel.is/c++draft/basic.start.static#3
For classes that need (dynamic) construction, I believe access to some var from other dynamic construction before that var is constructed is UB, but as the example in the above mentioned spot of C++ shows:

  inline double fd () { return 1.0; }
  extern double d1;
  double d2 = d1;    // unspecified:
                     // either statically initialized to 0.0 or
                     // dynamically initialized to 0.0 if d1 is
                     // dynamically initialized, or 1.0 otherwise
  double d1 = fd (); // either initialized statically or dynamically to 1.0

some vars can be used before they are dynamically initialized, and the implementation can still optimize those into static initialization.
The following patch attempts to optimize some such cases back into DECL_INITIAL initializers and, where possible (originally const vars without mutable members), put those vars back into .rodata etc.
Because we put all dynamic initialization from a single TU into one single function (well, originally one function per priority, but those are typically inlined back into a single function), we can either take the simpler approach (from the PR it seems that is what LLVM uses), where either we manage to optimize all dynamic initializers in the TU into constants, or none of them, or, by adding some markup - in the form of a pair of internal function calls in this patch - around each dynamic initialization that can be optimized, we can optimize each dynamic initialization separately.
The patch adds a new pass that is invoked (through a gate check) only on DECL_ARTIFICIAL DECL_STATIC_CONSTRUCTOR functions, and looks there for sequences like:

  .DYNAMIC_INIT_START (&b, 0);
  b = 1;
  .DYNAMIC_INIT_END (&b);

or

  .DYNAMIC_INIT_START (&e, 1);
  # DEBUG this => &e.f
  MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
  MEM[(struct S *)&e + 4B].a = 1;
  MEM[(struct S *)&e + 4B].b = 2;
  MEM[(struct S *)&e + 4B].c = 3;
  # DEBUG BEGIN_STMT
  MEM[(struct S *)&e + 4B].d = 6;
  # DEBUG this => NULL
  .DYNAMIC_INIT_END (&e);

(where between the pair of markers everything is either a debug stmt or a store of a constant into the variable or parts of it).
The pass needs to run late enough so that after IPA all the needed constant propagation and perhaps loop unrolling has been done; on the other hand, it should run early enough that, when we can't optimize an initialization, we can remove those .DYNAMIC_INIT_* internal calls, which could otherwise prevent some further optimizations (they have an fnspec such that they pretend to read the corresponding variable).
Currently the optimization is only able to handle cases where the whole variable is stored in a single store (typically scalar variables), or it uses the native_{encode,interpret}* infrastructure to create or update the CONSTRUCTOR.  This means that, except for the first category, we can't right now handle unions or anything that needs relocations (vars containing pointers to other vars or references).  I think it would be nice to incrementally add, before the native_* fallback, some attempt to just create or update a CONSTRUCTOR if possible.
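As a source-level illustration (a sketch only; the declarations mirror the new g++.dg/opt/init3.C testcase at the end of this mail), the pass is meant to handle TUs like:

  struct S { S () : a (1), b (2), c (3), d (4) { d += 2; } int a, b, c, d; };
  struct T { int e; S f; int g; };

  int foo (int &);             // body not visible in this TU
  int baz () { return 1; }     // trivially inlinable, no side-effects

  extern int b;
  int a = foo (b);             // has to stay dynamic, foo can't be inlined
  int b = baz ();              // becomes b = 1; after inlining, so it can
                               // be folded back into DECL_INITIAL
  const T e = { 5, S (), 6 };  // S::S () inlines into constant stores, so
                               // e can be folded and moved back to .rodata

Here only a's initialization needs to remain in the _GLOBAL__sub_I_* constructor; b's and e's stores become constants after inlining and are folded into DECL_INITIAL, and e additionally gets TREE_READONLY set back.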
If we only see var.a.b.c.d[10].e = const; style stores, this shouldn't be that hard, as the whole access path is recorded there; we'd just need to decide what to do with unions if two or more union members are accessed, and do a deep copy of the CONSTRUCTOR and try to efficiently update the copy afterwards (the CONSTRUCTORs should be sorted by increasing offsets of the members/elements, so doing an ordered vec insertion might not be the best idea).  But MEM_REFs complicate this: parts or all of the access path are lost.  For non-unions, in most cases we could try to guess which field it is (do we have some existing function to do that?  I vaguely remember we've been doing that in some folding in the past but stopped doing so), but with unions it will be harder or impossible.
As the middle-end can't easily differentiate between const variables with and without mutable members (both of those will have TREE_READONLY clear on the var decl (because of the dynamic initialization) and TYPE_READONLY set on the type), the patch remembers this distinction in an extra argument to .DYNAMIC_INIT_START (true if it is ok to set TREE_READONLY on the var decl back if the var's dynamic initialization could be optimized into DECL_INITIAL).
Thinking more about it, I'm not sure about const vars without mutable members but with non-trivial destructors: do we register their dtors dynamically through __cxa_atexit in the ctors (that would mean the optimization currently punts on them), or not (in which case we could put them into .rodata even when the dtor will perhaps want to write to them)?
Anyway, I forgot to do another set of bootstraps gathering statistics on how many vars were optimized, so I'm just trying to figure it out from the sizes of the _GLOBAL__sub_I_* functions:

# Without patch, x86_64-linux cc1plus
$ readelf -Ws obj50/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
13934
# With the patch, x86_64-linux cc1plus
$ readelf -Ws obj52/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
6966
# Without patch, i686-linux cc1plus
$ readelf -Ws obj51/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
24158
# With the patch, i686-linux cc1plus
$ readelf -Ws obj53/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
10536

That seems like a huge improvement, although on closer look most of that saving comes from just one TU:

$ readelf -Ws obj50/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
6693
$ readelf -Ws obj52/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
1
$ readelf -Ws obj51/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
13001
$ readelf -Ws obj53/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
1

So the shrinking on all the dynamic initialization functions except i386-options.o's is 7241 -> 6965 for 64-bit and 11157 -> 10535 for 32-bit.  I will try to use constexpr for i386-options.c later today.
Another optimization that could be useful, though I'm not sure whether it can be done easily, is: if the _GLOBAL__sub_I_* functions end up with nothing in their body before expansion (those are the 1-byte functions on x86), perhaps either don't emit those functions at all, or at least don't register them in .init_array etc.,
so that cycles aren't wasted at runtime:

$ readelf -Ws obj50/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
4
$ readelf -Ws obj52/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
87
$ readelf -Ws obj51/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
4
$ readelf -Ws obj53/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
84

Also, I wonder whether I should add some new -f* option to control this optimization, or whether doing it always at -O+ (with -fdisable-tree-dyninit as a way to disable it) is good enough, and whether the hardcoded 1024 constant (an upper bound on the optimized variable size, so that we don't spend huge amounts of compile time trying to optimize initializers of gigabyte sizes) shouldn't be a param.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-11-04  Jakub Jelinek

        PR c++/102876
gcc/
        * internal-fn.def (DYNAMIC_INIT_START, DYNAMIC_INIT_END): New
        internal functions.
        * internal-fn.c (expand_DYNAMIC_INIT_START,
        expand_DYNAMIC_INIT_END): New functions.
        * tree-pass.h (make_pass_dyninit): Declare.
        * passes.def (pass_dyninit): Add after dce4.
        * gimple-ssa-store-merging.c (pass_data_dyninit): New variable.
        (class pass_dyninit): New type.
        (pass_dyninit::execute): New method.
        (make_pass_dyninit): New function.
gcc/cp/
        * decl2.c (one_static_initialization_or_destruction): Emit
        .DYNAMIC_INIT_START and .DYNAMIC_INIT_END internal calls around
        dynamic initialization of variables that don't need a guard.
gcc/testsuite/
        * g++.dg/opt/init3.C: New test.

        Jakub

--- gcc/internal-fn.def.jj	2021-11-02 09:05:47.029664211 +0100
+++ gcc/internal-fn.def	2021-11-02 12:40:38.702436113 +0100
@@ -367,6 +367,10 @@ DEF_INTERNAL_FN (PHI, 0, NULL)
    automatic variable.  */
 DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Mark start and end of dynamic initialization of a variable.  */
+DEF_INTERNAL_FN (DYNAMIC_INIT_START, ECF_LEAF | ECF_NOTHROW, ". r ")
+DEF_INTERNAL_FN (DYNAMIC_INIT_END, ECF_LEAF | ECF_NOTHROW, ". r ")
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
--- gcc/internal-fn.c.jj	2021-11-02 09:05:47.029664211 +0100
+++ gcc/internal-fn.c	2021-11-02 12:40:38.703436099 +0100
@@ -3485,6 +3485,16 @@ expand_CO_ACTOR (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+static void
+expand_DYNAMIC_INIT_START (internal_fn, gcall *)
+{
+}
+
+static void
+expand_DYNAMIC_INIT_END (internal_fn, gcall *)
+{
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
--- gcc/tree-pass.h.jj	2021-10-28 11:29:01.891721153 +0200
+++ gcc/tree-pass.h	2021-11-02 14:15:00.139185088 +0100
@@ -445,6 +445,7 @@ extern gimple_opt_pass *make_pass_cse_re
 extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_dyninit (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);
--- gcc/passes.def.jj	2021-11-01 14:37:06.685853324 +0100
+++ gcc/passes.def	2021-11-02 14:23:47.836715821 +0100
@@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_tsan);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_dyninit);
       /* Pass group that runs when 1) enabled, 2) there are loops
          in the function.  Make sure to run pass_fix_loops before
          to discover/remove loops before running the gate function
--- gcc/gimple-ssa-store-merging.c.jj	2021-09-01 12:06:19.488211919 +0200
+++ gcc/gimple-ssa-store-merging.c	2021-11-03 18:02:55.190015359 +0100
@@ -170,6 +170,8 @@
 #include "optabs-tree.h"
 #include "dbgcnt.h"
 #include "selftest.h"
+#include "cgraph.h"
+#include "varasm.h"
 
 /* The maximum size (in bits) of the stores this pass should generate.  */
 #define MAX_STORE_BITSIZE (BITS_PER_WORD)
@@ -5465,6 +5467,334 @@ pass_store_merging::execute (function *f
   return 0;
 }
 
+/* Pass to optimize C++ dynamic initialization.  */
+
+const pass_data pass_data_dyninit = {
+  GIMPLE_PASS, /* type */
+  "dyninit", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_GIMPLE_STORE_MERGING, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_dyninit : public gimple_opt_pass
+{
+public:
+  pass_dyninit (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_dyninit, ctxt)
+  {
+  }
+
+  virtual bool
+  gate (function *fun)
+  {
+    return (DECL_ARTIFICIAL (fun->decl)
+            && DECL_STATIC_CONSTRUCTOR (fun->decl)
+            && optimize);
+  }
+
+  virtual unsigned int execute (function *);
+}; // class pass_dyninit
+
+unsigned int
+pass_dyninit::execute (function *fun)
+{
+  basic_block bb;
+  auto_vec<gimple *> ifns;
+  hash_map<tree, gimple *> *map = NULL;
+  auto_vec<tree> vars;
+  gimple **cur = NULL;
+  bool ssdf_calls = false;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
+           !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple *stmt = gsi_stmt (gsi);
+          if (is_gimple_debug (stmt))
+            continue;
+
+          /* The C++ FE can wrap dynamic initialization of certain
+             variables with a pair of internal function calls, like:
+               .DYNAMIC_INIT_START (&b, 0);
+               b = 1;
+               .DYNAMIC_INIT_END (&b);
+             or
+               .DYNAMIC_INIT_START (&e, 1);
+               # DEBUG this => &e.f
+               MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
+               MEM[(struct S *)&e + 4B].a = 1;
+               MEM[(struct S *)&e + 4B].b = 2;
+               MEM[(struct S *)&e + 4B].c = 3;
+               # DEBUG BEGIN_STMT
+               MEM[(struct S *)&e + 4B].d = 6;
+               # DEBUG this => NULL
+               .DYNAMIC_INIT_END (&e);
+             Verify if there are only stores of constants to the
+             corresponding variable or parts of that variable and if
+             so, try to reconstruct a static initializer from the
+             previous static initializer (if any) and the constant
+             stores into the variable.  This is permitted by
+             [basic.start.static]/3.  */
+          if (is_gimple_call (stmt))
+            {
+              if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_START))
+                {
+                  ifns.safe_push (stmt);
+                  if (cur)
+                    *cur = NULL;
+                  tree arg = gimple_call_arg (stmt, 0);
+                  gcc_assert (TREE_CODE (arg) == ADDR_EXPR
+                              && DECL_P (TREE_OPERAND (arg, 0)));
+                  tree var = TREE_OPERAND (arg, 0);
+                  gcc_checking_assert (is_global_var (var));
+                  varpool_node *node = varpool_node::get (var);
+                  if (node == NULL
+                      || node->in_other_partition
+                      || TREE_ASM_WRITTEN (var)
+                      || DECL_SIZE_UNIT (var) == NULL_TREE
+                      || !tree_fits_uhwi_p (DECL_SIZE_UNIT (var))
+                      || tree_to_uhwi (DECL_SIZE_UNIT (var)) > 1024
+                      || TYPE_SIZE_UNIT (TREE_TYPE (var)) == NULL_TREE
+                      || !tree_int_cst_equal (TYPE_SIZE_UNIT (TREE_TYPE (var)),
+                                              DECL_SIZE_UNIT (var)))
+                    continue;
+                  if (map == NULL)
+                    map = new hash_map<tree, gimple *> (61);
+                  bool existed_p;
+                  cur = &map->get_or_insert (var, &existed_p);
+                  if (existed_p)
+                    {
+                      /* Punt if we see more than one .DYNAMIC_INIT_START
+                         internal call for the same variable.  */
+                      *cur = NULL;
+                      cur = NULL;
+                    }
+                  else
+                    {
+                      *cur = stmt;
+                      vars.safe_push (var);
+                    }
+                  continue;
+                }
+              else if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_END))
+                {
+                  ifns.safe_push (stmt);
+                  tree arg = gimple_call_arg (stmt, 0);
+                  gcc_assert (TREE_CODE (arg) == ADDR_EXPR
+                              && DECL_P (TREE_OPERAND (arg, 0)));
+                  tree var = TREE_OPERAND (arg, 0);
+                  gcc_checking_assert (is_global_var (var));
+                  if (cur)
+                    {
+                      /* Punt if .DYNAMIC_INIT_END call argument doesn't
+                         pair with .DYNAMIC_INIT_START.  */
+                      if (vars.last () != var)
+                        *cur = NULL;
+                      cur = NULL;
+                    }
+                  continue;
+                }
+
+              /* Punt if we see any artificial
+                 __static_initialization_and_destruction_* calls, e.g. if
+                 it would be partially inlined, because we wouldn't then
+                 see all .DYNAMIC_INIT_* calls.  */
+              tree fndecl = gimple_call_fndecl (stmt);
+              if (fndecl
+                  && DECL_ARTIFICIAL (fndecl)
+                  && DECL_NAME (fndecl)
+                  && startswith (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+                                 "__static_initialization_and_destruction_"))
+                ssdf_calls = true;
+            }
+          if (cur)
+            {
+              if (store_valid_for_store_merging_p (stmt))
+                {
+                  tree lhs = gimple_assign_lhs (stmt);
+                  tree rhs = gimple_assign_rhs1 (stmt);
+                  poly_int64 bitsize, bitpos;
+                  HOST_WIDE_INT ibitsize, ibitpos;
+                  machine_mode mode;
+                  int unsignedp, reversep, volatilep = 0;
+                  tree offset;
+                  tree var = vars.last ();
+                  if (rhs_valid_for_store_merging_p (rhs)
+                      && get_inner_reference (lhs, &bitsize, &bitpos, &offset,
+                                              &mode, &unsignedp, &reversep,
+                                              &volatilep) == var
+                      && !reversep
+                      && !volatilep
+                      && (offset == NULL_TREE || integer_zerop (offset))
+                      && bitsize.is_constant (&ibitsize)
+                      && bitpos.is_constant (&ibitpos)
+                      && ibitpos >= 0
+                      && ibitsize <= tree_to_shwi (DECL_SIZE (var))
+                      && ibitsize + ibitpos <= tree_to_shwi (DECL_SIZE (var)))
+                    continue;
+                }
+              *cur = NULL;
+              cur = NULL;
+            }
+        }
+      if (cur)
+        {
+          *cur = NULL;
+          cur = NULL;
+        }
+    }
+  if (map && !ssdf_calls)
+    {
+      for (tree var : vars)
+        {
+          gimple *g = *map->get (var);
+          if (g == NULL)
+            continue;
+          varpool_node *node = varpool_node::get (var);
+          node->get_constructor ();
+          tree init = DECL_INITIAL (var);
+          if (init == NULL)
+            init = build_zero_cst (TREE_TYPE (var));
+          gimple_stmt_iterator gsi = gsi_for_stmt (g);
+          unsigned char *buf = NULL;
+          unsigned int buf_size = tree_to_uhwi (DECL_SIZE_UNIT (var));
+          bool buf_valid = false;
+          do
+            {
+              gsi_next (&gsi);
+              gimple *stmt = gsi_stmt (gsi);
+              if (is_gimple_debug (stmt))
+                continue;
+              if (is_gimple_call (stmt))
+                break;
+              if (gimple_clobber_p (stmt))
+                continue;
+              tree lhs = gimple_assign_lhs (stmt);
+              tree rhs = gimple_assign_rhs1 (stmt);
+              if (lhs == var)
+                {
+                  /* Simple assignment to the whole variable.
+                     rhs is the initializer.  */
+                  buf_valid = false;
+                  init = rhs;
+                  continue;
+                }
+              poly_int64 bitsize, bitpos;
+              machine_mode mode;
+              int unsignedp, reversep, volatilep = 0;
+              tree offset;
+              get_inner_reference (lhs, &bitsize, &bitpos, &offset,
+                                   &mode, &unsignedp, &reversep, &volatilep);
+              HOST_WIDE_INT ibitsize = bitsize.to_constant ();
+              HOST_WIDE_INT ibitpos = bitpos.to_constant ();
+              if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
+                  || CHAR_BIT != 8
+                  || BITS_PER_UNIT != 8)
+                {
+                  g = NULL;
+                  break;
+                }
+              if (!buf_valid)
+                {
+                  if (buf == NULL)
+                    buf = XNEWVEC (unsigned char, buf_size * 2);
+                  memset (buf, 0, buf_size);
+                  if (native_encode_initializer (init, buf, buf_size)
+                      != (int) buf_size)
+                    {
+                      g = NULL;
+                      break;
+                    }
+                  buf_valid = true;
+                }
+              /* Otherwise go through byte representation.  */
+              if (!encode_tree_to_bitpos (rhs, buf, ibitsize,
+                                          ibitpos, buf_size))
+                {
+                  g = NULL;
+                  break;
+                }
+            }
+          while (1);
+          if (g == NULL)
+            {
+              XDELETE (buf);
+              continue;
+            }
+          if (buf_valid)
+            {
+              init = native_interpret_aggregate (TREE_TYPE (var), buf, 0,
+                                                 buf_size);
+              if (init)
+                {
+                  /* Verify the dynamic initialization doesn't e.g. set
+                     some padding bits to non-zero by trying to encode
+                     it again and comparing.  */
+                  memset (buf + buf_size, 0, buf_size);
+                  if (native_encode_initializer (init, buf + buf_size,
+                                                 buf_size) != (int) buf_size
+                      || memcmp (buf, buf + buf_size, buf_size) != 0)
+                    init = NULL_TREE;
+                }
+            }
+          XDELETE (buf);
+          if (!init || !initializer_constant_valid_p (init, TREE_TYPE (var)))
+            continue;
+          if (integer_nonzerop (gimple_call_arg (g, 1)))
+            TREE_READONLY (var) = 1;
+          if (dump_file)
+            {
+              fprintf (dump_file, "dynamic initialization of ");
+              print_generic_stmt (dump_file, var, TDF_SLIM);
+              fprintf (dump_file, " optimized into: ");
+              print_generic_stmt (dump_file, init, TDF_SLIM);
+              if (TREE_READONLY (var))
+                fprintf (dump_file, " and making it read-only\n");
+              fprintf (dump_file, "\n");
+            }
+          if (initializer_zerop (init))
+            DECL_INITIAL (var) = NULL_TREE;
+          else
+            DECL_INITIAL (var) = init;
+          gsi = gsi_for_stmt (g);
+          gsi_next (&gsi);
+          do
+            {
+              gimple *stmt = gsi_stmt (gsi);
+              if (is_gimple_debug (stmt))
+                {
+                  gsi_next (&gsi);
+                  continue;
+                }
+              if (is_gimple_call (stmt))
+                break;
+              /* Remove now all the stores for the dynamic
+                 initialization.  */
+              unlink_stmt_vdef (stmt);
+              gsi_remove (&gsi, true);
+              if (gimple_vdef (stmt))
+                release_ssa_name (gimple_vdef (stmt));
+            }
+          while (1);
+        }
+    }
+  delete map;
+  for (gimple *g : ifns)
+    {
+      gimple_stmt_iterator gsi = gsi_for_stmt (g);
+      unlink_stmt_vdef (g);
+      gsi_remove (&gsi, true);
+      if (gimple_vdef (g))
+        release_ssa_name (gimple_vdef (g));
+    }
+  return 0;
+}
 } // anon namespace
 
 /* Construct and return a store merging pass object.  */
 
@@ -5475,6 +5805,14 @@ make_pass_store_merging (gcc::context *c
   return new pass_store_merging (ctxt);
 }
 
+/* Construct and return a dyninit pass object.  */
+
+gimple_opt_pass *
+make_pass_dyninit (gcc::context *ctxt)
+{
+  return new pass_dyninit (ctxt);
+}
+
 #if CHECKING_P
 
 namespace selftest {
--- gcc/cp/decl2.c.jj	2021-11-02 09:05:47.004664566 +0100
+++ gcc/cp/decl2.c	2021-11-03 17:18:11.395288518 +0100
@@ -4133,13 +4133,36 @@ one_static_initialization_or_destruction
 {
   if (init)
     {
+      bool sanitize = sanitize_flags_p (SANITIZE_ADDRESS, decl);
+      if (optimize && guard == NULL_TREE && !sanitize)
+	{
+	  tree t = build_fold_addr_expr (decl);
+	  tree type = TREE_TYPE (decl);
+	  tree is_const
+	    = constant_boolean_node (TYPE_READONLY (type)
+				     && !cp_has_mutable_p (type),
+				     boolean_type_node);
+	  t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
+					    IFN_DYNAMIC_INIT_START,
+					    void_type_node, 2, t,
+					    is_const);
+	  finish_expr_stmt (t);
+	}
       finish_expr_stmt (init);
-      if (sanitize_flags_p (SANITIZE_ADDRESS, decl))
+      if (sanitize)
 	{
 	  varpool_node *vnode = varpool_node::get (decl);
 	  if (vnode)
 	    vnode->dynamically_initialized = 1;
 	}
+      else if (optimize && guard == NULL_TREE)
+	{
+	  tree t = build_fold_addr_expr (decl);
+	  t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
+					    IFN_DYNAMIC_INIT_END,
+					    void_type_node, 1, t);
+	  finish_expr_stmt (t);
+	}
     }
 
   /* If we're using __cxa_atexit, register a function that calls the
--- gcc/testsuite/g++.dg/opt/init3.C.jj	2021-11-03 17:53:01.872472570 +0100
+++ gcc/testsuite/g++.dg/opt/init3.C	2021-11-03 17:52:57.484535115 +0100
@@ -0,0 +1,31 @@
+// PR c++/102876
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-dyninit" }
+// { dg-final { scan-tree-dump "dynamic initialization of b\[\n\r]* optimized into: 1" "dyninit" } }
+// { dg-final { scan-tree-dump "dynamic initialization of e\[\n\r]* optimized into: {.e=5, .f={.a=1, .b=2, .c=3, .d=6}, .g=6}\[\n\r]* and making it read-only" "dyninit" } }
+// { dg-final { scan-tree-dump "dynamic initialization of f\[\n\r]* optimized into: {.e=7, .f={.a=1, .b=2, .c=3, .d=6}, .g=1}" "dyninit" } }
+// { dg-final { scan-tree-dump "dynamic initialization of h\[\n\r]* optimized into: {.h=8, .i={.a=1, .b=2, .c=3, .d=6}, .j=9}" "dyninit" } }
+// { dg-final { scan-tree-dump-times "dynamic initialization of " 4 "dyninit" } }
+// { dg-final { scan-tree-dump-times "and making it read-only" 1 "dyninit" } }
+
+struct S { S () : a(1), b(2), c(3), d(4) { d += 2; } int a, b, c, d; };
+struct T { int e; S f; int g; };
+struct U { int h; mutable S i; int j; };
+extern int b;
+int foo (int &);
+int bar (int &);
+int baz () { return 1; }
+int qux () { return b = 2; }
+// Dynamic initialization of a shouldn't be optimized, foo can't be inlined.
+int a = foo (b);
+int b = baz ();
+// Likewise for c.
+int c = bar (b);
+// While qux is inlined, the dynamic initialization modifies another
+// variable, so punt for d as well.
+int d = qux ();
+const T e = { 5, S (), 6 };
+T f = { 7, S (), baz () };
+const T &g = e;
+const U h = { 8, S (), 9 };
+const U &i = h;
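
P.S. To see what the pass did on a TU like the testcase above, the dump it scans can be inspected directly; something like this (the exact dump file suffix depends on the pass number):

$ g++ -O2 -S -fdump-tree-dyninit init3.C
$ cat init3.C.*t.dyninit
dynamic initialization of b
 optimized into: 1
...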