Message ID | 20220301220821.1732163-1-ppalka@redhat.com |
---|---|
State | New |
Series | c++: fold calls to std::move/forward [PR96780] |
Commit Message
Patrick Palka
March 1, 2022, 10:08 p.m. UTC
A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means it comes with bloated debug info, which
persists even after the call has been inlined away, for an operation that
is never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward into casts as part of the frontend's general
expression folding routine. After this patch with -O2 and a non-checking
compiler, debug info size for some testcases decreases by about ~10% and
overall compile time and memory usage decreases by ~2%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
want to consider for GCC 12?

        PR c++/96780

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
        fold calls to std::move/forward into simple casts.
        * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
        * typeck.cc (is_std_move_p, is_std_forward_p): Export.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
 gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
 gcc/cp/cp-tree.h                   |  2 ++
 gcc/cp/typeck.cc                   |  6 ++----
 gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++
 4 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
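To illustrate the "equivalent to a cast" claim in the commit message, here is a minimal sketch, assuming a C++11 or later compiler. It only checks that std::move and std::forward yield the same type as the corresponding static_cast and refer to the same object; it says nothing about how GCC implements the folding. Widget and move_matches_cast are illustrative names, not part of the patch.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Widget { int value = 42; };

// std::move on an lvalue has the same type as an explicit cast to Widget&&.
static_assert(std::is_same<decltype(std::move(std::declval<Widget&>())),
                           decltype(static_cast<Widget&&>(std::declval<Widget&>()))>::value,
              "std::move yields Widget&&");

// std::forward<T> behaves like static_cast<T&&> after reference collapsing.
static_assert(std::is_same<decltype(std::forward<Widget&>(std::declval<Widget&>())),
                           Widget&>::value,
              "forward<Widget&> yields Widget&");
static_assert(std::is_same<decltype(std::forward<Widget>(std::declval<Widget&>())),
                           Widget&&>::value,
              "forward<Widget> yields Widget&&");

// Returns true if the result of the call and the result of the cast
// both refer to the original object (no copy, no temporary).
bool move_matches_cast() {
    Widget w;
    Widget&& via_call = std::move(w);
    Widget&& via_cast = static_cast<Widget&&>(w);
    return &via_call == &w && &via_cast == &w;
}
```

Calling move_matches_cast() returns true: neither form creates a new object, which is why replacing the call with a cast does not change program behavior.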
Comments
On 3/1/22 18:08, Patrick Palka wrote:
> A well-formed call to std::move/forward is equivalent to a cast, but the
> former being a function call means it comes with bloated debug info, which
> persists even after the call has been inlined away, for an operation that
> is never interesting to debug.
>
> This patch addresses this problem in a relatively ad-hoc way by folding
> calls to std::move/forward into casts as part of the frontend's general
> expression folding routine. After this patch with -O2 and a non-checking
> compiler, debug info size for some testcases decreases by about ~10% and
> overall compile time and memory usage decreases by ~2%.

Impressive. Which testcases?

Do you also want to handle addressof and as_const in this patch, as
Jonathan suggested?

I think we can do this now, and think about generalizing more in stage 1.

> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
> want to consider for GCC 12?
>
>         PR c++/96780
>
> gcc/cp/ChangeLog:
>
>         * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
>         fold calls to std::move/forward into simple casts.
>         * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
>         * typeck.cc (is_std_move_p, is_std_forward_p): Export.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/opt/pr96780.C: New test.
> ---
>  gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
>  gcc/cp/cp-tree.h                   |  2 ++
>  gcc/cp/typeck.cc                   |  6 ++----
>  gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++
>  4 files changed, 46 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
>
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index d7323fb5c09..0b009b631c7 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -2756,6 +2756,24 @@ cp_fold (tree x)
>
>      case CALL_EXPR:
>        {
> +        if (optimize

I think this should check flag_no_inline rather than optimize.

> +            && (is_std_move_p (x) || is_std_forward_p (x)))
> +          {
> +            /* When optimizing, "inline" calls to std::move/forward by
> +               simply folding them into the corresponding cast. This is
> +               cheaper than relying on the inliner to do so, and also
> +               means we avoid generating useless debug info for them at all.
> +
> +               At this point the argument has already been coerced into a
> +               reference, so it suffices to use a NOP_EXPR to express the
> +               reference-to-reference cast. */
> +            r = CALL_EXPR_ARG (x, 0);
> +            if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
> +              r = build_nop (TREE_TYPE (x), r);
> +            x = cp_fold (r);
> +            break;
> +          }
> +
>          int sv = optimize, nw = sv;
>          tree callee = get_callee_fndecl (x);
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 37d462fca6e..ab828730b03 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -8089,6 +8089,8 @@ extern tree finish_right_unary_fold_expr (tree, int);
>  extern tree finish_binary_fold_expr (tree, tree, int);
>  extern tree treat_lvalue_as_rvalue_p (tree, bool);
>  extern bool decl_in_std_namespace_p (tree);
> +extern bool is_std_move_p (tree);
> +extern bool is_std_forward_p (tree);
>
>  /* in typeck2.cc */
>  extern void require_complete_eh_spec_types (tree, tree);
> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> index bddc83759ad..a3644f8e7f7 100644
> --- a/gcc/cp/typeck.cc
> +++ b/gcc/cp/typeck.cc
> @@ -62,8 +62,6 @@ static bool maybe_warn_about_returning_address_of_local (tree, location_t = UNKN
>  static void error_args_num (location_t, tree, bool);
>  static int convert_arguments (tree, vec<tree, va_gc> **, tree, int,
>                                tsubst_flags_t);
> -static bool is_std_move_p (tree);
> -static bool is_std_forward_p (tree);
>
>  /* Do `exp = require_complete_type (exp);' to make sure exp
>     does not have an incomplete type. (That includes void types.)
> @@ -10207,7 +10205,7 @@ decl_in_std_namespace_p (tree decl)
>
>  /* Returns true if FN, a CALL_EXPR, is a call to std::forward. */
>
> -static bool
> +bool
>  is_std_forward_p (tree fn)
>  {
>    /* std::forward only takes one argument. */
> @@ -10224,7 +10222,7 @@ is_std_forward_p (tree fn)
>
>  /* Returns true if FN, a CALL_EXPR, is a call to std::move. */
>
> -static bool
> +bool
>  is_std_move_p (tree fn)
>  {
>    /* std::move only takes one argument. */
> diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
> new file mode 100644
> index 00000000000..ca24b2802bb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/opt/pr96780.C
> @@ -0,0 +1,24 @@
> +// PR c++/96780
> +// Verify calls to std::move/forward are folded away by the frontend.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-O -fdump-tree-gimple" }
> +
> +#include <utility>
> +
> +struct A;
> +
> +extern A& a;
> +extern const A& ca;
> +
> +void f() {
> +  auto&& x1 = std::move(a);
> +  auto&& x2 = std::forward<A>(a);
> +  auto&& x3 = std::forward<A&>(a);
> +
> +  auto&& x4 = std::move(ca);
> +  auto&& x5 = std::forward<const A>(ca);
> +  auto&& x6 = std::forward<const A&>(ca);
> +}
> +
> +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
On Wed, 9 Mar 2022, Jason Merrill wrote:

> On 3/1/22 18:08, Patrick Palka wrote:
> > A well-formed call to std::move/forward is equivalent to a cast, but the
> > former being a function call means it comes with bloated debug info, which
> > persists even after the call has been inlined away, for an operation that
> > is never interesting to debug.
> >
> > This patch addresses this problem in a relatively ad-hoc way by folding
> > calls to std::move/forward into casts as part of the frontend's general
> > expression folding routine. After this patch with -O2 and a non-checking
> > compiler, debug info size for some testcases decreases by about ~10% and
> > overall compile time and memory usage decreases by ~2%.
>
> Impressive. Which testcases?

I saw the largest percent reductions in debug file object size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc

>
> Do you also want to handle addressof and as_const in this patch, as Jonathan
> suggested?

Yes, good idea. Since each of their argument and return types are
indirect types, I think we can use the same NOP_EXPR-based folding for
them.

>
> I think we can do this now, and think about generalizing more in stage 1.
>
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
> > want to consider for GCC 12?
> >
> >         PR c++/96780
> >
> > gcc/cp/ChangeLog:
> >
> >         * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
> >         fold calls to std::move/forward into simple casts.
> >         * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
> >         * typeck.cc (is_std_move_p, is_std_forward_p): Export.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * g++.dg/opt/pr96780.C: New test.
> > ---
> >  gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
> >  gcc/cp/cp-tree.h                   |  2 ++
> >  gcc/cp/typeck.cc                   |  6 ++----
> >  gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++
> >  4 files changed, 46 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> >
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index d7323fb5c09..0b009b631c7 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -2756,6 +2756,24 @@ cp_fold (tree x)
> >      case CALL_EXPR:
> >        {
> > +        if (optimize
>
> I think this should check flag_no_inline rather than optimize.

Sounds good.

Here's a patch that extends the folding to as_const and addressof (as
well as __addressof, which I'm kind of unsure about since it's
non-standard). I suppose it also doesn't hurt to verify that the return
and argument type of the function are sane before we commit to folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases decreases by about ~10% and overall compile time and
memory usage decreases by ~2%.

        PR c++/96780

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
        fold calls to std::move/forward and other cast-like functions
        into simple casts.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
 gcc/cp/cp-gimplify.cc              | 36 +++++++++++++++++++++++++++-
 gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)

     case CALL_EXPR:
       {
-        int sv = optimize, nw = sv;
         tree callee = get_callee_fndecl (x);

+        /* "Inline" calls to std::move/forward and other cast-like functions
+           by simply folding them into the corresponding cast determined by
+           their return type. This is cheaper than relying on the middle-end
+           to do so, and also means we avoid generating useless debug info for
+           them at all.
+
+           At this point the argument has already been converted into a
+           reference, so it suffices to use a NOP_EXPR to express the
+           cast. */
+        if (!flag_no_inline
+            && call_expr_nargs (x) == 1
+            && decl_in_std_namespace_p (callee)
+            && DECL_NAME (callee) != NULL_TREE
+            && (id_equal (DECL_NAME (callee), "move")
+                || id_equal (DECL_NAME (callee), "forward")
+                || id_equal (DECL_NAME (callee), "addressof")
+                /* This addressof equivalent is used in libstdc++. */
+                || id_equal (DECL_NAME (callee), "__addressof")
+                || id_equal (DECL_NAME (callee), "as_const")))
+          {
+            r = CALL_EXPR_ARG (x, 0);
+            /* Check that the return and arguments types are sane before
+               folding. */
+            if (INDIRECT_TYPE_P (TREE_TYPE (x))
+                && INDIRECT_TYPE_P (TREE_TYPE (r)))
+              {
+                if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
+                  r = build_nop (TREE_TYPE (x), r);
+                x = cp_fold (r);
+                break;
+              }
+          }
+
+        int sv = optimize, nw = sv;
+
         /* Some built-in function calls will be evaluated at compile-time in
            fold (). Set optimize to 1 when folding __builtin_constant_p inside
            a constexpr function so that fold_builtin_1 doesn't fold it to 0. */
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
new file mode 100644
index 00000000000..1a426b1328b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -0,0 +1,38 @@
+// PR c++/96780
+// Verify calls to std::move/forward are folded away by the frontend.
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-O -fdump-tree-gimple" }
+
+#include <utility>
+
+struct A;
+
+extern A& a;
+extern const A& ca;
+
+void f() {
+  auto&& x1 = std::move(a);
+  auto&& x2 = std::forward<A>(a);
+  auto&& x3 = std::forward<A&>(a);
+
+  auto&& x4 = std::move(ca);
+  auto&& x5 = std::forward<const A>(ca);
+  auto&& x6 = std::forward<const A&>(ca);
+
+  auto x7 = std::addressof(a);
+  auto x8 = std::addressof(ca);
+#if __GLIBCXX__
+  auto x9 = std::__addressof(a);
+  auto x10 = std::__addressof(ca);
+#endif
+#if __cpp_lib_as_const
+  auto&& x11 = std::as_const(a);
+  auto&& x12 = std::as_const(ca);
+#endif
+}
+
+// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
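The sanity check in the patch above (both the call's type and its argument's type must be indirect types) can be pictured at the source level. The following is only a user-level analogy, assuming that "indirect type" means pointer or reference; looks_cast_like, A and results_alias_argument are hypothetical names for illustration and do not exist in GCC.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct A { int n = 0; };

// Hypothetical trait mirroring the idea of the check: the result type and
// the argument type must each be a pointer or a reference before a
// one-argument call is treated as a plain cast.
template <typename Ret, typename Arg>
struct looks_cast_like
    : std::integral_constant<bool,
          (std::is_pointer<Ret>::value || std::is_reference<Ret>::value) &&
          (std::is_pointer<Arg>::value || std::is_reference<Arg>::value)> {};

static_assert(looks_cast_like<decltype(std::move(std::declval<A&>())), A&>::value,
              "std::move: reference in, reference out");
static_assert(looks_cast_like<decltype(std::addressof(std::declval<A&>())), A&>::value,
              "std::addressof: reference in, pointer out");
static_assert(!looks_cast_like<A, A&>::value,
              "a by-value result is not cast-like");

// Returns true if every cast-like call hands back the original object.
bool results_alias_argument() {
    A a;
    bool ok = std::addressof(a) == &a;
#if __cpp_lib_as_const
    ok = ok && &std::as_const(a) == &a;  // only available from C++17
#endif
    A&& moved = std::move(a);
    return ok && &moved == &a;
}
```

The by-value case is the one the check is meant to reject: a function that happens to be named like one of these but returns an object cannot be expressed as a no-op conversion of its argument.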
On Thu, 10 Mar 2022 at 15:27, Patrick Palka wrote:
> Here's a patch that extends the folding to as_const and addressof (as
> well as __addressof, which I'm kind of unsure about since it's
> non-standard).

N.B. libstdc++ almost never uses std::addressof, because that calls
std::__addressof, so we just use that directly to avoid the double
indirection. I plan to change that in stage 1 and make std::addressof
just call the built-in directly, so that it won't have the extra
overhead.

If they both get folded that wouldn't matter so much (it would still
be useful for Clang, and would presumably make GCC compile ever so
slightly faster).
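As a rough picture of the "call the built-in directly" idea, here is a sketch assuming a compiler that provides __builtin_addressof (GCC and Clang do). addressof_direct is a hypothetical name and is not how libstdc++ spells anything; Tricky exists only to show why a plain & cannot be used.

```cpp
#include <cassert>
#include <memory>

// A type whose unary & is overloaded, so `&t` does not give the real address.
struct Tricky {
    int payload = 0;
    Tricky* operator&() { return nullptr; }
};

// Hypothetical one-hop wrapper: goes straight to the compiler built-in
// instead of forwarding through a second library function.
template <typename T>
constexpr T* addressof_direct(T& r) noexcept {
    return __builtin_addressof(r);
}

// Returns true if the library function and the direct wrapper agree,
// and both differ from what the overloaded operator& reports.
bool addressof_agrees() {
    Tricky t;
    Tricky* via_std = std::addressof(t);
    Tricky* via_builtin = addressof_direct(t);
    return via_std == via_builtin && via_std != nullptr && (&t) == nullptr;
}
```

Either way the result is the object's real address, which is what makes the call foldable into a cast of the already-converted reference argument.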
On 3/10/22 11:27, Patrick Palka wrote:
> On Wed, 9 Mar 2022, Jason Merrill wrote:
>
>> On 3/1/22 18:08, Patrick Palka wrote:
>>> A well-formed call to std::move/forward is equivalent to a cast, but the
>>> former being a function call means it comes with bloated debug info, which
>>> persists even after the call has been inlined away, for an operation that
>>> is never interesting to debug.
>>>
>>> This patch addresses this problem in a relatively ad-hoc way by folding
>>> calls to std::move/forward into casts as part of the frontend's general
>>> expression folding routine. After this patch with -O2 and a non-checking
>>> compiler, debug info size for some testcases decreases by about ~10% and
>>> overall compile time and memory usage decreases by ~2%.
>>
>> Impressive. Which testcases?
>
> I saw the largest percent reductions in debug file object size in
> various tests from cmcstl2 and range-v3, e.g.
> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
> (which are among their biggest tests).
>
> Significant reductions in debug object file size can be observed in
> some libstdc++ testcases too, such as a 5.5% reduction in
> std/ranges/adaptor/join.cc
>
>>
>> Do you also want to handle addressof and as_const in this patch, as Jonathan
>> suggested?
>
> Yes, good idea. Since each of their argument and return types are
> indirect types, I think we can use the same NOP_EXPR-based folding for
> them.
>
>>
>> I think we can do this now, and think about generalizing more in stage 1.
>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
>>> want to consider for GCC 12?
>>>
>>>         PR c++/96780
>>>
>>> gcc/cp/ChangeLog:
>>>
>>>         * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
>>>         fold calls to std::move/forward into simple casts.
>>>         * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
>>>         * typeck.cc (is_std_move_p, is_std_forward_p): Export.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>         * g++.dg/opt/pr96780.C: New test.
>>> ---
>>>  gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
>>>  gcc/cp/cp-tree.h                   |  2 ++
>>>  gcc/cp/typeck.cc                   |  6 ++----
>>>  gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++
>>>  4 files changed, 46 insertions(+), 4 deletions(-)
>>>  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
>>>
>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
>>> index d7323fb5c09..0b009b631c7 100644
>>> --- a/gcc/cp/cp-gimplify.cc
>>> +++ b/gcc/cp/cp-gimplify.cc
>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x)
>>>      case CALL_EXPR:
>>>        {
>>> +        if (optimize
>>
>> I think this should check flag_no_inline rather than optimize.
>
> Sounds good.
>
> Here's a patch that extends the folding to as_const and addressof (as
> well as __addressof, which I'm kind of unsure about since it's
> non-standard). I suppose it also doesn't hurt to verify that the return
> and argument type of the function are sane before we commit to folding.
>
> -- >8 --
>
> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>
> A well-formed call to std::move/forward is equivalent to a cast, but the
> former being a function call means the compiler generates debug info for
> it, which persists even after the call has been inlined away, for an
> operation that's never interesting to debug.
>
> This patch addresses this problem in a relatively ad-hoc way by folding
> calls to std::move/forward and other cast-like functions into simple
> casts as part of the frontend's general expression folding routine.
> After this patch with -O2 and a non-checking compiler, debug info size
> for some testcases decreases by about ~10% and overall compile time and
> memory usage decreases by ~2%.
>
>         PR c++/96780
>
> gcc/cp/ChangeLog:
>
>         * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
>         fold calls to std::move/forward and other cast-like functions
>         into simple casts.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/opt/pr96780.C: New test.
> ---
>  gcc/cp/cp-gimplify.cc              | 36 +++++++++++++++++++++++++++-
>  gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
>  2 files changed, 73 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
>
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index d7323fb5c09..efc4c8f0eb9 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -2756,9 +2756,43 @@ cp_fold (tree x)
>
>      case CALL_EXPR:
>        {
> -        int sv = optimize, nw = sv;
>          tree callee = get_callee_fndecl (x);
>
> +        /* "Inline" calls to std::move/forward and other cast-like functions
> +           by simply folding them into the corresponding cast determined by
> +           their return type. This is cheaper than relying on the middle-end
> +           to do so, and also means we avoid generating useless debug info for
> +           them at all.
> +
> +           At this point the argument has already been converted into a
> +           reference, so it suffices to use a NOP_EXPR to express the
> +           cast. */
> +        if (!flag_no_inline

In our conversation yesterday it occurred to me that we might make this
a separate flag that defaults to the value of flag_no_inline; I was
thinking of -ffold-simple-inlines. Then Vittorio et al can specify that
explicitly at -O0 if they'd like.

> +            && call_expr_nargs (x) == 1
> +            && decl_in_std_namespace_p (callee)
> +            && DECL_NAME (callee) != NULL_TREE
> +            && (id_equal (DECL_NAME (callee), "move")
> +                || id_equal (DECL_NAME (callee), "forward")
> +                || id_equal (DECL_NAME (callee), "addressof")
> +                /* This addressof equivalent is used in libstdc++. */
> +                || id_equal (DECL_NAME (callee), "__addressof")
> +                || id_equal (DECL_NAME (callee), "as_const")))
> +          {
> +            r = CALL_EXPR_ARG (x, 0);
> +            /* Check that the return and arguments types are sane before
> +               folding. */
> +            if (INDIRECT_TYPE_P (TREE_TYPE (x))
> +                && INDIRECT_TYPE_P (TREE_TYPE (r)))
> +              {
> +                if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
> +                  r = build_nop (TREE_TYPE (x), r);
> +                x = cp_fold (r);
> +                break;
> +              }
> +          }
> +
> +        int sv = optimize, nw = sv;
> +
>          /* Some built-in function calls will be evaluated at compile-time in
>             fold (). Set optimize to 1 when folding __builtin_constant_p inside
>             a constexpr function so that fold_builtin_1 doesn't fold it to 0. */
> diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
> new file mode 100644
> index 00000000000..1a426b1328b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/opt/pr96780.C
> @@ -0,0 +1,38 @@
> +// PR c++/96780
> +// Verify calls to std::move/forward are folded away by the frontend.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-O -fdump-tree-gimple" }
> +
> +#include <utility>
> +
> +struct A;
> +
> +extern A& a;
> +extern const A& ca;
> +
> +void f() {
> +  auto&& x1 = std::move(a);
> +  auto&& x2 = std::forward<A>(a);
> +  auto&& x3 = std::forward<A&>(a);
> +
> +  auto&& x4 = std::move(ca);
> +  auto&& x5 = std::forward<const A>(ca);
> +  auto&& x6 = std::forward<const A&>(ca);
> +
> +  auto x7 = std::addressof(a);
> +  auto x8 = std::addressof(ca);
> +#if __GLIBCXX__
> +  auto x9 = std::__addressof(a);
> +  auto x10 = std::__addressof(ca);
> +#endif
> +#if __cpp_lib_as_const
> +  auto&& x11 = std::as_const(a);
> +  auto&& x12 = std::as_const(ca);
> +#endif
> +}
> +
> +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
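One way to picture a separate flag that takes its default from the inlining setting but can still be given explicitly: a standalone model, not GCC code. Options and should_fold_simple_inlines are hypothetical names, and the -1 "not given" sentinel is an assumption of this sketch. The sketch treats the folding as wanted by default exactly when inlining is enabled, matching the !flag_no_inline test in the patch quoted above.

```cpp
#include <cassert>

// Hypothetical model of the two options involved.
struct Options {
    bool no_inline = false;        // models flag_no_inline (-fno-inline)
    int fold_simple_inlines = -1;  // -1: not given; 0: disabled; 1: enabled
};

// Resolve the effective setting at the point of use: an explicit choice
// wins, otherwise the folding follows whether inlining is enabled.
bool should_fold_simple_inlines(const Options& o) {
    if (o.fold_simple_inlines != -1)
        return o.fold_simple_inlines != 0;
    return !o.no_inline;
}
```

With this shape, a build where inlining is off folds nothing by default, while a user who sets the flag explicitly still gets the folding, which is the use case mentioned for -O0.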
On Fri, 11 Mar 2022, Jason Merrill wrote: > On 3/10/22 11:27, Patrick Palka wrote: > > On Wed, 9 Mar 2022, Jason Merrill wrote: > > > > > On 3/1/22 18:08, Patrick Palka wrote: > > > > A well-formed call to std::move/forward is equivalent to a cast, but the > > > > former being a function call means it comes with bloated debug info, > > > > which > > > > persists even after the call has been inlined away, for an operation > > > > that > > > > is never interesting to debug. > > > > > > > > This patch addresses this problem in a relatively ad-hoc way by folding > > > > calls to std::move/forward into casts as part of the frontend's general > > > > expression folding routine. After this patch with -O2 and a > > > > non-checking > > > > compiler, debug info size for some testcases decreases by about ~10% and > > > > overall compile time and memory usage decreases by ~2%. > > > > > > Impressive. Which testcases? > > > > I saw the largest percent reductions in debug file object size in > > various tests from cmcstl2 and range-v3, e.g. > > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp > > (which are among their biggest tests). > > > > Significant reductions in debug object file size can be observed in > > some libstdc++ testcases too, such as a 5.5% reduction in > > std/ranges/adaptor/join.cc > > > > > > > > Do you also want to handle addressof and as_const in this patch, as > > > Jonathan > > > suggested? > > > > Yes, good idea. Since each of their argument and return types are > > indirect types, I think we can use the same NOP_EXPR-based folding for > > them. > > > > > > > > I think we can do this now, and think about generalizing more in stage 1. > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we > > > > want to consider for GCC 12? 
> > > > > > > > PR c++/96780 > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > > > fold calls to std::move/forward into simple casts. > > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. > > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * g++.dg/opt/pr96780.C: New test. > > > > --- > > > > gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ > > > > gcc/cp/cp-tree.h | 2 ++ > > > > gcc/cp/typeck.cc | 6 ++---- > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++ > > > > 4 files changed, 46 insertions(+), 4 deletions(-) > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > index d7323fb5c09..0b009b631c7 100644 > > > > --- a/gcc/cp/cp-gimplify.cc > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x) > > > > case CALL_EXPR: > > > > { > > > > + if (optimize > > > > > > I think this should check flag_no_inline rather than optimize. > > > > Sounds good. > > > > Here's a patch that extends the folding to as_const and addressof (as > > well as __addressof, which I'm kind of unsure about since it's > > non-standard). I suppose it also doesn't hurt to verify that the return > > and argument type of the function are sane before we commit to folding. > > > > -- >8 -- > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > > > A well-formed call to std::move/forward is equivalent to a cast, but the > > former being a function call means the compiler generates debug info for > > it, which persists even after the call has been inlined away, for an > > operation that's never interesting to debug. 
> > > > This patch addresses this problem in a relatively ad-hoc way by folding > > calls to std::move/forward and other cast-like functions into simple > > casts as part of the frontend's general expression folding routine. > > After this patch with -O2 and a non-checking compiler, debug info size > > for some testcases decreases by about ~10% and overall compile time and > > memory usage decreases by ~2%. > > > > PR c++/96780 > > > > gcc/cp/ChangeLog: > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > fold calls to std::move/forward and other cast-like functions > > into simple casts. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/opt/pr96780.C: New test. > > --- > > gcc/cp/cp-gimplify.cc | 36 +++++++++++++++++++++++++++- > > gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ > > 2 files changed, 73 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > index d7323fb5c09..efc4c8f0eb9 100644 > > --- a/gcc/cp/cp-gimplify.cc > > +++ b/gcc/cp/cp-gimplify.cc > > @@ -2756,9 +2756,43 @@ cp_fold (tree x) > > case CALL_EXPR: > > { > > - int sv = optimize, nw = sv; > > tree callee = get_callee_fndecl (x); > > + /* "Inline" calls to std::move/forward and other cast-like functions > > + by simply folding them into the corresponding cast determined by > > + their return type. This is cheaper than relying on the middle-end > > + to do so, and also means we avoid generating useless debug info for > > + them at all. > > + > > + At this point the argument has already been converted into a > > + reference, so it suffices to use a NOP_EXPR to express the > > + cast. */ > > + if (!flag_no_inline > > In our conversation yesterday it occurred to me that we might make this a > separate flag that defaults to the value of flag_no_inline; I was thinking of > -ffold-simple-inlines. 
Then Vittorio et al can specify that explicitly at -O0
> if they'd like.

Makes sense, like so? Bootstrapped and regtested on x86_64-pc-linux-gnu.

The patch defaults -ffold-simple-inlines according to the value of
flag_no_inline at startup. IIUC this means that if the flag has been
defaulted to set, then e.g. an optimize("O0") function attribute won't
disable -ffold-simple-inlines for that function, since we only compute
its default value once.

I wonder if we therefore instead want to handle defaulting the flag
when it's used, e.g. check

  (flag_fold_simple_inlines == -1
   ? !flag_no_inline
   : flag_fold_simple_inlines)

instead of

  flag_fold_simple_inlines

in cp_fold?

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine. This behavior is controlled by a
new flag -ffold-simple-inlines which defaults to enabled unless
-fno-inline is in effect.

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
~10% and overall compile time and memory usage decreases by ~2%.

        PR c++/96780

gcc/c-family/ChangeLog:

        * c-opts.cc (c_common_post_options): Handle defaulting of
        flag_fold_simple_inlines.
        * c.opt: Add -ffold-simple-inlines.

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
        std::move/forward and other cast-like functions into simple
        casts.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
--- gcc/c-family/c-opts.cc | 3 +++ gcc/c-family/c.opt | 4 ++++ gcc/cp/cp-gimplify.cc | 35 ++++++++++++++++++++++++++- gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ 4 files changed, 79 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index a341a061758..e8831d16c7b 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1058,6 +1058,9 @@ c_common_post_options (const char **pfilename) if (flag_implicit_constexpr && cxx_dialect < cxx14) flag_implicit_constexpr = false; + if (flag_fold_simple_inlines == -1) + flag_fold_simple_inlines = !flag_no_inline; + /* Global sized deallocation is new in C++14. */ if (flag_sized_deallocation == -1) flag_sized_deallocation = (cxx_dialect >= cxx14); diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 9cfd2a6bc4e..9a2a597e587 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat fexternal-templates C++ ObjC++ WarnRemoved +ffold-simple-inlines +C++ ObjC++ Optimization Var(flag_fold_simple_inlines) Init(-1) +Fold calls to simple inline functions. + ffor-scope C++ ObjC++ WarnRemoved diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc index d7323fb5c09..b09fede2d75 100644 --- a/gcc/cp/cp-gimplify.cc +++ b/gcc/cp/cp-gimplify.cc @@ -2756,9 +2756,42 @@ cp_fold (tree x) case CALL_EXPR: { - int sv = optimize, nw = sv; tree callee = get_callee_fndecl (x); + /* "Inline" calls to std::move/forward and other cast-like functions + by simply folding them into a corresponding cast to their return + type. This is cheaper than relying on the middle-end to do so, and + also means we avoid generating useless debug info for them at all. + + At this point the argument has already been converted into a + reference, so it suffices to use a NOP_EXPR to express the + cast. 
*/ + if (flag_fold_simple_inlines + && call_expr_nargs (x) == 1 + && decl_in_std_namespace_p (callee) + && DECL_NAME (callee) != NULL_TREE + && (id_equal (DECL_NAME (callee), "move") + || id_equal (DECL_NAME (callee), "forward") + || id_equal (DECL_NAME (callee), "addressof") + /* This addressof equivalent is used heavily in libstdc++. */ + || id_equal (DECL_NAME (callee), "__addressof") + || id_equal (DECL_NAME (callee), "as_const"))) + { + r = CALL_EXPR_ARG (x, 0); + /* Check that the return and arguments types are sane before + folding. */ + if (INDIRECT_TYPE_P (TREE_TYPE (x)) + && INDIRECT_TYPE_P (TREE_TYPE (r))) + { + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) + r = build_nop (TREE_TYPE (x), r); + x = cp_fold (r); + break; + } + } + + int sv = optimize, nw = sv; + /* Some built-in function calls will be evaluated at compile-time in fold (). Set optimize to 1 when folding __builtin_constant_p inside a constexpr function so that fold_builtin_1 doesn't fold it to 0. */ diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C new file mode 100644 index 00000000000..61e11855eeb --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/pr96780.C @@ -0,0 +1,38 @@ +// PR c++/96780 +// Verify calls to std::move/forward are folded away by the frontend. 
+// { dg-do compile { target c++11 } } +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" } + +#include <utility> + +struct A; + +extern A& a; +extern const A& ca; + +void f() { + auto&& x1 = std::move(a); + auto&& x2 = std::forward<A>(a); + auto&& x3 = std::forward<A&>(a); + + auto&& x4 = std::move(ca); + auto&& x5 = std::forward<const A>(ca); + auto&& x6 = std::forward<const A&>(ca); + + auto x7 = std::addressof(a); + auto x8 = std::addressof(ca); +#if __GLIBCXX__ + auto x9 = std::__addressof(a); + auto x10 = std::__addressof(ca); +#endif +#if __cpp_lib_as_const + auto&& x11 = std::as_const(a); + auto&& x12 = std::as_const(ca); +#endif +} + +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } } +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } } +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } } +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
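For readers following the thread, the premise of the patch (that a well-formed call to one of these functions is equivalent to a cast) can be checked at the source level in ordinary C++. The sketch below is only an illustration under stated assumptions: the type `Widget` and the function `calls_behave_like_casts` are made up for the example, and it says nothing about how cp_fold represents the resulting NOP_EXPR internally.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct Widget { int value = 0; };  // hypothetical type, for illustration only

// Compile-time part: each call has exactly the type of the corresponding cast.
static_assert(std::is_same<decltype(std::move(std::declval<Widget&>())),
                           Widget&&>::value, "move yields T&&");
static_assert(std::is_same<decltype(std::move(std::declval<const Widget&>())),
                           const Widget&&>::value, "move keeps const");
static_assert(std::is_same<decltype(std::forward<Widget>(std::declval<Widget&>())),
                           Widget&&>::value, "forward<T> yields T&&");
static_assert(std::is_same<decltype(std::forward<Widget&>(std::declval<Widget&>())),
                           Widget&>::value, "forward<T&> yields T&");

// Run-time part: the result designates the very same object as the argument,
// so replacing the call with a cast changes neither the value nor the address.
inline bool calls_behave_like_casts() {
  Widget w;
  const Widget& cw = w;

  Widget&& moved = std::move(w);
  Widget&& cast = static_cast<Widget&&>(w);
  bool ok = (&moved == &w) && (&cast == &w);

  Widget& fwd = std::forward<Widget&>(w);
  ok = ok && (&fwd == &w);

  // addressof matches the built-in & here because Widget does not overload it.
  ok = ok && (std::addressof(w) == &w);
  ok = ok && (std::addressof(cw) == &w);

#if __cpp_lib_as_const
  // as_const only adds const; it still refers to the same object.
  ok = ok && (&std::as_const(w) == &w);
#endif
  return ok;
}
```

A caller could check it with `assert(calls_behave_like_casts());`. Every one of these functions takes and returns a reference or pointer, which is presumably why the patch insists on INDIRECT_TYPE_P for both the argument and the result before folding.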
On 3/14/22 13:13, Patrick Palka wrote: > On Fri, 11 Mar 2022, Jason Merrill wrote: > >> On 3/10/22 11:27, Patrick Palka wrote: >>> On Wed, 9 Mar 2022, Jason Merrill wrote: >>> >>>> On 3/1/22 18:08, Patrick Palka wrote: >>>>> A well-formed call to std::move/forward is equivalent to a cast, but the >>>>> former being a function call means it comes with bloated debug info, >>>>> which >>>>> persists even after the call has been inlined away, for an operation >>>>> that >>>>> is never interesting to debug. >>>>> >>>>> This patch addresses this problem in a relatively ad-hoc way by folding >>>>> calls to std::move/forward into casts as part of the frontend's general >>>>> expression folding routine. After this patch with -O2 and a >>>>> non-checking >>>>> compiler, debug info size for some testcases decreases by about ~10% and >>>>> overall compile time and memory usage decreases by ~2%. >>>> >>>> Impressive. Which testcases? >>> >>> I saw the largest percent reductions in debug file object size in >>> various tests from cmcstl2 and range-v3, e.g. >>> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp >>> (which are among their biggest tests). >>> >>> Significant reductions in debug object file size can be observed in >>> some libstdc++ testcases too, such as a 5.5% reduction in >>> std/ranges/adaptor/join.cc >>> >>>> >>>> Do you also want to handle addressof and as_const in this patch, as >>>> Jonathan >>>> suggested? >>> >>> Yes, good idea. Since each of their argument and return types are >>> indirect types, I think we can use the same NOP_EXPR-based folding for >>> them. >>> >>>> >>>> I think we can do this now, and think about generalizing more in stage 1. >>>> >>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we >>>>> want to consider for GCC 12? 
>>>>> >>>>> PR c++/96780 >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>>>> fold calls to std::move/forward into simple casts. >>>>> * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. >>>>> * typeck.cc (is_std_move_p, is_std_forward_p): Export. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * g++.dg/opt/pr96780.C: New test. >>>>> --- >>>>> gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ >>>>> gcc/cp/cp-tree.h | 2 ++ >>>>> gcc/cp/typeck.cc | 6 ++---- >>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++ >>>>> 4 files changed, 46 insertions(+), 4 deletions(-) >>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>> >>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>> index d7323fb5c09..0b009b631c7 100644 >>>>> --- a/gcc/cp/cp-gimplify.cc >>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x) >>>>> case CALL_EXPR: >>>>> { >>>>> + if (optimize >>>> >>>> I think this should check flag_no_inline rather than optimize. >>> >>> Sounds good. >>> >>> Here's a patch that extends the folding to as_const and addressof (as >>> well as __addressof, which I'm kind of unsure about since it's >>> non-standard). I suppose it also doesn't hurt to verify that the return >>> and argument type of the function are sane before we commit to folding. >>> >>> -- >8 -- >>> >>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>> >>> A well-formed call to std::move/forward is equivalent to a cast, but the >>> former being a function call means the compiler generates debug info for >>> it, which persists even after the call has been inlined away, for an >>> operation that's never interesting to debug. >>> >>> This patch addresses this problem in a relatively ad-hoc way by folding >>> calls to std::move/forward and other cast-like functions into simple >>> casts as part of the frontend's general expression folding routine. 
>>> After this patch with -O2 and a non-checking compiler, debug info size >>> for some testcases decreases by about ~10% and overall compile time and >>> memory usage decreases by ~2%. >>> >>> PR c++/96780 >>> >>> gcc/cp/ChangeLog: >>> >>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>> fold calls to std::move/forward and other cast-like functions >>> into simple casts. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * g++.dg/opt/pr96780.C: New test. >>> --- >>> gcc/cp/cp-gimplify.cc | 36 +++++++++++++++++++++++++++- >>> gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ >>> 2 files changed, 73 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>> >>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>> index d7323fb5c09..efc4c8f0eb9 100644 >>> --- a/gcc/cp/cp-gimplify.cc >>> +++ b/gcc/cp/cp-gimplify.cc >>> @@ -2756,9 +2756,43 @@ cp_fold (tree x) >>> case CALL_EXPR: >>> { >>> - int sv = optimize, nw = sv; >>> tree callee = get_callee_fndecl (x); >>> + /* "Inline" calls to std::move/forward and other cast-like functions >>> + by simply folding them into the corresponding cast determined by >>> + their return type. This is cheaper than relying on the middle-end >>> + to do so, and also means we avoid generating useless debug info for >>> + them at all. >>> + >>> + At this point the argument has already been converted into a >>> + reference, so it suffices to use a NOP_EXPR to express the >>> + cast. */ >>> + if (!flag_no_inline >> >> In our conversation yesterday it occurred to me that we might make this a >> separate flag that defaults to the value of flag_no_inline; I was thinking of >> -ffold-simple-inlines. Then Vittorio et al can specify that explicitly at -O0 >> if they'd like. > > Makes sense, like so? Bootstrapped and regtested on x86_64-pc-linux-gnu. > > The patch defaults -ffold-simple-inlines according to the value of > flag_no_inline at startup. 
IIUC this means that if the flag has been
> defaulted to set, then e.g. an optimize("O0") function attribute won't
> disable -ffold-simple-inlines for that function, since we only compute
> its default value once.
>
> I wonder if we therefore instead want to handle defaulting the flag
> when it's used, e.g. check
>
>   (flag_fold_simple_inlines == -1
>    ? !flag_no_inline
>    : flag_fold_simple_inlines)
>
> instead of
>
>   flag_fold_simple_inlines
>
> in cp_fold?

I guess that makes sense, we can't add front-end options to the
default_options_table. But I think let's use OPTION_SET_P instead of checking
for -1.

> -- >8 --
>
> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>
> A well-formed call to std::move/forward is equivalent to a cast, but the
> former being a function call means the compiler generates debug info for
> it, which persists even after the call has been inlined away, for an
> operation that's never interesting to debug.
>
> This patch addresses this problem by folding calls to std::move/forward
> and other cast-like functions into simple casts as part of the frontend's
> general expression folding routine. This behavior is controlled by a
> new flag -ffold-simple-inlines which defaults to enabled unless
> -fno-inline is in effect.
>
> After this patch with -O2 and a non-checking compiler, debug info size
> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
> ~10% and overall compile time and memory usage decreases by ~2%.

Did you compare the reduction after handling more functions?

>         PR c++/96780
>
> gcc/c-family/ChangeLog:
>
>         * c-opts.cc (c_common_post_options): Handle defaulting of
>         flag_fold_simple_inlines.
>         * c.opt: Add -ffold-simple-inlines.
>
> gcc/cp/ChangeLog:
>
>         * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
>         std::move/forward and other cast-like functions into simple
>         casts.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/opt/pr96780.C: New test.
> --- > gcc/c-family/c-opts.cc | 3 +++ > gcc/c-family/c.opt | 4 ++++ > gcc/cp/cp-gimplify.cc | 35 ++++++++++++++++++++++++++- > gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ > 4 files changed, 79 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc > index a341a061758..e8831d16c7b 100644 > --- a/gcc/c-family/c-opts.cc > +++ b/gcc/c-family/c-opts.cc > @@ -1058,6 +1058,9 @@ c_common_post_options (const char **pfilename) > if (flag_implicit_constexpr && cxx_dialect < cxx14) > flag_implicit_constexpr = false; > > + if (flag_fold_simple_inlines == -1) > + flag_fold_simple_inlines = !flag_no_inline; > + > /* Global sized deallocation is new in C++14. */ > if (flag_sized_deallocation == -1) > flag_sized_deallocation = (cxx_dialect >= cxx14); > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 9cfd2a6bc4e..9a2a597e587 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat > fexternal-templates > C++ ObjC++ WarnRemoved > > +ffold-simple-inlines > +C++ ObjC++ Optimization Var(flag_fold_simple_inlines) Init(-1) > +Fold calls to simple inline functions. > + > ffor-scope > C++ ObjC++ WarnRemoved > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > index d7323fb5c09..b09fede2d75 100644 > --- a/gcc/cp/cp-gimplify.cc > +++ b/gcc/cp/cp-gimplify.cc > @@ -2756,9 +2756,42 @@ cp_fold (tree x) > > case CALL_EXPR: > { > - int sv = optimize, nw = sv; > tree callee = get_callee_fndecl (x); > > + /* "Inline" calls to std::move/forward and other cast-like functions > + by simply folding them into a corresponding cast to their return > + type. This is cheaper than relying on the middle-end to do so, and > + also means we avoid generating useless debug info for them at all. 
> + > + At this point the argument has already been converted into a > + reference, so it suffices to use a NOP_EXPR to express the > + cast. */ > + if (flag_fold_simple_inlines > + && call_expr_nargs (x) == 1 > + && decl_in_std_namespace_p (callee) > + && DECL_NAME (callee) != NULL_TREE > + && (id_equal (DECL_NAME (callee), "move") > + || id_equal (DECL_NAME (callee), "forward") > + || id_equal (DECL_NAME (callee), "addressof") > + /* This addressof equivalent is used heavily in libstdc++. */ > + || id_equal (DECL_NAME (callee), "__addressof") > + || id_equal (DECL_NAME (callee), "as_const"))) > + { > + r = CALL_EXPR_ARG (x, 0); > + /* Check that the return and arguments types are sane before > + folding. */ > + if (INDIRECT_TYPE_P (TREE_TYPE (x)) > + && INDIRECT_TYPE_P (TREE_TYPE (r))) > + { > + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) > + r = build_nop (TREE_TYPE (x), r); > + x = cp_fold (r); > + break; > + } > + } > + > + int sv = optimize, nw = sv; > + > /* Some built-in function calls will be evaluated at compile-time in > fold (). Set optimize to 1 when folding __builtin_constant_p inside > a constexpr function so that fold_builtin_1 doesn't fold it to 0. */ > diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C > new file mode 100644 > index 00000000000..61e11855eeb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/opt/pr96780.C > @@ -0,0 +1,38 @@ > +// PR c++/96780 > +// Verify calls to std::move/forward are folded away by the frontend. 
> +// { dg-do compile { target c++11 } } > +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" } > + > +#include <utility> > + > +struct A; > + > +extern A& a; > +extern const A& ca; > + > +void f() { > + auto&& x1 = std::move(a); > + auto&& x2 = std::forward<A>(a); > + auto&& x3 = std::forward<A&>(a); > + > + auto&& x4 = std::move(ca); > + auto&& x5 = std::forward<const A>(ca); > + auto&& x6 = std::forward<const A&>(ca); > + > + auto x7 = std::addressof(a); > + auto x8 = std::addressof(ca); > +#if __GLIBCXX__ > + auto x9 = std::__addressof(a); > + auto x10 = std::__addressof(ca); > +#endif > +#if __cpp_lib_as_const > + auto&& x11 = std::as_const(a); > + auto&& x12 = std::as_const(ca); > +#endif > +} > + > +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
On Mon, 14 Mar 2022, Jason Merrill wrote: > On 3/14/22 13:13, Patrick Palka wrote: > > On Fri, 11 Mar 2022, Jason Merrill wrote: > > > > > On 3/10/22 11:27, Patrick Palka wrote: > > > > On Wed, 9 Mar 2022, Jason Merrill wrote: > > > > > > > > > On 3/1/22 18:08, Patrick Palka wrote: > > > > > > A well-formed call to std::move/forward is equivalent to a cast, but > > > > > > the > > > > > > former being a function call means it comes with bloated debug info, > > > > > > which > > > > > > persists even after the call has been inlined away, for an operation > > > > > > that > > > > > > is never interesting to debug. > > > > > > > > > > > > This patch addresses this problem in a relatively ad-hoc way by > > > > > > folding > > > > > > calls to std::move/forward into casts as part of the frontend's > > > > > > general > > > > > > expression folding routine. After this patch with -O2 and a > > > > > > non-checking > > > > > > compiler, debug info size for some testcases decreases by about ~10% > > > > > > and > > > > > > overall compile time and memory usage decreases by ~2%. > > > > > > > > > > Impressive. Which testcases? > > > > > > > > I saw the largest percent reductions in debug file object size in > > > > various tests from cmcstl2 and range-v3, e.g. > > > > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp > > > > (which are among their biggest tests). > > > > > > > > Significant reductions in debug object file size can be observed in > > > > some libstdc++ testcases too, such as a 5.5% reduction in > > > > std/ranges/adaptor/join.cc > > > > > > > > > > > > > > Do you also want to handle addressof and as_const in this patch, as > > > > > Jonathan > > > > > suggested? > > > > > > > > Yes, good idea. Since each of their argument and return types are > > > > indirect types, I think we can use the same NOP_EXPR-based folding for > > > > them. 
> > > > > > > > > > > > > > I think we can do this now, and think about generalizing more in stage > > > > > 1. > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something > > > > > > we > > > > > > want to consider for GCC 12? > > > > > > > > > > > > PR c++/96780 > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > > > > > fold calls to std::move/forward into simple casts. > > > > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. > > > > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > * g++.dg/opt/pr96780.C: New test. > > > > > > --- > > > > > > gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ > > > > > > gcc/cp/cp-tree.h | 2 ++ > > > > > > gcc/cp/typeck.cc | 6 ++---- > > > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++ > > > > > > 4 files changed, 46 insertions(+), 4 deletions(-) > > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > > > index d7323fb5c09..0b009b631c7 100644 > > > > > > --- a/gcc/cp/cp-gimplify.cc > > > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x) > > > > > > case CALL_EXPR: > > > > > > { > > > > > > + if (optimize > > > > > > > > > > I think this should check flag_no_inline rather than optimize. > > > > > > > > Sounds good. > > > > > > > > Here's a patch that extends the folding to as_const and addressof (as > > > > well as __addressof, which I'm kind of unsure about since it's > > > > non-standard). I suppose it also doesn't hurt to verify that the return > > > > and argument type of the function are sane before we commit to folding. 
> > > > > > > > -- >8 -- > > > > > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > > > > > > > A well-formed call to std::move/forward is equivalent to a cast, but the > > > > former being a function call means the compiler generates debug info for > > > > it, which persists even after the call has been inlined away, for an > > > > operation that's never interesting to debug. > > > > > > > > This patch addresses this problem in a relatively ad-hoc way by folding > > > > calls to std::move/forward and other cast-like functions into simple > > > > casts as part of the frontend's general expression folding routine. > > > > After this patch with -O2 and a non-checking compiler, debug info size > > > > for some testcases decreases by about ~10% and overall compile time and > > > > memory usage decreases by ~2%. > > > > > > > > PR c++/96780 > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > > > fold calls to std::move/forward and other cast-like functions > > > > into simple casts. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * g++.dg/opt/pr96780.C: New test. 
> > > > --- > > > > gcc/cp/cp-gimplify.cc | 36 +++++++++++++++++++++++++++- > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 38 > > > > ++++++++++++++++++++++++++++++ > > > > 2 files changed, 73 insertions(+), 1 deletion(-) > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > index d7323fb5c09..efc4c8f0eb9 100644 > > > > --- a/gcc/cp/cp-gimplify.cc > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > @@ -2756,9 +2756,43 @@ cp_fold (tree x) > > > > case CALL_EXPR: > > > > { > > > > - int sv = optimize, nw = sv; > > > > tree callee = get_callee_fndecl (x); > > > > + /* "Inline" calls to std::move/forward and other cast-like > > > > functions > > > > + by simply folding them into the corresponding cast determined by > > > > + their return type. This is cheaper than relying on the middle-end > > > > + to do so, and also means we avoid generating useless debug info for > > > > + them at all. > > > > + > > > > + At this point the argument has already been converted into a > > > > + reference, so it suffices to use a NOP_EXPR to express the > > > > + cast. */ > > > > + if (!flag_no_inline > > > > > > In our conversation yesterday it occurred to me that we might make this a > > > separate flag that defaults to the value of flag_no_inline; I was thinking > > > of > > > -ffold-simple-inlines. Then Vittorio et al can specify that explicitly at > > > -O0 > > > if they'd like. > > > > Makes sense, like so? Bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > The patch defaults -ffold-simple-inlines according to the value of > > flag_no_inline at startup. IIUC this means that if the flag has been > > defaulted to set, then e.g. an optimize("O0") function attribute won't > > disable -ffold-simple-inlines for that function, since we only compute > > its default value once. > > > > I wonder if we therefore instead want to handle defaulting the flag > > when it's used, e.g. 
check
> >
> >   (flag_fold_simple_inlines == -1
> >    ? !flag_no_inline
> >    : flag_fold_simple_inlines)
> >
> > instead of
> >
> >   flag_fold_simple_inlines
> >
> > in cp_fold?
>
> I guess that makes sense, we can't add front-end options to the
> default_options_table. But I think let's use OPTION_SET_P instead of checking
> for -1.

Done.

>
> > -- >8 --
> >
> > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
> >
> > A well-formed call to std::move/forward is equivalent to a cast, but the
> > former being a function call means the compiler generates debug info for
> > it, which persists even after the call has been inlined away, for an
> > operation that's never interesting to debug.
> >
> > This patch addresses this problem by folding calls to std::move/forward
> > and other cast-like functions into simple casts as part of the frontend's
> > general expression folding routine. This behavior is controlled by a
> > new flag -ffold-simple-inlines which defaults to enabled unless
> > -fno-inline is in effect.
> >
> > After this patch with -O2 and a non-checking compiler, debug info size
> > for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
> > ~10% and overall compile time and memory usage decreases by ~2%.
>
> Did you compare the reduction after handling more functions?

The numbers are roughly the same, which I guess is not too surprising
since calls to std::move/forward outnumber the other functions by about
10:1 in libstdc++, range-v3 and cmcstl2.

The biggest reduction in debug object file size (measured by du) I've
observed is 14% with range-v3's test/algorithm/stable_partition.cpp.
The biggest reduction in peak memory usage (measured by /usr/bin/time -v)
is 5% with cmcstl2's test/algorithm/set_symmetric_difference4.cpp. The
biggest reduction in compile time (measured by perf stat) is about 3%,
also from that testcase.
-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine. This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable such folding even with -O0 (which implies -fno-inline).

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
~10% and overall compile time and memory usage decreases by ~2%.

        PR c++/96780

gcc/c-family/ChangeLog:

        * c-opts.cc (c_common_post_options): Handle defaulting of
        flag_fold_simple_inlines.
        * c.opt: Add -ffold-simple-inlines.

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
        std::move/forward and other cast-like functions into simple
        casts.

gcc/testsuite/ChangeLog:

        * g++.dg/opt/pr96780.C: New test.
---
 gcc/c-family/c.opt                 |  4 ++++
 gcc/cp/cp-gimplify.cc              | 38 +++++++++++++++++++++++++++++-
 gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 9cfd2a6bc4e..9a4828ebe37 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat
 fexternal-templates
 C++ ObjC++ WarnRemoved
 
+ffold-simple-inlines
+C++ ObjC++ Optimization Var(flag_fold_simple_inlines)
+Fold calls to simple inline functions.
+
 ffor-scope
 C++ ObjC++ WarnRemoved
 
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..41daaf18725 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see
 #include "file-prefix-map.h"
 #include "cgraph.h"
 #include "omp-general.h"
+#include "opts.h"
 
 /* Forward declarations. */
 
@@ -2756,9 +2757,44 @@ cp_fold (tree x)
 
     case CALL_EXPR:
       {
-        int sv = optimize, nw = sv;
         tree callee = get_callee_fndecl (x);
 
+        /* "Inline" calls to std::move/forward and other cast-like functions
+           by simply folding them into a corresponding cast to their return
+           type. This is cheaper than relying on the middle-end to do so, and
+           also means we avoid generating useless debug info for them at all.
+
+           At this point the argument has already been converted into a
+           reference, so it suffices to use a NOP_EXPR to express the
+           cast. */
+        if ((OPTION_SET_P (flag_fold_simple_inlines)
+             ? flag_fold_simple_inlines
+             : !flag_no_inline)
+            && call_expr_nargs (x) == 1
+            && decl_in_std_namespace_p (callee)
+            && DECL_NAME (callee) != NULL_TREE
+            && (id_equal (DECL_NAME (callee), "move")
+                || id_equal (DECL_NAME (callee), "forward")
+                || id_equal (DECL_NAME (callee), "addressof")
+                /* This addressof equivalent is used heavily in libstdc++. */
+                || id_equal (DECL_NAME (callee), "__addressof")
+                || id_equal (DECL_NAME (callee), "as_const")))
+          {
+            r = CALL_EXPR_ARG (x, 0);
+            /* Check that the return and arguments types are sane before
+               folding. */
+            if (INDIRECT_TYPE_P (TREE_TYPE (x))
+                && INDIRECT_TYPE_P (TREE_TYPE (r)))
+              {
+                if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
+                  r = build_nop (TREE_TYPE (x), r);
+                x = cp_fold (r);
+                break;
+              }
+          }
+
+        int sv = optimize, nw = sv;
+
         /* Some built-in function calls will be evaluated at compile-time in
            fold (). Set optimize to 1 when folding __builtin_constant_p inside
            a constexpr function so that fold_builtin_1 doesn't fold it to 0. */
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
new file mode 100644
index 00000000000..61e11855eeb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -0,0 +1,38 @@
+// PR c++/96780
+// Verify calls to std::move/forward are folded away by the frontend.
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" }
+
+#include <utility>
+
+struct A;
+
+extern A& a;
+extern const A& ca;
+
+void f() {
+  auto&& x1 = std::move(a);
+  auto&& x2 = std::forward<A>(a);
+  auto&& x3 = std::forward<A&>(a);
+
+  auto&& x4 = std::move(ca);
+  auto&& x5 = std::forward<const A>(ca);
+  auto&& x6 = std::forward<const A&>(ca);
+
+  auto x7 = std::addressof(a);
+  auto x8 = std::addressof(ca);
+#if __GLIBCXX__
+  auto x9 = std::__addressof(a);
+  auto x10 = std::__addressof(ca);
+#endif
+#if __cpp_lib_as_const
+  auto&& x11 = std::as_const(a);
+  auto&& x12 = std::as_const(ca);
+#endif
+}
+
+// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
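The option handling in this revision comes down to a decision made at the point of use: an explicit -ffold-simple-inlines or -fno-fold-simple-inlines wins, and otherwise the folding follows whether inlining is enabled. The sketch below models that decision in standalone C++ and contrasts it with computing the default once at startup, which was the concern raised about optimize("O0") function attributes. It is a toy model under stated assumptions: `Options`, `should_fold`, `startup_default` and `stale_default_differs` are hypothetical names, the struct stands in for GCC's option state rather than reproducing it, and the claim that a per-function -O0 turns on the no-inline state is taken from the thread's own description.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant option state; not GCC's real data
// structures.
struct Options {
  bool fold_set_explicitly;  // models OPTION_SET_P (flag_fold_simple_inlines)
  bool fold_simple_inlines;  // models flag_fold_simple_inlines
  bool no_inline;            // models flag_no_inline
};

// Use-time defaulting, mirroring the shape of the condition in the patch:
// an explicit setting wins, otherwise fold exactly when inlining is enabled.
inline bool should_fold(const Options& o) {
  return o.fold_set_explicitly ? o.fold_simple_inlines : !o.no_inline;
}

// Startup-time defaulting, as in the earlier revision: the default is
// computed once from the translation unit's options and then reused.
inline bool startup_default(const Options& at_startup) {
  return !at_startup.no_inline;
}

// Scenario from the thread: the translation unit is built with inlining
// enabled, but one function is compiled as if with -O0 (assumed here to
// imply the no-inline state). The cached default then disagrees with the
// value computed at the point of use.
inline bool stale_default_differs() {
  Options tu = {false, false, false};  // no explicit flag, inlining enabled
  bool cached = startup_default(tu);   // computed once, before any attribute

  Options fn = tu;
  fn.no_inline = true;                 // per-function state under -O0

  return cached != should_fold(fn);
}
```

In this model an explicit setting always takes precedence, which matches the stated goal of letting users request the folding even at -O0, while the unset case tracks the inlining state of whatever function is currently being folded.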
On 3/15/22 10:03, Patrick Palka wrote: > On Mon, 14 Mar 2022, Jason Merrill wrote: > >> On 3/14/22 13:13, Patrick Palka wrote: >>> On Fri, 11 Mar 2022, Jason Merrill wrote: >>> >>>> On 3/10/22 11:27, Patrick Palka wrote: >>>>> On Wed, 9 Mar 2022, Jason Merrill wrote: >>>>> >>>>>> On 3/1/22 18:08, Patrick Palka wrote: >>>>>>> A well-formed call to std::move/forward is equivalent to a cast, but >>>>>>> the >>>>>>> former being a function call means it comes with bloated debug info, >>>>>>> which >>>>>>> persists even after the call has been inlined away, for an operation >>>>>>> that >>>>>>> is never interesting to debug. >>>>>>> >>>>>>> This patch addresses this problem in a relatively ad-hoc way by >>>>>>> folding >>>>>>> calls to std::move/forward into casts as part of the frontend's >>>>>>> general >>>>>>> expression folding routine. After this patch with -O2 and a >>>>>>> non-checking >>>>>>> compiler, debug info size for some testcases decreases by about ~10% >>>>>>> and >>>>>>> overall compile time and memory usage decreases by ~2%. >>>>>> >>>>>> Impressive. Which testcases? >>>>> >>>>> I saw the largest percent reductions in debug file object size in >>>>> various tests from cmcstl2 and range-v3, e.g. >>>>> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp >>>>> (which are among their biggest tests). >>>>> >>>>> Significant reductions in debug object file size can be observed in >>>>> some libstdc++ testcases too, such as a 5.5% reduction in >>>>> std/ranges/adaptor/join.cc >>>>> >>>>>> >>>>>> Do you also want to handle addressof and as_const in this patch, as >>>>>> Jonathan >>>>>> suggested? >>>>> >>>>> Yes, good idea. Since each of their argument and return types are >>>>> indirect types, I think we can use the same NOP_EXPR-based folding for >>>>> them. >>>>> >>>>>> >>>>>> I think we can do this now, and think about generalizing more in stage >>>>>> 1. 
>>>>>> >>>>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something >>>>>>> we >>>>>>> want to consider for GCC 12? >>>>>>> >>>>>>> PR c++/96780 >>>>>>> >>>>>>> gcc/cp/ChangeLog: >>>>>>> >>>>>>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>>>>>> fold calls to std::move/forward into simple casts. >>>>>>> * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. >>>>>>> * typeck.cc (is_std_move_p, is_std_forward_p): Export. >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> >>>>>>> * g++.dg/opt/pr96780.C: New test. >>>>>>> --- >>>>>>> gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ >>>>>>> gcc/cp/cp-tree.h | 2 ++ >>>>>>> gcc/cp/typeck.cc | 6 ++---- >>>>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++ >>>>>>> 4 files changed, 46 insertions(+), 4 deletions(-) >>>>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>>>> >>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>>>> index d7323fb5c09..0b009b631c7 100644 >>>>>>> --- a/gcc/cp/cp-gimplify.cc >>>>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x) >>>>>>> case CALL_EXPR: >>>>>>> { >>>>>>> + if (optimize >>>>>> >>>>>> I think this should check flag_no_inline rather than optimize. >>>>> >>>>> Sounds good. >>>>> >>>>> Here's a patch that extends the folding to as_const and addressof (as >>>>> well as __addressof, which I'm kind of unsure about since it's >>>>> non-standard). I suppose it also doesn't hurt to verify that the return >>>>> and argument type of the function are sane before we commit to folding. >>>>> >>>>> -- >8 -- >>>>> >>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>>>> >>>>> A well-formed call to std::move/forward is equivalent to a cast, but the >>>>> former being a function call means the compiler generates debug info for >>>>> it, which persists even after the call has been inlined away, for an >>>>> operation that's never interesting to debug. 
>>>>> >>>>> This patch addresses this problem in a relatively ad-hoc way by folding >>>>> calls to std::move/forward and other cast-like functions into simple >>>>> casts as part of the frontend's general expression folding routine. >>>>> After this patch with -O2 and a non-checking compiler, debug info size >>>>> for some testcases decreases by about ~10% and overall compile time and >>>>> memory usage decreases by ~2%. >>>>> >>>>> PR c++/96780 >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>>>> fold calls to std::move/forward and other cast-like functions >>>>> into simple casts. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * g++.dg/opt/pr96780.C: New test. >>>>> --- >>>>> gcc/cp/cp-gimplify.cc | 36 +++++++++++++++++++++++++++- >>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 38 >>>>> ++++++++++++++++++++++++++++++ >>>>> 2 files changed, 73 insertions(+), 1 deletion(-) >>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>> >>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>> index d7323fb5c09..efc4c8f0eb9 100644 >>>>> --- a/gcc/cp/cp-gimplify.cc >>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>> @@ -2756,9 +2756,43 @@ cp_fold (tree x) >>>>> case CALL_EXPR: >>>>> { >>>>> - int sv = optimize, nw = sv; >>>>> tree callee = get_callee_fndecl (x); >>>>> + /* "Inline" calls to std::move/forward and other cast-like >>>>> functions >>>>> + by simply folding them into the corresponding cast determined by >>>>> + their return type. This is cheaper than relying on the middle-end >>>>> + to do so, and also means we avoid generating useless debug info for >>>>> + them at all. >>>>> + >>>>> + At this point the argument has already been converted into a >>>>> + reference, so it suffices to use a NOP_EXPR to express the >>>>> + cast. 
*/ >>>>> + if (!flag_no_inline >>>> >>>> In our conversation yesterday it occurred to me that we might make this a >>>> separate flag that defaults to the value of flag_no_inline; I was thinking >>>> of >>>> -ffold-simple-inlines. Then Vittorio et al can specify that explicitly at >>>> -O0 >>>> if they'd like. >>> >>> Makes sense, like so? Bootstrapped and regtested on x86_64-pc-linux-gnu. >>> >>> The patch defaults -ffold-simple-inlines according to the value of >>> flag_no_inline at startup. IIUC this means that if the flag has been >>> defaulted to set, then e.g. an optimize("O0") function attribute won't >>> disable -ffold-simple-inlines for that function, since we only compute >>> its default value once. >>> >>> I wonder if we therefore instead want to handle defaulting the flag >>> when it's used, e.g. check >>> >>> (flag_fold_simple_inlines == -1 >>> ? flag_no_inline >>> : flag_fold_simple_inlines) >>> >>> instead of >>> >>> flag_fold_simple_inlines >>> >>> in cp_fold? >> >> I guess that makes sense, we can't add front-end options to the >> default_options_table. But I think let's use OPTION_SET_P instead of checking >> for -1. > > Done. > >> >>> -- >8 -- >>> >>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>> >>> A well-formed call to std::move/forward is equivalent to a cast, but the >>> former being a function call means the compiler generates debug info for >>> it, which persists even after the call has been inlined away, for an >>> operation that's never interesting to debug. >>> >>> This patch addresses this problem by folding calls to std::move/forward >>> and other cast-like functions into simple casts as part of the frontend's >>> general expression folding routine. This behavior is controlled by a >>> new flag -ffold-simple-inlines which defaults to the value of -fno-inline. >>> >>> After this patch with -O2 and a non-checking compiler, debug info size >>> for some testcases (e.g. 
from range-v3 and cmcstl2) decreases by about
>>> ~10% and overall compile time and memory usage decreases by ~2%.
>>
>> Did you compare the reduction after handling more functions?
>
> The numbers are roughly the same, which I guess is not too surprising
> since calls to std::move/forward outnumber the other functions by about
> 10:1 in libstdc++, range-v3 and cmcstl2.
>
> The biggest reduction in debug object file size (measured by du) I've
> observed is 14% with range-v3's test/algorithm/stable_partition.cpp.
> The biggest reduction in peak memory usage is (measured by /usr/bin/time -v)
> is 5% with cmcstl's test/algorithm/set_symmetric_difference4.cpp. The
> biggest reduction in compile time (measured by perf stat) is about 3%,
> also from that testcase.
>
> -- >8 --
>
> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>
> A well-formed call to std::move/forward is equivalent to a cast, but the
> former being a function call means the compiler generates debug info for
> it, which persists even after the call has been inlined away, for an
> operation that's never interesting to debug.
>
> This patch addresses this problem by folding calls to std::move/forward
> and other cast-like functions into simple casts as part of the frontend's
> general expression folding routine. This behavior is controlled by a
> new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
> users can enable such folding even with -O0 (which implies -fno-inline).
>
> After this patch with -O2 and a non-checking compiler, debug info size
> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
> ~10% and overall compile time and memory usage decreases by ~2%.
>
>     PR c++/96780
>
> gcc/c-family/ChangeLog:
>
>     * c-opts.cc (c_common_post_options): Handle defaulting of
>     flag_fold_simple_inlines.
>     * c.opt: Add -ffold-simple-inlines.

Looks like you still need a doc/invoke.texi change for the new flag. The rest
of the patch looks good.

> gcc/cp/ChangeLog: > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to > std::move/forward and other cast-like functions into simple > casts. > > gcc/testsuite/ChangeLog: > > * g++.dg/opt/pr96780.C: New test. > --- > gcc/c-family/c.opt | 4 ++++ > gcc/cp/cp-gimplify.cc | 38 +++++++++++++++++++++++++++++- > gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ > 3 files changed, 79 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 9cfd2a6bc4e..9a4828ebe37 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat > fexternal-templates > C++ ObjC++ WarnRemoved > > +ffold-simple-inlines > +C++ ObjC++ Optimization Var(flag_fold_simple_inlines) > +Fold calls to simple inline functions. > + > ffor-scope > C++ ObjC++ WarnRemoved > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > index d7323fb5c09..41daaf18725 100644 > --- a/gcc/cp/cp-gimplify.cc > +++ b/gcc/cp/cp-gimplify.cc > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see > #include "file-prefix-map.h" > #include "cgraph.h" > #include "omp-general.h" > +#include "opts.h" > > /* Forward declarations. */ > > @@ -2756,9 +2757,44 @@ cp_fold (tree x) > > case CALL_EXPR: > { > - int sv = optimize, nw = sv; > tree callee = get_callee_fndecl (x); > > + /* "Inline" calls to std::move/forward and other cast-like functions > + by simply folding them into a corresponding cast to their return > + type. This is cheaper than relying on the middle-end to do so, and > + also means we avoid generating useless debug info for them at all. > + > + At this point the argument has already been converted into a > + reference, so it suffices to use a NOP_EXPR to express the > + cast. */ > + if ((OPTION_SET_P (flag_fold_simple_inlines) > + ? 
flag_fold_simple_inlines > + : !flag_no_inline) > + && call_expr_nargs (x) == 1 > + && decl_in_std_namespace_p (callee) > + && DECL_NAME (callee) != NULL_TREE > + && (id_equal (DECL_NAME (callee), "move") > + || id_equal (DECL_NAME (callee), "forward") > + || id_equal (DECL_NAME (callee), "addressof") > + /* This addressof equivalent is used heavily in libstdc++. */ > + || id_equal (DECL_NAME (callee), "__addressof") > + || id_equal (DECL_NAME (callee), "as_const"))) > + { > + r = CALL_EXPR_ARG (x, 0); > + /* Check that the return and arguments types are sane before > + folding. */ > + if (INDIRECT_TYPE_P (TREE_TYPE (x)) > + && INDIRECT_TYPE_P (TREE_TYPE (r))) > + { > + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) > + r = build_nop (TREE_TYPE (x), r); > + x = cp_fold (r); > + break; > + } > + } > + > + int sv = optimize, nw = sv; > + > /* Some built-in function calls will be evaluated at compile-time in > fold (). Set optimize to 1 when folding __builtin_constant_p inside > a constexpr function so that fold_builtin_1 doesn't fold it to 0. */ > diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C > new file mode 100644 > index 00000000000..61e11855eeb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/opt/pr96780.C > @@ -0,0 +1,38 @@ > +// PR c++/96780 > +// Verify calls to std::move/forward are folded away by the frontend. 
> +// { dg-do compile { target c++11 } } > +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" } > + > +#include <utility> > + > +struct A; > + > +extern A& a; > +extern const A& ca; > + > +void f() { > + auto&& x1 = std::move(a); > + auto&& x2 = std::forward<A>(a); > + auto&& x3 = std::forward<A&>(a); > + > + auto&& x4 = std::move(ca); > + auto&& x5 = std::forward<const A>(ca); > + auto&& x6 = std::forward<const A&>(ca); > + > + auto x7 = std::addressof(a); > + auto x8 = std::addressof(ca); > +#if __GLIBCXX__ > + auto x9 = std::__addressof(a); > + auto x10 = std::__addressof(ca); > +#endif > +#if __cpp_lib_as_const > + auto&& x11 = std::as_const(a); > + auto&& x12 = std::as_const(ca); > +#endif > +} > + > +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
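The defaulting rule settled on in this exchange (an explicit -ffold-simple-inlines or -fno-fold-simple-inlines wins, otherwise the decision follows whether inlining is enabled) can be written down as a small stand-alone function. This is a sketch under assumptions, not the GCC implementation: `Option` and `should_fold_simple_inlines` are hypothetical names, with `explicitly_set` modeling what `OPTION_SET_P` reports and `no_inline` modeling `flag_no_inline`.

```cpp
#include <cassert>

// Hypothetical stand-in for a command-line flag that remembers whether
// the user set it explicitly.
struct Option {
  bool explicitly_set;   // models OPTION_SET_P (flag)
  bool value;            // the flag's value; meaningful only if set
};

// Mirrors the condition checked at the use site in the patch:
//   OPTION_SET_P (flag) ? flag : !flag_no_inline
// Evaluating it at each use, rather than once at startup, lets a
// per-function change to no_inline (for example through an optimize
// attribute) still affect the outcome.
bool should_fold_simple_inlines(const Option& fold_simple_inlines,
                                bool no_inline)
{
  return fold_simple_inlines.explicitly_set ? fold_simple_inlines.value
                                            : !no_inline;
}
```

With this shape, a build at -O0 (where inlining is off) does not fold by default, but passing the flag explicitly turns folding on, which is the use case mentioned for -O0 users above.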
On Tue, 15 Mar 2022, Jason Merrill wrote: > On 3/15/22 10:03, Patrick Palka wrote: > > On Mon, 14 Mar 2022, Jason Merrill wrote: > > > > > On 3/14/22 13:13, Patrick Palka wrote: > > > > On Fri, 11 Mar 2022, Jason Merrill wrote: > > > > > > > > > On 3/10/22 11:27, Patrick Palka wrote: > > > > > > On Wed, 9 Mar 2022, Jason Merrill wrote: > > > > > > > > > > > > > On 3/1/22 18:08, Patrick Palka wrote: > > > > > > > > A well-formed call to std::move/forward is equivalent to a cast, > > > > > > > > but > > > > > > > > the > > > > > > > > former being a function call means it comes with bloated debug > > > > > > > > info, > > > > > > > > which > > > > > > > > persists even after the call has been inlined away, for an > > > > > > > > operation > > > > > > > > that > > > > > > > > is never interesting to debug. > > > > > > > > > > > > > > > > This patch addresses this problem in a relatively ad-hoc way by > > > > > > > > folding > > > > > > > > calls to std::move/forward into casts as part of the frontend's > > > > > > > > general > > > > > > > > expression folding routine. After this patch with -O2 and a > > > > > > > > non-checking > > > > > > > > compiler, debug info size for some testcases decreases by about > > > > > > > > ~10% > > > > > > > > and > > > > > > > > overall compile time and memory usage decreases by ~2%. > > > > > > > > > > > > > > Impressive. Which testcases? > > > > > > > > > > > > I saw the largest percent reductions in debug file object size in > > > > > > various tests from cmcstl2 and range-v3, e.g. > > > > > > test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp > > > > > > (which are among their biggest tests). 
> > > > > > > > > > > > Significant reductions in debug object file size can be observed in > > > > > > some libstdc++ testcases too, such as a 5.5% reduction in > > > > > > std/ranges/adaptor/join.cc > > > > > > > > > > > > > > > > > > > > Do you also want to handle addressof and as_const in this patch, > > > > > > > as > > > > > > > Jonathan > > > > > > > suggested? > > > > > > > > > > > > Yes, good idea. Since each of their argument and return types are > > > > > > indirect types, I think we can use the same NOP_EXPR-based folding > > > > > > for > > > > > > them. > > > > > > > > > > > > > > > > > > > > I think we can do this now, and think about generalizing more in > > > > > > > stage > > > > > > > 1. > > > > > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this > > > > > > > > something > > > > > > > > we > > > > > > > > want to consider for GCC 12? > > > > > > > > > > > > > > > > PR c++/96780 > > > > > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > > > > > > > fold calls to std::move/forward into simple casts. > > > > > > > > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. > > > > > > > > * typeck.cc (is_std_move_p, is_std_forward_p): Export. > > > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > > > * g++.dg/opt/pr96780.C: New test. 
> > > > > > > > --- > > > > > > > > gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ > > > > > > > > gcc/cp/cp-tree.h | 2 ++ > > > > > > > > gcc/cp/typeck.cc | 6 ++---- > > > > > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 24 > > > > > > > > ++++++++++++++++++++++++ > > > > > > > > 4 files changed, 46 insertions(+), 4 deletions(-) > > > > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > > > > > > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > > > > > index d7323fb5c09..0b009b631c7 100644 > > > > > > > > --- a/gcc/cp/cp-gimplify.cc > > > > > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > > > > > @@ -2756,6 +2756,24 @@ cp_fold (tree x) > > > > > > > > case CALL_EXPR: > > > > > > > > { > > > > > > > > + if (optimize > > > > > > > > > > > > > > I think this should check flag_no_inline rather than optimize. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > Here's a patch that extends the folding to as_const and addressof > > > > > > (as > > > > > > well as __addressof, which I'm kind of unsure about since it's > > > > > > non-standard). I suppose it also doesn't hurt to verify that the > > > > > > return > > > > > > and argument type of the function are sane before we commit to > > > > > > folding. > > > > > > > > > > > > -- >8 -- > > > > > > > > > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > > > > > > > > > > > A well-formed call to std::move/forward is equivalent to a cast, but > > > > > > the > > > > > > former being a function call means the compiler generates debug info > > > > > > for > > > > > > it, which persists even after the call has been inlined away, for an > > > > > > operation that's never interesting to debug. 
> > > > > > > > > > > > This patch addresses this problem in a relatively ad-hoc way by > > > > > > folding > > > > > > calls to std::move/forward and other cast-like functions into simple > > > > > > casts as part of the frontend's general expression folding routine. > > > > > > After this patch with -O2 and a non-checking compiler, debug info > > > > > > size > > > > > > for some testcases decreases by about ~10% and overall compile time > > > > > > and > > > > > > memory usage decreases by ~2%. > > > > > > > > > > > > PR c++/96780 > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, > > > > > > fold calls to std::move/forward and other cast-like functions > > > > > > into simple casts. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > * g++.dg/opt/pr96780.C: New test. > > > > > > --- > > > > > > gcc/cp/cp-gimplify.cc | 36 > > > > > > +++++++++++++++++++++++++++- > > > > > > gcc/testsuite/g++.dg/opt/pr96780.C | 38 > > > > > > ++++++++++++++++++++++++++++++ > > > > > > 2 files changed, 73 insertions(+), 1 deletion(-) > > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > > > > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > > > index d7323fb5c09..efc4c8f0eb9 100644 > > > > > > --- a/gcc/cp/cp-gimplify.cc > > > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > > > @@ -2756,9 +2756,43 @@ cp_fold (tree x) > > > > > > case CALL_EXPR: > > > > > > { > > > > > > - int sv = optimize, nw = sv; > > > > > > tree callee = get_callee_fndecl (x); > > > > > > + /* "Inline" calls to std::move/forward and other cast-like > > > > > > functions > > > > > > + by simply folding them into the corresponding cast > > > > > > determined by > > > > > > + their return type. This is cheaper than relying on the > > > > > > middle-end > > > > > > + to do so, and also means we avoid generating useless debug > > > > > > info for > > > > > > + them at all. 
> > > > > > + > > > > > > + At this point the argument has already been converted into > > > > > > a > > > > > > + reference, so it suffices to use a NOP_EXPR to express the > > > > > > + cast. */ > > > > > > + if (!flag_no_inline > > > > > > > > > > In our conversation yesterday it occurred to me that we might make > > > > > this a > > > > > separate flag that defaults to the value of flag_no_inline; I was > > > > > thinking > > > > > of > > > > > -ffold-simple-inlines. Then Vittorio et al can specify that > > > > > explicitly at > > > > > -O0 > > > > > if they'd like. > > > > > > > > Makes sense, like so? Bootstrapped and regtested on > > > > x86_64-pc-linux-gnu. > > > > > > > > The patch defaults -ffold-simple-inlines according to the value of > > > > flag_no_inline at startup. IIUC this means that if the flag has been > > > > defaulted to set, then e.g. an optimize("O0") function attribute won't > > > > disable -ffold-simple-inlines for that function, since we only compute > > > > its default value once. > > > > > > > > I wonder if we therefore instead want to handle defaulting the flag > > > > when it's used, e.g. check > > > > > > > > (flag_fold_simple_inlines == -1 > > > > ? flag_no_inline > > > > : flag_fold_simple_inlines) > > > > > > > > instead of > > > > > > > > flag_fold_simple_inlines > > > > > > > > in cp_fold? > > > > > > I guess that makes sense, we can't add front-end options to the > > > default_options_table. But I think let's use OPTION_SET_P instead of > > > checking > > > for -1. > > > > Done. > > > > > > > > > -- >8 -- > > > > > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > > > > > > > A well-formed call to std::move/forward is equivalent to a cast, but the > > > > former being a function call means the compiler generates debug info for > > > > it, which persists even after the call has been inlined away, for an > > > > operation that's never interesting to debug. 
> > > > > > > > This patch addresses this problem by folding calls to std::move/forward > > > > and other cast-like functions into simple casts as part of the > > > > frontend's > > > > general expression folding routine. This behavior is controlled by a > > > > new flag -ffold-simple-inlines which defaults to the value of > > > > -fno-inline. > > > > > > > > After this patch with -O2 and a non-checking compiler, debug info size > > > > for some testcases (e.g. from range-v3 and cmcstl2) decreases by about > > > > ~10% and overall compile time and memory usage decreases by ~2%. > > > > > > Did you compare the reduction after handling more functions? > > > > The numbers are roughly the same, which I guess is not too surprising > > since calls to std::move/forward outnumber the other functions by about > > 10:1 in libstdc++, range-v3 and cmcstl2. > > > > The biggest reduction in debug object file size (measured by du) I've > > observed is 14% with range-v3's test/algorithm/stable_partition.cpp. > > The biggest reduction in peak memory usage is (measured by /usr/bin/time -v) > > is 5% with cmcstl's test/algorithm/set_symmetric_difference4.cpp. The > > biggest reduction in compile time (measured by perf stat) is about 3%, > > also from that testcase. > > > > -- >8 -- > > > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > > > A well-formed call to std::move/forward is equivalent to a cast, but the > > former being a function call means the compiler generates debug info for > > it, which persists even after the call has been inlined away, for an > > operation that's never interesting to debug. > > > > This patch addresses this problem by folding calls to std::move/forward > > and other cast-like functions into simple casts as part of the frontend's > > general expression folding routine. 
This behavior is controlled by a
> > new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
> > users can enable such folding even with -O0 (which implies -fno-inline).
> >
> > After this patch with -O2 and a non-checking compiler, debug info size
> > for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
> > ~10% and overall compile time and memory usage decreases by ~2%.
> >
> >     PR c++/96780
> >
> > gcc/c-family/ChangeLog:
> >
> >     * c-opts.cc (c_common_post_options): Handle defaulting of
> >     flag_fold_simple_inlines.
> >     * c.opt: Add -ffold-simple-inlines.
>
> Looks like you still need a doc/invoke.texi change for the new flag. The rest
> of the patch looks good.

Like this perhaps? I opted to document the current scope of the flag (which
only cares about a fixed set of functions) as opposed to its future scope
(folding all sufficiently simple inline functions).

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine. This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable this folding with -O0 (which implies -fno-inline).

After this patch with -O2 and a non-checking compiler, debug info size
for some testcases from range-v3 and cmcstl2 decreases by as much as ~10%
and overall compile time and memory usage decreases by ~2%.

    PR c++/96780

gcc/c-family/ChangeLog:

    * c.opt: Add -ffold-simple-inlines.

gcc/ChangeLog:

    * doc/invoke.texi (C++ Dialect Options): Document
    -ffold-simple-inlines.

gcc/cp/ChangeLog:

    * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
    std::move/forward and other cast-like functions into simple
    casts.

gcc/testsuite/ChangeLog:

    * g++.dg/opt/pr96780.C: New test.
---
 gcc/c-family/c.opt                 |  4 ++++
 gcc/cp/cp-gimplify.cc              | 38 +++++++++++++++++++++++++++++-
 gcc/doc/invoke.texi                | 10 ++++++++
 gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 9cfd2a6bc4e..9a4828ebe37 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat
 fexternal-templates
 C++ ObjC++ WarnRemoved

+ffold-simple-inlines
+C++ ObjC++ Optimization Var(flag_fold_simple_inlines)
+Fold calls to simple inline functions.
+
 ffor-scope
 C++ ObjC++ WarnRemoved

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..e4c2644af15 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see
 #include "file-prefix-map.h"
 #include "cgraph.h"
 #include "omp-general.h"
+#include "opts.h"

 /* Forward declarations. */

@@ -2756,9 +2757,44 @@ cp_fold (tree x)

     case CALL_EXPR:
       {
-        int sv = optimize, nw = sv;
         tree callee = get_callee_fndecl (x);

+        /* "Inline" calls to std::move/forward and other cast-like functions
+           by simply folding them into a corresponding cast to their return
+           type. This is cheaper than relying on the middle end to do so, and
+           also means we avoid generating useless debug info for them at all.
+
+           At this point the argument has already been converted into a
+           reference, so it suffices to use a NOP_EXPR to express the
+           cast. */
+        if ((OPTION_SET_P (flag_fold_simple_inlines)
+             ? flag_fold_simple_inlines
+             : !flag_no_inline)
+            && call_expr_nargs (x) == 1
+            && decl_in_std_namespace_p (callee)
+            && DECL_NAME (callee) != NULL_TREE
+            && (id_equal (DECL_NAME (callee), "move")
+                || id_equal (DECL_NAME (callee), "forward")
+                || id_equal (DECL_NAME (callee), "addressof")
+                /* This addressof equivalent is used heavily in libstdc++. */
+                || id_equal (DECL_NAME (callee), "__addressof")
+                || id_equal (DECL_NAME (callee), "as_const")))
+          {
+            r = CALL_EXPR_ARG (x, 0);
+            /* Check that the return and argument types are sane before
+               folding. */
+            if (INDIRECT_TYPE_P (TREE_TYPE (x))
+                && INDIRECT_TYPE_P (TREE_TYPE (r)))
+              {
+                if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
+                  r = build_nop (TREE_TYPE (x), r);
+                x = cp_fold (r);
+                break;
+              }
+          }
+
+        int sv = optimize, nw = sv;
+
         /* Some built-in function calls will be evaluated at compile-time in
            fold (). Set optimize to 1 when folding __builtin_constant_p inside
            a constexpr function so that fold_builtin_1 doesn't fold it to 0. */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a14e1a9472..d65979bba3f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3124,6 +3124,16 @@ On targets that support symbol aliases, the default is
 @option{-fextern-tls-init}. On targets that do not support symbol
 aliases, the default is @option{-fno-extern-tls-init}.

+@item -ffold-simple-inlines
+@itemx -fno-fold-simple-inlines
+@opindex ffold-simple-inlines
+@opindex fno-fold-simple-inlines
+Permit the C++ frontend to fold calls to @code{std::move}, @code{std::forward},
+@code{std::addressof} and @code{std::as_const}. In contrast to inlining, this
+means no debug information will be generated for such calls. Since these
+functions are rarely interesting to debug, this flag is enabled by default
+unless @option{-fno-inline} is active.
+
 @item -fno-gnu-keywords
 @opindex fno-gnu-keywords
 @opindex fgnu-keywords
diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
new file mode 100644
index 00000000000..61e11855eeb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr96780.C
@@ -0,0 +1,38 @@
+// PR c++/96780
+// Verify calls to std::move/forward are folded away by the frontend.
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" }
+
+#include <utility>
+
+struct A;
+
+extern A& a;
+extern const A& ca;
+
+void f() {
+  auto&& x1 = std::move(a);
+  auto&& x2 = std::forward<A>(a);
+  auto&& x3 = std::forward<A&>(a);
+
+  auto&& x4 = std::move(ca);
+  auto&& x5 = std::forward<const A>(ca);
+  auto&& x6 = std::forward<const A&>(ca);
+
+  auto x7 = std::addressof(a);
+  auto x8 = std::addressof(ca);
+#if __GLIBCXX__
+  auto x9 = std::__addressof(a);
+  auto x10 = std::__addressof(ca);
+#endif
+#if __cpp_lib_as_const
+  auto&& x11 = std::as_const(a);
+  auto&& x12 = std::as_const(ca);
+#endif
+}
+
+// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
+// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
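The premise of the commit message, that a well-formed call to std::move or std::forward is equivalent to a cast, can be checked from ordinary user code. The sketch below is illustrative only and independent of the patch: `A` is given a member here (unlike the incomplete `A` in the testcase) and `casts_match` is a hypothetical helper. It compares the library calls against hand-written casts, both in type and in the object they refer to.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct A { int value = 0; };

// The library calls have exactly the types that the equivalent
// static_casts would have.
static_assert(std::is_same<decltype(std::move(std::declval<A&>())),
                           A&&>::value, "move yields A&&");
static_assert(std::is_same<decltype(std::forward<A>(std::declval<A&>())),
                           A&&>::value, "forward<A> yields A&&");
static_assert(std::is_same<decltype(std::forward<A&>(std::declval<A&>())),
                           A&>::value, "forward<A&> yields A&");
static_assert(std::is_same<decltype(std::move(std::declval<const A&>())),
                           const A&&>::value, "move keeps const");

// Returns true when each library call refers to the same object as the
// corresponding hand-written cast or address-of expression.
bool casts_match(A& a)
{
  A&& moved = std::move(a);
  A&& cast = static_cast<A&&>(a);
  A& forwarded = std::forward<A&>(a);
  return &moved == &cast
         && &forwarded == &a
         && std::addressof(a) == &a;
}
```

Nothing here observes whether the compiler folded the calls; it only shows why replacing them with a no-op conversion does not change program behavior for well-formed uses.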
On 3/15/22 13:09, Patrick Palka wrote: > On Tue, 15 Mar 2022, Jason Merrill wrote: > >> On 3/15/22 10:03, Patrick Palka wrote: >>> On Mon, 14 Mar 2022, Jason Merrill wrote: >>> >>>> On 3/14/22 13:13, Patrick Palka wrote: >>>>> On Fri, 11 Mar 2022, Jason Merrill wrote: >>>>> >>>>>> On 3/10/22 11:27, Patrick Palka wrote: >>>>>>> On Wed, 9 Mar 2022, Jason Merrill wrote: >>>>>>> >>>>>>>> On 3/1/22 18:08, Patrick Palka wrote: >>>>>>>>> A well-formed call to std::move/forward is equivalent to a cast, >>>>>>>>> but >>>>>>>>> the >>>>>>>>> former being a function call means it comes with bloated debug >>>>>>>>> info, >>>>>>>>> which >>>>>>>>> persists even after the call has been inlined away, for an >>>>>>>>> operation >>>>>>>>> that >>>>>>>>> is never interesting to debug. >>>>>>>>> >>>>>>>>> This patch addresses this problem in a relatively ad-hoc way by >>>>>>>>> folding >>>>>>>>> calls to std::move/forward into casts as part of the frontend's >>>>>>>>> general >>>>>>>>> expression folding routine. After this patch with -O2 and a >>>>>>>>> non-checking >>>>>>>>> compiler, debug info size for some testcases decreases by about >>>>>>>>> ~10% >>>>>>>>> and >>>>>>>>> overall compile time and memory usage decreases by ~2%. >>>>>>>> >>>>>>>> Impressive. Which testcases? >>>>>>> >>>>>>> I saw the largest percent reductions in debug file object size in >>>>>>> various tests from cmcstl2 and range-v3, e.g. >>>>>>> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp >>>>>>> (which are among their biggest tests). >>>>>>> >>>>>>> Significant reductions in debug object file size can be observed in >>>>>>> some libstdc++ testcases too, such as a 5.5% reduction in >>>>>>> std/ranges/adaptor/join.cc >>>>>>> >>>>>>>> >>>>>>>> Do you also want to handle addressof and as_const in this patch, >>>>>>>> as >>>>>>>> Jonathan >>>>>>>> suggested? >>>>>>> >>>>>>> Yes, good idea. 
Since each of their argument and return types are >>>>>>> indirect types, I think we can use the same NOP_EXPR-based folding >>>>>>> for >>>>>>> them. >>>>>>> >>>>>>>> >>>>>>>> I think we can do this now, and think about generalizing more in >>>>>>>> stage >>>>>>>> 1. >>>>>>>> >>>>>>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this >>>>>>>>> something >>>>>>>>> we >>>>>>>>> want to consider for GCC 12? >>>>>>>>> >>>>>>>>> PR c++/96780 >>>>>>>>> >>>>>>>>> gcc/cp/ChangeLog: >>>>>>>>> >>>>>>>>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>>>>>>>> fold calls to std::move/forward into simple casts. >>>>>>>>> * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. >>>>>>>>> * typeck.cc (is_std_move_p, is_std_forward_p): Export. >>>>>>>>> >>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>> >>>>>>>>> * g++.dg/opt/pr96780.C: New test. >>>>>>>>> --- >>>>>>>>> gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ >>>>>>>>> gcc/cp/cp-tree.h | 2 ++ >>>>>>>>> gcc/cp/typeck.cc | 6 ++---- >>>>>>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 24 >>>>>>>>> ++++++++++++++++++++++++ >>>>>>>>> 4 files changed, 46 insertions(+), 4 deletions(-) >>>>>>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>>>>>> >>>>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>>>>>> index d7323fb5c09..0b009b631c7 100644 >>>>>>>>> --- a/gcc/cp/cp-gimplify.cc >>>>>>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>>>>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x) >>>>>>>>> case CALL_EXPR: >>>>>>>>> { >>>>>>>>> + if (optimize >>>>>>>> >>>>>>>> I think this should check flag_no_inline rather than optimize. >>>>>>> >>>>>>> Sounds good. >>>>>>> >>>>>>> Here's a patch that extends the folding to as_const and addressof >>>>>>> (as >>>>>>> well as __addressof, which I'm kind of unsure about since it's >>>>>>> non-standard). I suppose it also doesn't hurt to verify that the >>>>>>> return >>>>>>> and argument type of the function are sane before we commit to >>>>>>> folding. 
>>>>>>> >>>>>>> -- >8 -- >>>>>>> >>>>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>>>>>> >>>>>>> A well-formed call to std::move/forward is equivalent to a cast, but >>>>>>> the >>>>>>> former being a function call means the compiler generates debug info >>>>>>> for >>>>>>> it, which persists even after the call has been inlined away, for an >>>>>>> operation that's never interesting to debug. >>>>>>> >>>>>>> This patch addresses this problem in a relatively ad-hoc way by >>>>>>> folding >>>>>>> calls to std::move/forward and other cast-like functions into simple >>>>>>> casts as part of the frontend's general expression folding routine. >>>>>>> After this patch with -O2 and a non-checking compiler, debug info >>>>>>> size >>>>>>> for some testcases decreases by about ~10% and overall compile time >>>>>>> and >>>>>>> memory usage decreases by ~2%. >>>>>>> >>>>>>> PR c++/96780 >>>>>>> >>>>>>> gcc/cp/ChangeLog: >>>>>>> >>>>>>> * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing, >>>>>>> fold calls to std::move/forward and other cast-like functions >>>>>>> into simple casts. >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> >>>>>>> * g++.dg/opt/pr96780.C: New test. 
>>>>>>> --- >>>>>>> gcc/cp/cp-gimplify.cc | 36 >>>>>>> +++++++++++++++++++++++++++- >>>>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 38 >>>>>>> ++++++++++++++++++++++++++++++ >>>>>>> 2 files changed, 73 insertions(+), 1 deletion(-) >>>>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>>>> >>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>>>> index d7323fb5c09..efc4c8f0eb9 100644 >>>>>>> --- a/gcc/cp/cp-gimplify.cc >>>>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>>>> @@ -2756,9 +2756,43 @@ cp_fold (tree x) >>>>>>> case CALL_EXPR: >>>>>>> { >>>>>>> - int sv = optimize, nw = sv; >>>>>>> tree callee = get_callee_fndecl (x); >>>>>>> + /* "Inline" calls to std::move/forward and other cast-like >>>>>>> functions >>>>>>> + by simply folding them into the corresponding cast >>>>>>> determined by >>>>>>> + their return type. This is cheaper than relying on the >>>>>>> middle-end >>>>>>> + to do so, and also means we avoid generating useless debug >>>>>>> info for >>>>>>> + them at all. >>>>>>> + >>>>>>> + At this point the argument has already been converted into >>>>>>> a >>>>>>> + reference, so it suffices to use a NOP_EXPR to express the >>>>>>> + cast. */ >>>>>>> + if (!flag_no_inline >>>>>> >>>>>> In our conversation yesterday it occurred to me that we might make >>>>>> this a >>>>>> separate flag that defaults to the value of flag_no_inline; I was >>>>>> thinking >>>>>> of >>>>>> -ffold-simple-inlines. Then Vittorio et al can specify that >>>>>> explicitly at >>>>>> -O0 >>>>>> if they'd like. >>>>> >>>>> Makes sense, like so? Bootstrapped and regtested on >>>>> x86_64-pc-linux-gnu. >>>>> >>>>> The patch defaults -ffold-simple-inlines according to the value of >>>>> flag_no_inline at startup. IIUC this means that if the flag has been >>>>> defaulted to set, then e.g. an optimize("O0") function attribute won't >>>>> disable -ffold-simple-inlines for that function, since we only compute >>>>> its default value once. 
>>>>> >>>>> I wonder if we therefore instead want to handle defaulting the flag >>>>> when it's used, e.g. check >>>>> >>>>> (flag_fold_simple_inlines == -1 >>>>> ? flag_no_inline >>>>> : flag_fold_simple_inlines) >>>>> >>>>> instead of >>>>> >>>>> flag_fold_simple_inlines >>>>> >>>>> in cp_fold? >>>> >>>> I guess that makes sense, we can't add front-end options to the >>>> default_options_table. But I think let's use OPTION_SET_P instead of >>>> checking >>>> for -1. >>> >>> Done. >>> >>>> >>>>> -- >8 -- >>>>> >>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>>>> >>>>> A well-formed call to std::move/forward is equivalent to a cast, but the >>>>> former being a function call means the compiler generates debug info for >>>>> it, which persists even after the call has been inlined away, for an >>>>> operation that's never interesting to debug. >>>>> >>>>> This patch addresses this problem by folding calls to std::move/forward >>>>> and other cast-like functions into simple casts as part of the >>>>> frontend's >>>>> general expression folding routine. This behavior is controlled by a >>>>> new flag -ffold-simple-inlines which defaults to the value of >>>>> -fno-inline. >>>>> >>>>> After this patch with -O2 and a non-checking compiler, debug info size >>>>> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about >>>>> ~10% and overall compile time and memory usage decreases by ~2%. >>>> >>>> Did you compare the reduction after handling more functions? >>> >>> The numbers are roughly the same, which I guess is not too surprising >>> since calls to std::move/forward outnumber the other functions by about >>> 10:1 in libstdc++, range-v3 and cmcstl2. >>> >>> The biggest reduction in debug object file size (measured by du) I've >>> observed is 14% with range-v3's test/algorithm/stable_partition.cpp. 
>>> The biggest reduction in peak memory usage (measured by /usr/bin/time -v) >>> is 5% with cmcstl2's test/algorithm/set_symmetric_difference4.cpp. The >>> biggest reduction in compile time (measured by perf stat) is about 3%, >>> also from that testcase. >>> >>> -- >8 -- >>> >>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>> >>> A well-formed call to std::move/forward is equivalent to a cast, but the >>> former being a function call means the compiler generates debug info for >>> it, which persists even after the call has been inlined away, for an >>> operation that's never interesting to debug. >>> >>> This patch addresses this problem by folding calls to std::move/forward >>> and other cast-like functions into simple casts as part of the frontend's >>> general expression folding routine. This behavior is controlled by a >>> new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that >>> users can enable such folding even with -O0 (which implies -fno-inline). >>> >>> After this patch with -O2 and a non-checking compiler, debug info size >>> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about >>> ~10% and overall compile time and memory usage decreases by ~2%. >>> >>> PR c++/96780 >>> >>> gcc/c-family/ChangeLog: >>> >>> * c-opts.cc (c_common_post_options): Handle defaulting of >>> flag_fold_simple_inlines. >>> * c.opt: Add -ffold-simple-inlines. >> >> Looks like you still need a doc/invoke.texi change for the new flag. The rest >> of the patch looks good. > > Like this perhaps? I opted to document the current scope of the flag > (which only cares about a fixed set of functions) as opposed to its > future scope (folding all sufficiently simple inline functions). OK. 
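The defaulting rule described in the commit message above (an explicit -ffold-simple-inlines or -fno-fold-simple-inlines wins, otherwise folding follows whether inlining is enabled) can be modelled outside the compiler. The sketch below is a hypothetical standalone model, not GCC code: the names `Options` and `fold_simple_inlines_enabled`, and the use of std::optional to stand in for OPTION_SET_P, are assumptions made purely for illustration. It assumes C++17.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the two options discussed in the thread.
struct Options {
  bool no_inline = false;                   // models flag_no_inline
  std::optional<bool> fold_simple_inlines;  // empty models "not set by the user"
};

// Compute the effective setting at the point of use rather than once at
// startup, so a later change to no_inline is still taken into account.
bool fold_simple_inlines_enabled(const Options& opts) {
  return opts.fold_simple_inlines.has_value()
             ? *opts.fold_simple_inlines  // explicit user choice wins
             : !opts.no_inline;           // otherwise follow inlining
}
```

Evaluating the default lazily is the design point raised in the review: a value computed once at startup would not react to a per-function change of optimization level.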
> -- >8 -- > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > A well-formed call to std::move/forward is equivalent to a cast, but the > former being a function call means the compiler generates debug info for > it, which persists even after the call has been inlined away, for an > operation that's never interesting to debug. > > This patch addresses this problem by folding calls to std::move/forward > and other cast-like functions into simple casts as part of the frontend's > general expression folding routine. This behavior is controlled by a > new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that > users can enable this folding with -O0 (which implies -fno-inline). > > After this patch with -O2 and a non-checking compiler, debug info size > for some testcases from range-v3 and cmcstl2 decreases by as much as ~10% > and overall compile time and memory usage decreases by ~2%. > > PR c++/96780 > > gcc/c-family/ChangeLog: > > * c.opt: Add -ffold-simple-inlines. > > gcc/ChangeLog: > > * doc/invoke.texi (C++ Dialect Options): Document > -ffold-simple-inlines. > > gcc/cp/ChangeLog: > > * cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to > std::move/forward and other cast-like functions into simple > casts. > > gcc/testsuite/ChangeLog: > > * g++.dg/opt/pr96780.C: New test. 
> --- > gcc/c-family/c.opt | 4 ++++ > gcc/cp/cp-gimplify.cc | 38 +++++++++++++++++++++++++++++- > gcc/doc/invoke.texi | 10 ++++++++ > gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ > 4 files changed, 89 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 9cfd2a6bc4e..9a4828ebe37 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat > fexternal-templates > C++ ObjC++ WarnRemoved > > +ffold-simple-inlines > +C++ ObjC++ Optimization Var(flag_fold_simple_inlines) > +Fold calls to simple inline functions. > + > ffor-scope > C++ ObjC++ WarnRemoved > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > index d7323fb5c09..e4c2644af15 100644 > --- a/gcc/cp/cp-gimplify.cc > +++ b/gcc/cp/cp-gimplify.cc > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see > #include "file-prefix-map.h" > #include "cgraph.h" > #include "omp-general.h" > +#include "opts.h" > > /* Forward declarations. */ > > @@ -2756,9 +2757,44 @@ cp_fold (tree x) > > case CALL_EXPR: > { > - int sv = optimize, nw = sv; > tree callee = get_callee_fndecl (x); > > + /* "Inline" calls to std::move/forward and other cast-like functions > + by simply folding them into a corresponding cast to their return > + type. This is cheaper than relying on the middle end to do so, and > + also means we avoid generating useless debug info for them at all. > + > + At this point the argument has already been converted into a > + reference, so it suffices to use a NOP_EXPR to express the > + cast. */ > + if ((OPTION_SET_P (flag_fold_simple_inlines) > + ? 
flag_fold_simple_inlines > + : !flag_no_inline) > + && call_expr_nargs (x) == 1 > + && decl_in_std_namespace_p (callee) > + && DECL_NAME (callee) != NULL_TREE > + && (id_equal (DECL_NAME (callee), "move") > + || id_equal (DECL_NAME (callee), "forward") > + || id_equal (DECL_NAME (callee), "addressof") > + /* This addressof equivalent is used heavily in libstdc++. */ > + || id_equal (DECL_NAME (callee), "__addressof") > + || id_equal (DECL_NAME (callee), "as_const"))) > + { > + r = CALL_EXPR_ARG (x, 0); > + /* Check that the return and argument types are sane before > + folding. */ > + if (INDIRECT_TYPE_P (TREE_TYPE (x)) > + && INDIRECT_TYPE_P (TREE_TYPE (r))) > + { > + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) > + r = build_nop (TREE_TYPE (x), r); > + x = cp_fold (r); > + break; > + } > + } > + > + int sv = optimize, nw = sv; > + > /* Some built-in function calls will be evaluated at compile-time in > fold (). Set optimize to 1 when folding __builtin_constant_p inside > a constexpr function so that fold_builtin_1 doesn't fold it to 0. */ > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 2a14e1a9472..d65979bba3f 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -3124,6 +3124,16 @@ On targets that support symbol aliases, the default is > @option{-fextern-tls-init}. On targets that do not support symbol > aliases, the default is @option{-fno-extern-tls-init}. > > +@item -ffold-simple-inlines > +@itemx -fno-fold-simple-inlines > +@opindex ffold-simple-inlines > +@opindex fno-fold-simple-inlines > +Permit the C++ frontend to fold calls to @code{std::move}, @code{std::forward}, > +@code{std::addressof} and @code{std::as_const}. In contrast to inlining, this > +means no debug information will be generated for such calls. Since these > +functions are rarely interesting to debug, this flag is enabled by default > +unless @option{-fno-inline} is active. 
> + > @item -fno-gnu-keywords > @opindex fno-gnu-keywords > @opindex fgnu-keywords > diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C > new file mode 100644 > index 00000000000..61e11855eeb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/opt/pr96780.C > @@ -0,0 +1,38 @@ > +// PR c++/96780 > +// Verify calls to std::move/forward are folded away by the frontend. > +// { dg-do compile { target c++11 } } > +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" } > + > +#include <utility> > + > +struct A; > + > +extern A& a; > +extern const A& ca; > + > +void f() { > + auto&& x1 = std::move(a); > + auto&& x2 = std::forward<A>(a); > + auto&& x3 = std::forward<A&>(a); > + > + auto&& x4 = std::move(ca); > + auto&& x5 = std::forward<const A>(ca); > + auto&& x6 = std::forward<const A&>(ca); > + > + auto x7 = std::addressof(a); > + auto x8 = std::addressof(ca); > +#if __GLIBCXX__ > + auto x9 = std::__addressof(a); > + auto x10 = std::__addressof(ca); > +#endif > +#if __cpp_lib_as_const > + auto&& x11 = std::as_const(a); > + auto&& x12 = std::as_const(ca); > +#endif > +} > + > +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }
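The premise of the patch, that a well-formed call to std::move/forward (and the other cast-like functions) is equivalent to a cast, can be checked in ordinary C++. The following is a minimal sketch and not part of the patch: the type `Widget` and the function `casts_preserve_identity` are hypothetical names, and C++17 is assumed for std::as_const.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct Widget { int value = 0; };  // hypothetical type for illustration

// std::move(w) has the same type and value category as a cast to Widget&&.
static_assert(
    std::is_same<decltype(std::move(std::declval<Widget&>())),
                 decltype(static_cast<Widget&&>(std::declval<Widget&>()))>::value,
    "std::move is a cast to an rvalue reference");

// std::forward<Widget&>(w) yields Widget&, std::forward<Widget>(w) yields Widget&&.
static_assert(
    std::is_same<decltype(std::forward<Widget&>(std::declval<Widget&>())),
                 Widget&>::value,
    "forward<T&> yields an lvalue reference");
static_assert(
    std::is_same<decltype(std::forward<Widget>(std::declval<Widget&>())),
                 Widget&&>::value,
    "forward<T> yields an rvalue reference");

// Returns true if each cast-like call refers to the same object as its argument.
bool casts_preserve_identity() {
  Widget w;
  Widget&& moved = std::move(w);             // library call
  Widget&& cast = static_cast<Widget&&>(w);  // equivalent cast
  const Widget& cw = std::as_const(w);       // adds const, same object
  return &moved == &w && &cast == &w && &cw == &w
         && std::addressof(w) == &w;
}
```

Because none of these calls creates a new object or has side effects, replacing the call with a conversion of the already-bound reference argument does not change program behavior, which is what makes a front-end fold plausible here.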
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc index d7323fb5c09..0b009b631c7 100644 --- a/gcc/cp/cp-gimplify.cc +++ b/gcc/cp/cp-gimplify.cc @@ -2756,6 +2756,24 @@ cp_fold (tree x) case CALL_EXPR: { + if (optimize + && (is_std_move_p (x) || is_std_forward_p (x))) + { + /* When optimizing, "inline" calls to std::move/forward by + simply folding them into the corresponding cast. This is + cheaper than relying on the inliner to do so, and also + means we avoid generating useless debug info for them at all. + + At this point the argument has already been coerced into a + reference, so it suffices to use a NOP_EXPR to express the + reference-to-reference cast. */ + r = CALL_EXPR_ARG (x, 0); + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) + r = build_nop (TREE_TYPE (x), r); + x = cp_fold (r); + break; + } + int sv = optimize, nw = sv; tree callee = get_callee_fndecl (x); diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 37d462fca6e..ab828730b03 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -8089,6 +8089,8 @@ extern tree finish_right_unary_fold_expr (tree, int); extern tree finish_binary_fold_expr (tree, tree, int); extern tree treat_lvalue_as_rvalue_p (tree, bool); extern bool decl_in_std_namespace_p (tree); +extern bool is_std_move_p (tree); +extern bool is_std_forward_p (tree); /* in typeck2.cc */ extern void require_complete_eh_spec_types (tree, tree); diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index bddc83759ad..a3644f8e7f7 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -62,8 +62,6 @@ static bool maybe_warn_about_returning_address_of_local (tree, location_t = UNKN static void error_args_num (location_t, tree, bool); static int convert_arguments (tree, vec<tree, va_gc> **, tree, int, tsubst_flags_t); -static bool is_std_move_p (tree); -static bool is_std_forward_p (tree); /* Do `exp = require_complete_type (exp);' to make sure exp does not have an incomplete type. (That includes void types.) 
@@ -10207,7 +10205,7 @@ decl_in_std_namespace_p (tree decl) /* Returns true if FN, a CALL_EXPR, is a call to std::forward. */ -static bool +bool is_std_forward_p (tree fn) { /* std::forward only takes one argument. */ @@ -10224,7 +10222,7 @@ is_std_forward_p (tree fn) /* Returns true if FN, a CALL_EXPR, is a call to std::move. */ -static bool +bool is_std_move_p (tree fn) { /* std::move only takes one argument. */ diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C new file mode 100644 index 00000000000..ca24b2802bb --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/pr96780.C @@ -0,0 +1,24 @@ +// PR c++/96780 +// Verify calls to std::move/forward are folded away by the frontend. +// { dg-do compile { target c++11 } } +// { dg-additional-options "-O -fdump-tree-gimple" } + +#include <utility> + +struct A; + +extern A& a; +extern const A& ca; + +void f() { + auto&& x1 = std::move(a); + auto&& x2 = std::forward<A>(a); + auto&& x3 = std::forward<A&>(a); + + auto&& x4 = std::move(ca); + auto&& x5 = std::forward<const A>(ca); + auto&& x6 = std::forward<const A&>(ca); +} + +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }