Message ID | Y+INds/30aydJlJj@tucnak |
---|---|
State | New |
Series | cgraph: Handle simd clones in cgraph_node::set_{const,pure}_flag [PR106433] |
Commit Message
Jakub Jelinek
Feb. 7, 2023, 8:36 a.m. UTC
Hi!

The following testcase ICEs, because we determine only in late pure const
pass that bar is const (the content of the function loses a store to a
global var during dse3 and read from it during cddce2) and local-pure-const2
makes it const. The cgraph ordering is that post IPA (in late IPA simd
clones are created) bar is processed first, then foo as its caller, then
foo.simdclone* and finally bar.simdclone*. Conceptually I think that is the
right ordering which allows for static simd clones to be removed.

The reason for the ICE is that because bar was marked const, the call to
it lost vops before vectorization, and when we in foo.simdclone* try to
vectorize the call to bar, we replace it with bar.simdclone* which hasn't
been marked const and so needs vops, which we don't add.

Now, because the simd clones are created from the same IL, just in a loop
with different argument/return value passing, I think generally if the base
function is determined to be const or pure, the simd clones should be too,
unless e.g. the vectorization causes different optimization decisions, but
then still the global memory reads if any shouldn't affect what the function
does and global memory stores shouldn't be reachable at runtime.

So, the following patch changes set_{const,pure}_flag to mark also simd
clones.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-02-07  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106433
* cgraph.cc (set_const_flag_1): Recurse on simd clones too.
(cgraph_node::set_pure_flag): Call set_pure_flag_1 on simd clones too.

* gcc.c-torture/compile/pr106433.c: New test.

Jakub
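The propagation described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `toy_node`, `toy_simd_clone_info` and `toy_set_const_flag` are hypothetical stand-ins, not GCC's real `cgraph_node` API, and the real `set_const_flag_1` additionally handles aliases, thunks, looping and availability. The only thing borrowed from the patch is the shape of the walk: start at `simd_clones` and follow `simdclone->next_clone`.

```cpp
#include <cassert>

// Toy stand-ins for the cgraph structures; only the fields used below.
struct toy_node;

struct toy_simd_clone_info
{
  toy_node *next_clone = nullptr;   // next simd clone of the same base function
};

struct toy_node
{
  bool is_const = false;                      // models the "const" flag
  toy_node *simd_clones = nullptr;            // head of the clone list (base function)
  toy_simd_clone_info *simdclone = nullptr;   // non-null only on simd clones
};

// Set the const flag on NODE and on every simd clone hanging off it.
// Returns true if any flag actually changed.
bool
toy_set_const_flag (toy_node *node, bool set_const)
{
  bool changed = false;
  if (node->is_const != set_const)
    {
      node->is_const = set_const;
      changed = true;
    }
  // Same loop shape as in the patch: walk the singly linked clone list.
  for (toy_node *n = node->simd_clones; n != nullptr;
       n = n->simdclone->next_clone)
    changed |= toy_set_const_flag (n, set_const);
  return changed;
}

// Build "bar" with two simd clones and check that all three end up const.
bool
toy_demo ()
{
  toy_simd_clone_info info_a, info_b;
  toy_node bar, clone_a, clone_b;
  clone_a.simdclone = &info_a;
  clone_b.simdclone = &info_b;
  info_a.next_clone = &clone_b;
  bar.simd_clones = &clone_a;

  bool changed = toy_set_const_flag (&bar, true);
  return changed && bar.is_const && clone_a.is_const && clone_b.is_const;
}
```

With the clone walk removed, only `bar.is_const` would be set in this model, which mirrors the mismatch behind the ICE: the base function is const while its clones are not.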
Comments
On 07.02.2023 at 09:37, Jakub Jelinek <jakub@redhat.com> wrote:
> So, the following patch changes set_{const,pure}_flag to mark also simd
> clones.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok,

Thanks,
Richard
> Now, because the simd clones are created from the same IL, just in a loop
> with different argument/return value passing, I think generally if the base
> function is determined to be const or pure, the simd clones should be too,
> unless e.g. the vectorization causes different optimization decisions, but
> then still the global memory reads if any shouldn't affect what the function
> does and global memory stores shouldn't be reachable at runtime.

My understanding of simd clones is a bit limited, but I think you are
right that they should have the same semantics as their caller.

I think const may be the one that makes the compiler ICE, but
there are many other places where the function body is analyzed and all its
aliases/thunks and other variants should be updated too. For example
set_pure_flag, nothrow, noreturn and the analysis done by modref,
ipa-reference etc.

I wonder if we want to update them all and hide that in some
abstraction? Next stage 1 I can work on inventing iterators for this
kind of thing, as the current approach combining direct walkers and
function wrappers has become a bit hard to maintain in cases like this.

Honza
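To make the abstraction idea concrete, here is one possible shape for it, written as a stand-alone sketch. Everything in it is an assumption for illustration: `sketch_node`, `all_variants` and the two flag fields are hypothetical names, the real cgraph keeps aliases, thunks and simd clones in different structures, and for brevity the helper materializes a vector instead of providing a lazy iterator. It also assumes the alias/clone relation is acyclic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified node: a function plus the variants derived from it.
struct sketch_node
{
  bool pure_flag = false;
  bool nothrow_flag = false;
  std::vector<sketch_node *> aliases;       // stand-in for aliases/thunks
  std::vector<sketch_node *> simd_clones;   // stand-in for simd clones
};

// Collect ROOT and, transitively, all of its aliases and simd clones,
// so that callers can use a plain range-for instead of hand-written walks.
std::vector<sketch_node *>
all_variants (sketch_node *root)
{
  std::vector<sketch_node *> out;
  std::vector<sketch_node *> worklist = {root};
  while (!worklist.empty ())
    {
      sketch_node *n = worklist.back ();
      worklist.pop_back ();
      out.push_back (n);
      for (sketch_node *a : n->aliases)
        worklist.push_back (a);
      for (sketch_node *c : n->simd_clones)
        worklist.push_back (c);
    }
  return out;
}

// Two different flag updates share the same walk.
bool
sketch_demo ()
{
  sketch_node bar, bar_alias, bar_clone;
  bar.aliases.push_back (&bar_alias);
  bar.simd_clones.push_back (&bar_clone);

  for (sketch_node *n : all_variants (&bar))
    n->pure_flag = true;
  for (sketch_node *n : all_variants (&bar))
    n->nothrow_flag = true;

  return all_variants (&bar).size () == 3
         && bar_alias.pure_flag && bar_clone.pure_flag
         && bar_alias.nothrow_flag && bar_clone.nothrow_flag;
}
```

The point of the design is that a newly added kind of variant would only need to be taught to the one walking helper, not to every flag setter separately.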
On Wed, Feb 08, 2023 at 06:10:08PM +0100, Jan Hubicka wrote:
> My understanding of simd clones is a bit limited, but I think you are
> right that they should have the same semantics as their caller.
>
> I think const may be the one that makes the compiler ICE, but
> there are many other places where the function body is analyzed and all its
> aliases/thunks and other variants should be updated too. For example
> set_pure_flag, nothrow, noreturn and the analysis done by modref,
> ipa-reference etc.
>
> I wonder if we want to update them all and hide that in some
> abstraction? Next stage 1 I can work on inventing iterators for this
> kind of thing, as the current approach combining direct walkers and
> function wrappers has become a bit hard to maintain in cases like this.

I think it depends on whether we do that analysis or update it post IPA
or not. Because simd clones are created very late during IPA, if say
the nothrow, noreturn, modref etc. analysis is done only during IPA or
before it, we don't need to walk the simd clones.
It is just for late GIMPLE analysis that changes flags that later on
could be used in callers of those functions.
pure/const flag is what I know can happen this late, what else?

Jakub
> I think it depends on whether we do that analysis or update it post IPA
> or not. Because simd clones are created very late during IPA, if say
> the nothrow, noreturn, modref etc. analysis is done only during IPA or
> before it, we don't need to walk the simd clones.
> It is just for late GIMPLE analysis that changes flags that later on
> could be used in callers of those functions.
> pure/const flag is what I know can happen this late, what else?

We have late pure/const (doing pure, const, nothrow, noreturn), modref
(which also discovers pure/const attributes and produces its own
summaries), and except.c at the very end of compilation can set the
nothrow flag... This is all I can think of.

Honza
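For readers less familiar with the two flags at the center of this thread: the properties that the pure/const passes discover automatically are the same ones a user can assert by hand with GCC's documented `const` and `pure` function attributes. The sketch below shows the difference; the names `counter`, `square` and `scaled` are made up for illustration and do not come from the patch.

```cpp
#include <cassert>

int counter = 0;   // global state, hypothetical

// const: the result depends only on the arguments and the function does
// not read or write global memory, so a call to it needs no memory
// dependencies at all (this is why such a call can drop its vops).
__attribute__ ((const)) int
square (int x)
{
  return x * x;
}

// pure: no side effects, but the function may read global memory, so a
// call still depends on earlier stores to that memory.
__attribute__ ((pure)) int
scaled (int x)
{
  return x * counter;
}
```

A call to `scaled` has to be re-evaluated after `counter` changes, while repeated calls to `square` with the same argument may be merged freely.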
--- gcc/cgraph.cc.jj	2023-02-02 10:54:44.327473492 +0100
+++ gcc/cgraph.cc	2023-02-06 12:28:22.040593063 +0100
@@ -2764,6 +2764,9 @@ set_const_flag_1 (cgraph_node *node, boo
           if (!set_const || alias->get_availability () > AVAIL_INTERPOSABLE)
             set_const_flag_1 (alias, set_const, looping, changed);
         }
+  for (struct cgraph_node *n = node->simd_clones; n != NULL;
+       n = n->simdclone->next_clone)
+    set_const_flag_1 (n, set_const, looping, changed);
   for (cgraph_edge *e = node->callers; e; e = e->next_caller)
     if (e->caller->thunk
         && (!set_const || e->caller->get_availability () > AVAIL_INTERPOSABLE))
@@ -2876,6 +2879,9 @@ cgraph_node::set_pure_flag (bool pure, b
 {
   struct set_pure_flag_info info = {pure, looping, false};
   call_for_symbol_thunks_and_aliases (set_pure_flag_1, &info, !pure, true);
+  for (struct cgraph_node *n = simd_clones; n != NULL;
+       n = n->simdclone->next_clone)
+    set_pure_flag_1 (n, &info);
   return info.changed;
 }
 
--- gcc/testsuite/gcc.c-torture/compile/pr106433.c.jj	2023-02-06 12:37:26.963748811 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr106433.c	2023-02-06 12:37:06.631041918 +0100
@@ -0,0 +1,24 @@
+/* PR tree-optimization/106433 */
+
+int m, *p;
+
+__attribute__ ((simd)) int
+bar (int x)
+{
+  if (x)
+    {
+      if (m < 1)
+        for (m = 0; m < 1; ++m)
+          ++x;
+      p = &x;
+      for (;;)
+        ++m;
+    }
+  return 0;
+}
+
+__attribute__ ((simd)) int
+foo (int x)
+{
+  return bar (x);
+}