Message ID | 23b4998b-bbe6-b052-d7f5-5304ee0f46a3@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 15BDA383E6AF for <patchwork@sourceware.org>; Tue, 14 Jun 2022 07:57:51 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 15BDA383E6AF DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655193471; bh=1O/iNNT9X8+Vszuo2kRwQ7Jxe7ua4HgFniznPZVUHJ4=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=SF3w4d19EmOFRRiP9m9CAt/pv+YuSMyVBgXVdwtGLHVfqA6qHE5Ac7REoaBuqgJ1m dLhzA3JrK7jlmsHbCYF35Qj41cGOJeXJ0tw3Whqbx+cu5XjDi4UnoR+KV9slSlhg8Z HNjMRCCFpe2+/MunW65rQpBgcYZN/ArsUwwZLYMc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 21C4B3856DF7 for <gcc-patches@gcc.gnu.org>; Tue, 14 Jun 2022 07:57:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 21C4B3856DF7 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25E7h8aA022878; Tue, 14 Jun 2022 07:57:17 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gpp6h8915-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 14 Jun 2022 07:57:16 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25E7iQst027631; Tue, 14 Jun 2022 07:57:16 GMT Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gpp6h8909-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 14 Jun 2022 07:57:16 +0000 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25E7oU02029886; Tue, 14 Jun 2022 07:57:13 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma05fra.de.ibm.com with ESMTP id 3gmjp932uv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 14 Jun 2022 07:57:13 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25E7vBmc19661130 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 14 Jun 2022 07:57:11 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8CA2911C04C; Tue, 14 Jun 2022 07:57:11 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C192F11C04A; Tue, 14 Jun 2022 07:57:09 +0000 (GMT) Received: from [9.200.42.225] (unknown [9.200.42.225]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 14 Jun 2022 07:57:09 +0000 (GMT) Message-ID: <23b4998b-bbe6-b052-d7f5-5304ee0f46a3@linux.ibm.com> Date: Tue, 14 Jun 2022 15:57:08 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Content-Language: en-US To: GCC Patches <gcc-patches@gcc.gnu.org> Subject: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818] Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: fj14FKNQ0hlKWrv51Y2p5qk9a70rpu8h X-Proofpoint-ORIG-GUID: aX3PHjjGWHpXFk5sEDGt12mVFzc6J2nb X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.517,FMLib:17.11.64.514 definitions=2022-06-14_02,2022-06-13_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 clxscore=1015 adultscore=0 mlxlogscore=960 malwarescore=0 spamscore=0 phishscore=0 suspectscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206140028 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: "Kewen.Lin via Gcc-patches" <gcc-patches@gcc.gnu.org> Reply-To: "Kewen.Lin" <linkw@linux.ibm.com> Cc: Richard Sandiford <richard.sandiford@arm.com>, Jan Hubicka <hubicka@ucw.cz>, Segher Boessenkool <segher@kernel.crashing.org> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
predict: Adjust optimize_function_for_size_p [PR105818]
|
|
Commit Message
Kewen.Lin
June 14, 2022, 7:57 a.m. UTC
Hi, Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO if func->decl is not null but no cgraph node is available for it. As PR105818 shows, this could give unexpected result. For the case in PR105818, when parsing bar decl in function foo, the cfun is a function structure for foo, for which there is none cgraph node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since the context is to optimize for size, the flag optimize_size is true. The patch is to make optimize_function_for_size_p to check optimize_size as what it does when func->decl is unavailable. One regression failure got exposed on aarch64-linux-gnu: PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ -DPREVENT_OPTIMIZATION line 21 x == 10 - i The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT used in function fold_range_test during c parsing, it uses optimize_function_for_speed_p which is equal to the invertion of optimize_function_for_size_p. At that time cfun->decl is valid but no cgraph node for it, w/o this patch function optimize_function_for_speed_p returns true eventually, while it returns false with this patch. Since the command line option -Os is specified, there is no reason to interpret it as "for speed". I think this failure is expected and adjust the test case accordingly. Is it ok for trunk? BR, Kewen ----- PR target/105818 gcc/ChangeLog: * predict.cc (optimize_function_for_size_p): Check optimize_size when func->decl is valid but its cgraph node is unavailable. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr105818.c: New test. * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. --- gcc/predict.cc | 2 +- gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ 3 files changed, 11 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c -- 2.27.0
Comments
> Hi, > > Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO > if func->decl is not null but no cgraph node is available for it. > As PR105818 shows, this could give unexpected result. For the > case in PR105818, when parsing bar decl in function foo, the cfun > is a function structure for foo, for which there is none cgraph > node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since > the context is to optimize for size, the flag optimize_size is > true. > > The patch is to make optimize_function_for_size_p to check > optimize_size as what it does when func->decl is unavailable. > > One regression failure got exposed on aarch64-linux-gnu: > > PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ > -DPREVENT_OPTIMIZATION line 21 x == 10 - i > > The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT > used in function fold_range_test during c parsing, it uses > optimize_function_for_speed_p which is equal to the invertion > of optimize_function_for_size_p. At that time cfun->decl is valid > but no cgraph node for it, w/o this patch function > optimize_function_for_speed_p returns true eventually, while it > returns false with this patch. Since the command line option -Os > is specified, there is no reason to interpret it as "for speed". > I think this failure is expected and adjust the test case > accordingly. > > Is it ok for trunk? > > BR, > Kewen > ----- > > PR target/105818 > > gcc/ChangeLog: > > * predict.cc (optimize_function_for_size_p): Check optimize_size when > func->decl is valid but its cgraph node is unavailable. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr105818.c: New test. > * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. > --- > gcc/predict.cc | 2 +- > gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ > 3 files changed, 11 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c > > diff --git a/gcc/predict.cc b/gcc/predict.cc > index 5734e4c8516..6c60a973236 100644 > --- a/gcc/predict.cc > +++ b/gcc/predict.cc > @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) > cgraph_node *n = cgraph_node::get (fun->decl); > if (n) > return n->optimize_for_size_p (); > - return OPTIMIZE_SIZE_NO; > + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; We could also do (opt_for_fn (cfun->decl, optimize_size) that is probably better since one can change optimize_size with optimization attribute. However I think in most cases we check for optimize_size early I think we are doing something wrong, since at that time htere is no profile available. Why exactly PR105818 hits the flag change issue? Honza > } > > /* Return true if function FUN should always be optimized for speed. */ > diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > index 68aa6c63d71..14ca94ad37d 100644 > --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c > +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > @@ -17,7 +17,7 @@ foo (int x, int y, int z) > int i = 0; > while (x > 3 && y > 3 && z > 3) > { /* { dg-final { gdb-test .+2 "i" "v + 1" } } */ > - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ > + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */ > bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */ > /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ > i++, x--, y -= 2, z -= 3; > diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c > new file mode 100644 > index 00000000000..18781edbf9e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c > @@ -0,0 +1,9 @@ > +/* { dg-options "-Os -fno-tree-vectorize" } */ > + > +#pragma GCC optimize "-fno-tree-vectorize" > + > +void > +foo (void) > +{ > + void bar (void); > +} > -- > 2.27.0
On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
> PR target/105818
Change this to something else? It is not a target bug. "middle-end"
perhaps?
Thanks for fixing this,
Segher
Hi Honza, Thanks for the comments! Some replies are inlined below. on 2022/6/14 19:37, Jan Hubicka wrote: >> Hi, >> >> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >> if func->decl is not null but no cgraph node is available for it. >> As PR105818 shows, this could give unexpected result. For the >> case in PR105818, when parsing bar decl in function foo, the cfun >> is a function structure for foo, for which there is none cgraph >> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >> the context is to optimize for size, the flag optimize_size is >> true. >> >> The patch is to make optimize_function_for_size_p to check >> optimize_size as what it does when func->decl is unavailable. >> >> One regression failure got exposed on aarch64-linux-gnu: >> >> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >> >> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >> used in function fold_range_test during c parsing, it uses >> optimize_function_for_speed_p which is equal to the invertion >> of optimize_function_for_size_p. At that time cfun->decl is valid >> but no cgraph node for it, w/o this patch function >> optimize_function_for_speed_p returns true eventually, while it >> returns false with this patch. Since the command line option -Os >> is specified, there is no reason to interpret it as "for speed". >> I think this failure is expected and adjust the test case >> accordingly. >> >> Is it ok for trunk? >> >> BR, >> Kewen >> ----- >> >> PR target/105818 >> >> gcc/ChangeLog: >> >> * predict.cc (optimize_function_for_size_p): Check optimize_size when >> func->decl is valid but its cgraph node is unavailable. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr105818.c: New test. >> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >> --- >> gcc/predict.cc | 2 +- >> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >> 3 files changed, 11 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >> >> diff --git a/gcc/predict.cc b/gcc/predict.cc >> index 5734e4c8516..6c60a973236 100644 >> --- a/gcc/predict.cc >> +++ b/gcc/predict.cc >> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >> cgraph_node *n = cgraph_node::get (fun->decl); >> if (n) >> return n->optimize_for_size_p (); >> - return OPTIMIZE_SIZE_NO; >> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; > > We could also do (opt_for_fn (cfun->decl, optimize_size) that is > probably better since one can change optimize_size with optimization > attribute. Good point, agree! > However I think in most cases we check for optimize_size early I think > we are doing something wrong, since at that time htere is no profile > available. Why exactly PR105818 hits the flag change issue? For PR105818, the reason why the flag changs is that: Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit of rs6000_isa_flags_explicit, it's set as below: /* If we can shrink-wrap the TOC register save separately, then use -msave-toc-indirect unless explicitly disabled. */ if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 && flag_shrink_wrap_separate && optimize_function_for_speed_p (cfun)) rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; Initially, rs6000 initialize target_option_default_node with OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL and optimize_size is true. Later, when c parser handling function foo, it builds target option node as target_option_default_node in function handle_optimize_attribute, it does global option saving and verifying there as well, at that time the cfun is NULL, no issue is found. And function store_parm_decls allocates struct_function for foo then, cfun becomes function struct for foo, when c parser continues to handle the decl bar in foo, function handle_optimize_attribute works as before, tries to restore the target options at the end, it calls targetm.target_option.restore (rs6000_function_specific_restore) which calls function rs6000_option_override_internal again, at this time the cfun is not NULL while there is no cgraph node for its decl, optimize_function_for_speed_p returns true and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag rs6000_isa_flags set unexpectedly. It becomes inconsistent as the one saved previously. IMHO, both contexts of global and function decl foo here hold optimize_size, function optimize_function_for_speed_p should not return true anyway. btw, the aarch64 failed case also gets the unexpected result for optimize_function_for_speed_p during c parsing (fold_range_test <- ... <- c_parser_condition). IIUC, in parsing time we don't have the profile information available. BR, Kewen
on 2022/6/14 20:53, Segher Boessenkool wrote: > On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote: >> PR target/105818 > > Change this to something else? It is not a target bug. "middle-end" > perhaps? > Good catch, will fix this. Thanks Segher! BR, Kewen
on 2022/6/15 14:20, Kewen.Lin wrote: > Hi Honza, > > Thanks for the comments! Some replies are inlined below. > > on 2022/6/14 19:37, Jan Hubicka wrote: >>> Hi, >>> >>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>> if func->decl is not null but no cgraph node is available for it. >>> As PR105818 shows, this could give unexpected result. For the >>> case in PR105818, when parsing bar decl in function foo, the cfun >>> is a function structure for foo, for which there is none cgraph >>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>> the context is to optimize for size, the flag optimize_size is >>> true. >>> >>> The patch is to make optimize_function_for_size_p to check >>> optimize_size as what it does when func->decl is unavailable. >>> >>> One regression failure got exposed on aarch64-linux-gnu: >>> >>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>> >>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>> used in function fold_range_test during c parsing, it uses >>> optimize_function_for_speed_p which is equal to the invertion >>> of optimize_function_for_size_p. At that time cfun->decl is valid >>> but no cgraph node for it, w/o this patch function >>> optimize_function_for_speed_p returns true eventually, while it >>> returns false with this patch. Since the command line option -Os >>> is specified, there is no reason to interpret it as "for speed". >>> I think this failure is expected and adjust the test case >>> accordingly. >>> >>> Is it ok for trunk? >>> >>> BR, >>> Kewen >>> ----- >>> >>> PR target/105818 >>> >>> gcc/ChangeLog: >>> >>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>> func->decl is valid but its cgraph node is unavailable. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr105818.c: New test. >>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>> --- >>> gcc/predict.cc | 2 +- >>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>> 3 files changed, 11 insertions(+), 2 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>> >>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>> index 5734e4c8516..6c60a973236 100644 >>> --- a/gcc/predict.cc >>> +++ b/gcc/predict.cc >>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>> cgraph_node *n = cgraph_node::get (fun->decl); >>> if (n) >>> return n->optimize_for_size_p (); >>> - return OPTIMIZE_SIZE_NO; >>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >> >> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >> probably better since one can change optimize_size with optimization >> attribute. > > Good point, agree! > >> However I think in most cases we check for optimize_size early I think >> we are doing something wrong, since at that time htere is no profile >> available. Why exactly PR105818 hits the flag change issue? > > For PR105818, the reason why the flag changs is that: > > Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit > of rs6000_isa_flags_explicit, it's set as below: > > /* If we can shrink-wrap the TOC register save separately, then use > -msave-toc-indirect unless explicitly disabled. */ > if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 > && flag_shrink_wrap_separate > && optimize_function_for_speed_p (cfun)) > rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; > > Initially, rs6000 initialize target_option_default_node with > OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL > and optimize_size is true. > > Later, when c parser handling function foo, it builds target > option node as target_option_default_node in function > handle_optimize_attribute, it does global option saving and > verifying there as well, at that time the cfun is NULL, no > issue is found. And function store_parm_decls allocates > struct_function for foo then, cfun becomes function struct > for foo, when c parser continues to handle the decl bar in > foo, function handle_optimize_attribute works as before, > tries to restore the target options at the end, it calls > targetm.target_option.restore (rs6000_function_specific_restore) > which calls function rs6000_option_override_internal again, > at this time the cfun is not NULL while there is no cgraph > node for its decl, optimize_function_for_speed_p returns true > and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag > rs6000_isa_flags set unexpectedly. It becomes inconsistent > as the one saved previously. > > IMHO, both contexts of global and function decl foo here hold > optimize_size, function optimize_function_for_speed_p should > not return true anyway. > > btw, the aarch64 failed case also gets the unexpected > result for optimize_function_for_speed_p during c parsing > (fold_range_test <- ... <- c_parser_condition). > > IIUC, in parsing time we don't have the profile information > available. > Hi Honza, Does the above explanation sound reasonable to you? BR, Kewen
on 2022/7/11 11:42, Kewen.Lin wrote: > on 2022/6/15 14:20, Kewen.Lin wrote: >> Hi Honza, >> >> Thanks for the comments! Some replies are inlined below. >> >> on 2022/6/14 19:37, Jan Hubicka wrote: >>>> Hi, >>>> >>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>> if func->decl is not null but no cgraph node is available for it. >>>> As PR105818 shows, this could give unexpected result. For the >>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>> is a function structure for foo, for which there is none cgraph >>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>> the context is to optimize for size, the flag optimize_size is >>>> true. >>>> >>>> The patch is to make optimize_function_for_size_p to check >>>> optimize_size as what it does when func->decl is unavailable. >>>> >>>> One regression failure got exposed on aarch64-linux-gnu: >>>> >>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>> >>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>> used in function fold_range_test during c parsing, it uses >>>> optimize_function_for_speed_p which is equal to the invertion >>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>> but no cgraph node for it, w/o this patch function >>>> optimize_function_for_speed_p returns true eventually, while it >>>> returns false with this patch. Since the command line option -Os >>>> is specified, there is no reason to interpret it as "for speed". >>>> I think this failure is expected and adjust the test case >>>> accordingly. >>>> >>>> Is it ok for trunk? >>>> >>>> BR, >>>> Kewen >>>> ----- >>>> >>>> PR target/105818 >>>> >>>> gcc/ChangeLog: >>>> >>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>> func->decl is valid but its cgraph node is unavailable. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.target/powerpc/pr105818.c: New test. >>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>> --- >>>> gcc/predict.cc | 2 +- >>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>> >>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>> index 5734e4c8516..6c60a973236 100644 >>>> --- a/gcc/predict.cc >>>> +++ b/gcc/predict.cc >>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>> if (n) >>>> return n->optimize_for_size_p (); >>>> - return OPTIMIZE_SIZE_NO; >>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>> >>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>> probably better since one can change optimize_size with optimization >>> attribute. >> >> Good point, agree! >> >>> However I think in most cases we check for optimize_size early I think >>> we are doing something wrong, since at that time htere is no profile >>> available. Why exactly PR105818 hits the flag change issue? >> >> For PR105818, the reason why the flag changs is that: >> >> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >> of rs6000_isa_flags_explicit, it's set as below: >> >> /* If we can shrink-wrap the TOC register save separately, then use >> -msave-toc-indirect unless explicitly disabled. */ >> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >> && flag_shrink_wrap_separate >> && optimize_function_for_speed_p (cfun)) >> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >> >> Initially, rs6000 initialize target_option_default_node with >> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >> and optimize_size is true. >> >> Later, when c parser handling function foo, it builds target >> option node as target_option_default_node in function >> handle_optimize_attribute, it does global option saving and >> verifying there as well, at that time the cfun is NULL, no >> issue is found. And function store_parm_decls allocates >> struct_function for foo then, cfun becomes function struct >> for foo, when c parser continues to handle the decl bar in >> foo, function handle_optimize_attribute works as before, >> tries to restore the target options at the end, it calls >> targetm.target_option.restore (rs6000_function_specific_restore) >> which calls function rs6000_option_override_internal again, >> at this time the cfun is not NULL while there is no cgraph >> node for its decl, optimize_function_for_speed_p returns true >> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >> rs6000_isa_flags set unexpectedly. It becomes inconsistent >> as the one saved previously. >> >> IMHO, both contexts of global and function decl foo here hold >> optimize_size, function optimize_function_for_speed_p should >> not return true anyway. >> >> btw, the aarch64 failed case also gets the unexpected >> result for optimize_function_for_speed_p during c parsing >> (fold_range_test <- ... <- c_parser_condition). >> >> IIUC, in parsing time we don't have the profile information >> available. >> > > Hi Honza, > > Does the above explanation sound reasonable to you? > Hi Honza, Gentle ping ... BR, Kewen
on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote: > on 2022/7/11 11:42, Kewen.Lin wrote: >> on 2022/6/15 14:20, Kewen.Lin wrote: >>> Hi Honza, >>> >>> Thanks for the comments! Some replies are inlined below. >>> >>> on 2022/6/14 19:37, Jan Hubicka wrote: >>>>> Hi, >>>>> >>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>>> if func->decl is not null but no cgraph node is available for it. >>>>> As PR105818 shows, this could give unexpected result. For the >>>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>>> is a function structure for foo, for which there is none cgraph >>>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>>> the context is to optimize for size, the flag optimize_size is >>>>> true. >>>>> >>>>> The patch is to make optimize_function_for_size_p to check >>>>> optimize_size as what it does when func->decl is unavailable. >>>>> >>>>> One regression failure got exposed on aarch64-linux-gnu: >>>>> >>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>>> >>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>>> used in function fold_range_test during c parsing, it uses >>>>> optimize_function_for_speed_p which is equal to the invertion >>>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>>> but no cgraph node for it, w/o this patch function >>>>> optimize_function_for_speed_p returns true eventually, while it >>>>> returns false with this patch. Since the command line option -Os >>>>> is specified, there is no reason to interpret it as "for speed". >>>>> I think this failure is expected and adjust the test case >>>>> accordingly. >>>>> >>>>> Is it ok for trunk? >>>>> >>>>> BR, >>>>> Kewen >>>>> ----- >>>>> >>>>> PR target/105818 >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>>> func->decl is valid but its cgraph node is unavailable. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * gcc.target/powerpc/pr105818.c: New test. >>>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>>> --- >>>>> gcc/predict.cc | 2 +- >>>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>>> >>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>>> index 5734e4c8516..6c60a973236 100644 >>>>> --- a/gcc/predict.cc >>>>> +++ b/gcc/predict.cc >>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>>> if (n) >>>>> return n->optimize_for_size_p (); >>>>> - return OPTIMIZE_SIZE_NO; >>>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>>> >>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>>> probably better since one can change optimize_size with optimization >>>> attribute. >>> >>> Good point, agree! >>> >>>> However I think in most cases we check for optimize_size early I think >>>> we are doing something wrong, since at that time htere is no profile >>>> available. Why exactly PR105818 hits the flag change issue? >>> >>> For PR105818, the reason why the flag changs is that: >>> >>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >>> of rs6000_isa_flags_explicit, it's set as below: >>> >>> /* If we can shrink-wrap the TOC register save separately, then use >>> -msave-toc-indirect unless explicitly disabled. */ >>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >>> && flag_shrink_wrap_separate >>> && optimize_function_for_speed_p (cfun)) >>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >>> >>> Initially, rs6000 initialize target_option_default_node with >>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >>> and optimize_size is true. >>> >>> Later, when c parser handling function foo, it builds target >>> option node as target_option_default_node in function >>> handle_optimize_attribute, it does global option saving and >>> verifying there as well, at that time the cfun is NULL, no >>> issue is found. And function store_parm_decls allocates >>> struct_function for foo then, cfun becomes function struct >>> for foo, when c parser continues to handle the decl bar in >>> foo, function handle_optimize_attribute works as before, >>> tries to restore the target options at the end, it calls >>> targetm.target_option.restore (rs6000_function_specific_restore) >>> which calls function rs6000_option_override_internal again, >>> at this time the cfun is not NULL while there is no cgraph >>> node for its decl, optimize_function_for_speed_p returns true >>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >>> rs6000_isa_flags set unexpectedly. It becomes inconsistent >>> as the one saved previously. >>> >>> IMHO, both contexts of global and function decl foo here hold >>> optimize_size, function optimize_function_for_speed_p should >>> not return true anyway. >>> >>> btw, the aarch64 failed case also gets the unexpected >>> result for optimize_function_for_speed_p during c parsing >>> (fold_range_test <- ... <- c_parser_condition). >>> >>> IIUC, in parsing time we don't have the profile information >>> available. >>> >> >> Hi Honza, >> >> Does the above explanation sound reasonable to you? >> > Hi Honza, Gentle ping again ... BR, Kewen
on 2022/8/29 14:35, Kewen.Lin via Gcc-patches wrote: > on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote: >> on 2022/7/11 11:42, Kewen.Lin wrote: >>> on 2022/6/15 14:20, Kewen.Lin wrote: >>>> Hi Honza, >>>> >>>> Thanks for the comments! Some replies are inlined below. >>>> >>>> on 2022/6/14 19:37, Jan Hubicka wrote: >>>>>> Hi, >>>>>> >>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>>>> if func->decl is not null but no cgraph node is available for it. >>>>>> As PR105818 shows, this could give unexpected result. For the >>>>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>>>> is a function structure for foo, for which there is none cgraph >>>>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>>>> the context is to optimize for size, the flag optimize_size is >>>>>> true. >>>>>> >>>>>> The patch is to make optimize_function_for_size_p to check >>>>>> optimize_size as what it does when func->decl is unavailable. >>>>>> >>>>>> One regression failure got exposed on aarch64-linux-gnu: >>>>>> >>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>>>> >>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>>>> used in function fold_range_test during c parsing, it uses >>>>>> optimize_function_for_speed_p which is equal to the invertion >>>>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>>>> but no cgraph node for it, w/o this patch function >>>>>> optimize_function_for_speed_p returns true eventually, while it >>>>>> returns false with this patch. Since the command line option -Os >>>>>> is specified, there is no reason to interpret it as "for speed". >>>>>> I think this failure is expected and adjust the test case >>>>>> accordingly. >>>>>> >>>>>> Is it ok for trunk? >>>>>> >>>>>> BR, >>>>>> Kewen >>>>>> ----- >>>>>> >>>>>> PR target/105818 >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>>>> func->decl is valid but its cgraph node is unavailable. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>> * gcc.target/powerpc/pr105818.c: New test. >>>>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>>>> --- >>>>>> gcc/predict.cc | 2 +- >>>>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>>>> >>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>>>> index 5734e4c8516..6c60a973236 100644 >>>>>> --- a/gcc/predict.cc >>>>>> +++ b/gcc/predict.cc >>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>>>> if (n) >>>>>> return n->optimize_for_size_p (); >>>>>> - return OPTIMIZE_SIZE_NO; >>>>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>>>> >>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>>>> probably better since one can change optimize_size with optimization >>>>> attribute. >>>> >>>> Good point, agree! >>>> >>>>> However I think in most cases we check for optimize_size early I think >>>>> we are doing something wrong, since at that time htere is no profile >>>>> available. Why exactly PR105818 hits the flag change issue? >>>> >>>> For PR105818, the reason why the flag changs is that: >>>> >>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >>>> of rs6000_isa_flags_explicit, it's set as below: >>>> >>>> /* If we can shrink-wrap the TOC register save separately, then use >>>> -msave-toc-indirect unless explicitly disabled. */ >>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >>>> && flag_shrink_wrap_separate >>>> && optimize_function_for_speed_p (cfun)) >>>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >>>> >>>> Initially, rs6000 initialize target_option_default_node with >>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >>>> and optimize_size is true. >>>> >>>> Later, when c parser handling function foo, it builds target >>>> option node as target_option_default_node in function >>>> handle_optimize_attribute, it does global option saving and >>>> verifying there as well, at that time the cfun is NULL, no >>>> issue is found. And function store_parm_decls allocates >>>> struct_function for foo then, cfun becomes function struct >>>> for foo, when c parser continues to handle the decl bar in >>>> foo, function handle_optimize_attribute works as before, >>>> tries to restore the target options at the end, it calls >>>> targetm.target_option.restore (rs6000_function_specific_restore) >>>> which calls function rs6000_option_override_internal again, >>>> at this time the cfun is not NULL while there is no cgraph >>>> node for its decl, optimize_function_for_speed_p returns true >>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >>>> rs6000_isa_flags set unexpectedly. It becomes inconsistent >>>> as the one saved previously. >>>> >>>> IMHO, both contexts of global and function decl foo here hold >>>> optimize_size, function optimize_function_for_speed_p should >>>> not return true anyway. >>>> >>>> btw, the aarch64 failed case also gets the unexpected >>>> result for optimize_function_for_speed_p during c parsing >>>> (fold_range_test <- ... <- c_parser_condition). >>>> >>>> IIUC, in parsing time we don't have the profile information >>>> available. >>>> >>> >>> Hi Honza, >>> >>> Does the above explanation sound reasonable to you? >>> Hi Honza, Gentle ping^4 ... BR, Kewen
diff --git a/gcc/predict.cc b/gcc/predict.cc index 5734e4c8516..6c60a973236 100644 --- a/gcc/predict.cc +++ b/gcc/predict.cc @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) cgraph_node *n = cgraph_node::get (fun->decl); if (n) return n->optimize_for_size_p (); - return OPTIMIZE_SIZE_NO; + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; } /* Return true if function FUN should always be optimized for speed. */ diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c index 68aa6c63d71..14ca94ad37d 100644 --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c @@ -17,7 +17,7 @@ foo (int x, int y, int z) int i = 0; while (x > 3 && y > 3 && z > 3) { /* { dg-final { gdb-test .+2 "i" "v + 1" } } */ - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */ bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */ /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ i++, x--, y -= 2, z -= 3; diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c new file mode 100644 index 00000000000..18781edbf9e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c @@ -0,0 +1,9 @@ +/* { dg-options "-Os -fno-tree-vectorize" } */ + +#pragma GCC optimize "-fno-tree-vectorize" + +void +foo (void) +{ + void bar (void); +}