predict: Adjust optimize_function_for_size_p [PR105818]

Message ID 23b4998b-bbe6-b052-d7f5-5304ee0f46a3@linux.ibm.com
State New
Headers
Series predict: Adjust optimize_function_for_size_p [PR105818] |

Commit Message

Kewen.Lin June 14, 2022, 7:57 a.m. UTC
  Hi,

Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
if func->decl is not null but no cgraph node is available for it.
As PR105818 shows, this could give unexpected result.  For the
case in PR105818, when parsing bar decl in function foo, the cfun
is a function structure for foo, for which there is none cgraph
node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
the context is to optimize for size, the flag optimize_size is
true.

The patch is to make optimize_function_for_size_p to check
optimize_size as what it does when func->decl is unavailable.

One regression failure got exposed on aarch64-linux-gnu:

PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i

The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
used in function fold_range_test during c parsing, it uses
optimize_function_for_speed_p which is equal to the invertion
of optimize_function_for_size_p.  At that time cfun->decl is valid
but no cgraph node for it, w/o this patch function
optimize_function_for_speed_p returns true eventually, while it
returns false with this patch.  Since the command line option -Os
is specified, there is no reason to interpret it as "for speed".
I think this failure is expected and adjust the test case
accordingly.

Is it ok for trunk?

BR,
Kewen
-----

	PR target/105818

gcc/ChangeLog:

	* predict.cc (optimize_function_for_size_p): Check optimize_size when
	func->decl is valid but its cgraph node is unavailable.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr105818.c: New test.
	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
---
 gcc/predict.cc                              | 2 +-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c

--
2.27.0
  

Comments

Jan Hubicka June 14, 2022, 11:37 a.m. UTC | #1
> Hi,
> 
> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
> if func->decl is not null but no cgraph node is available for it.
> As PR105818 shows, this could give unexpected result.  For the
> case in PR105818, when parsing bar decl in function foo, the cfun
> is a function structure for foo, for which there is none cgraph
> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
> the context is to optimize for size, the flag optimize_size is
> true.
> 
> The patch is to make optimize_function_for_size_p to check
> optimize_size as what it does when func->decl is unavailable.
> 
> One regression failure got exposed on aarch64-linux-gnu:
> 
> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
> 
> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
> used in function fold_range_test during c parsing, it uses
> optimize_function_for_speed_p which is equal to the invertion
> of optimize_function_for_size_p.  At that time cfun->decl is valid
> but no cgraph node for it, w/o this patch function
> optimize_function_for_speed_p returns true eventually, while it
> returns false with this patch.  Since the command line option -Os
> is specified, there is no reason to interpret it as "for speed".
> I think this failure is expected and adjust the test case
> accordingly.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -----
> 
> 	PR target/105818
> 
> gcc/ChangeLog:
> 
> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
> 	func->decl is valid but its cgraph node is unavailable.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/pr105818.c: New test.
> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
> ---
>  gcc/predict.cc                              | 2 +-
>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>  3 files changed, 11 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
> 
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5734e4c8516..6c60a973236 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>    cgraph_node *n = cgraph_node::get (fun->decl);
>    if (n)
>      return n->optimize_for_size_p ();
> -  return OPTIMIZE_SIZE_NO;
> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;

We could also do (opt_for_fn (cfun->decl, optimize_size) that is
probably better since one can change optimize_size with optimization
attribute.
However I think in most cases we check for optimize_size early I think
we are doing something wrong, since at that time htere is no profile
available.  Why exactly PR105818 hits the flag change issue?

Honza
>  }
> 
>  /* Return true if function FUN should always be optimized for speed.  */
> diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> index 68aa6c63d71..14ca94ad37d 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> @@ -17,7 +17,7 @@ foo (int x, int y, int z)
>    int i = 0;
>    while (x > 3 && y > 3 && z > 3)
>      {		/* { dg-final { gdb-test .+2 "i" "v + 1" } } */
> -		/* { dg-final { gdb-test .+1 "x" "10 - i" } } */
> +		/* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */
>        bar (i);	/* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
>  		/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
>        i++, x--, y -= 2, z -= 3;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> new file mode 100644
> index 00000000000..18781edbf9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> @@ -0,0 +1,9 @@
> +/* { dg-options "-Os -fno-tree-vectorize" } */
> +
> +#pragma GCC optimize "-fno-tree-vectorize"
> +
> +void
> +foo (void)
> +{
> +  void bar (void);
> +}
> --
> 2.27.0
  
Segher Boessenkool June 14, 2022, 12:53 p.m. UTC | #2
On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
> 	PR target/105818

Change this to something else?  It is not a target bug.  "middle-end"
perhaps?

Thanks for fixing this,


Segher
  
Kewen.Lin June 15, 2022, 6:20 a.m. UTC | #3
Hi Honza,

Thanks for the comments!  Some replies are inlined below.

on 2022/6/14 19:37, Jan Hubicka wrote:
>> Hi,
>>
>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>> if func->decl is not null but no cgraph node is available for it.
>> As PR105818 shows, this could give unexpected result.  For the
>> case in PR105818, when parsing bar decl in function foo, the cfun
>> is a function structure for foo, for which there is none cgraph
>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>> the context is to optimize for size, the flag optimize_size is
>> true.
>>
>> The patch is to make optimize_function_for_size_p to check
>> optimize_size as what it does when func->decl is unavailable.
>>
>> One regression failure got exposed on aarch64-linux-gnu:
>>
>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>
>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>> used in function fold_range_test during c parsing, it uses
>> optimize_function_for_speed_p which is equal to the invertion
>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>> but no cgraph node for it, w/o this patch function
>> optimize_function_for_speed_p returns true eventually, while it
>> returns false with this patch.  Since the command line option -Os
>> is specified, there is no reason to interpret it as "for speed".
>> I think this failure is expected and adjust the test case
>> accordingly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>>
>> 	PR target/105818
>>
>> gcc/ChangeLog:
>>
>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>> 	func->decl is valid but its cgraph node is unavailable.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr105818.c: New test.
>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>> ---
>>  gcc/predict.cc                              | 2 +-
>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>
>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>> index 5734e4c8516..6c60a973236 100644
>> --- a/gcc/predict.cc
>> +++ b/gcc/predict.cc
>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>    if (n)
>>      return n->optimize_for_size_p ();
>> -  return OPTIMIZE_SIZE_NO;
>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
> 
> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
> probably better since one can change optimize_size with optimization
> attribute.

Good point, agree!

> However I think in most cases we check for optimize_size early I think
> we are doing something wrong, since at that time htere is no profile
> available.  Why exactly PR105818 hits the flag change issue?

For PR105818, the reason why the flag changs is that:

Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
of rs6000_isa_flags_explicit, it's set as below:

/* If we can shrink-wrap the TOC register save separately, then use
   -msave-toc-indirect unless explicitly disabled.  */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
    && flag_shrink_wrap_separate
    && optimize_function_for_speed_p (cfun))
  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;

Initially, rs6000 initialize target_option_default_node with
OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
and optimize_size is true.

Later, when c parser handling function foo, it builds target
option node as target_option_default_node in function
handle_optimize_attribute, it does global option saving and
verifying there as well, at that time the cfun is NULL, no
issue is found.  And function store_parm_decls allocates
struct_function for foo then, cfun becomes function struct
for foo, when c parser continues to handle the decl bar in
foo, function handle_optimize_attribute works as before,
tries to restore the target options at the end, it calls
targetm.target_option.restore (rs6000_function_specific_restore)
which calls function rs6000_option_override_internal again,
at this time the cfun is not NULL while there is no cgraph
node for its decl, optimize_function_for_speed_p returns true
and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
rs6000_isa_flags set unexpectedly.  It becomes inconsistent
as the one saved previously.

IMHO, both contexts of global and function decl foo here hold
optimize_size, function optimize_function_for_speed_p should
not return true anyway.

btw, the aarch64 failed case also gets the unexpected
result for optimize_function_for_speed_p during c parsing
(fold_range_test <- ... <- c_parser_condition).

IIUC, in parsing time we don't have the profile information
available.

BR,
Kewen
  
Kewen.Lin June 15, 2022, 6:21 a.m. UTC | #4
on 2022/6/14 20:53, Segher Boessenkool wrote:
> On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
>> 	PR target/105818
> 
> Change this to something else?  It is not a target bug.  "middle-end"
> perhaps?
> 

Good catch, will fix this.  Thanks Segher!

BR,
Kewen
  
Kewen.Lin July 11, 2022, 3:42 a.m. UTC | #5
on 2022/6/15 14:20, Kewen.Lin wrote:
> Hi Honza,
> 
> Thanks for the comments!  Some replies are inlined below.
> 
> on 2022/6/14 19:37, Jan Hubicka wrote:
>>> Hi,
>>>
>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>> if func->decl is not null but no cgraph node is available for it.
>>> As PR105818 shows, this could give unexpected result.  For the
>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>> is a function structure for foo, for which there is none cgraph
>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>> the context is to optimize for size, the flag optimize_size is
>>> true.
>>>
>>> The patch is to make optimize_function_for_size_p to check
>>> optimize_size as what it does when func->decl is unavailable.
>>>
>>> One regression failure got exposed on aarch64-linux-gnu:
>>>
>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>
>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>> used in function fold_range_test during c parsing, it uses
>>> optimize_function_for_speed_p which is equal to the invertion
>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>> but no cgraph node for it, w/o this patch function
>>> optimize_function_for_speed_p returns true eventually, while it
>>> returns false with this patch.  Since the command line option -Os
>>> is specified, there is no reason to interpret it as "for speed".
>>> I think this failure is expected and adjust the test case
>>> accordingly.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>>
>>> 	PR target/105818
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>> 	func->decl is valid but its cgraph node is unavailable.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>> ---
>>>  gcc/predict.cc                              | 2 +-
>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>
>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>> index 5734e4c8516..6c60a973236 100644
>>> --- a/gcc/predict.cc
>>> +++ b/gcc/predict.cc
>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>    if (n)
>>>      return n->optimize_for_size_p ();
>>> -  return OPTIMIZE_SIZE_NO;
>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>
>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>> probably better since one can change optimize_size with optimization
>> attribute.
> 
> Good point, agree!
> 
>> However I think in most cases we check for optimize_size early I think
>> we are doing something wrong, since at that time htere is no profile
>> available.  Why exactly PR105818 hits the flag change issue?
> 
> For PR105818, the reason why the flag changs is that:
> 
> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
> of rs6000_isa_flags_explicit, it's set as below:
> 
> /* If we can shrink-wrap the TOC register save separately, then use
>    -msave-toc-indirect unless explicitly disabled.  */
> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>     && flag_shrink_wrap_separate
>     && optimize_function_for_speed_p (cfun))
>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
> 
> Initially, rs6000 initialize target_option_default_node with
> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
> and optimize_size is true.
> 
> Later, when c parser handling function foo, it builds target
> option node as target_option_default_node in function
> handle_optimize_attribute, it does global option saving and
> verifying there as well, at that time the cfun is NULL, no
> issue is found.  And function store_parm_decls allocates
> struct_function for foo then, cfun becomes function struct
> for foo, when c parser continues to handle the decl bar in
> foo, function handle_optimize_attribute works as before,
> tries to restore the target options at the end, it calls
> targetm.target_option.restore (rs6000_function_specific_restore)
> which calls function rs6000_option_override_internal again,
> at this time the cfun is not NULL while there is no cgraph
> node for its decl, optimize_function_for_speed_p returns true
> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
> as the one saved previously.
> 
> IMHO, both contexts of global and function decl foo here hold
> optimize_size, function optimize_function_for_speed_p should
> not return true anyway.
> 
> btw, the aarch64 failed case also gets the unexpected
> result for optimize_function_for_speed_p during c parsing
> (fold_range_test <- ... <- c_parser_condition).
> 
> IIUC, in parsing time we don't have the profile information
> available.
> 

Hi Honza,

Does the above explanation sound reasonable to you?

BR,
Kewen
  
Kewen.Lin Aug. 15, 2022, 8:33 a.m. UTC | #6
on 2022/7/11 11:42, Kewen.Lin wrote:
> on 2022/6/15 14:20, Kewen.Lin wrote:
>> Hi Honza,
>>
>> Thanks for the comments!  Some replies are inlined below.
>>
>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>> Hi,
>>>>
>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>> if func->decl is not null but no cgraph node is available for it.
>>>> As PR105818 shows, this could give unexpected result.  For the
>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>> is a function structure for foo, for which there is none cgraph
>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>> the context is to optimize for size, the flag optimize_size is
>>>> true.
>>>>
>>>> The patch is to make optimize_function_for_size_p to check
>>>> optimize_size as what it does when func->decl is unavailable.
>>>>
>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>
>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>
>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>> used in function fold_range_test during c parsing, it uses
>>>> optimize_function_for_speed_p which is equal to the invertion
>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>> but no cgraph node for it, w/o this patch function
>>>> optimize_function_for_speed_p returns true eventually, while it
>>>> returns false with this patch.  Since the command line option -Os
>>>> is specified, there is no reason to interpret it as "for speed".
>>>> I think this failure is expected and adjust the test case
>>>> accordingly.
>>>>
>>>> Is it ok for trunk?
>>>>
>>>> BR,
>>>> Kewen
>>>> -----
>>>>
>>>> 	PR target/105818
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>> ---
>>>>  gcc/predict.cc                              | 2 +-
>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>
>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>> index 5734e4c8516..6c60a973236 100644
>>>> --- a/gcc/predict.cc
>>>> +++ b/gcc/predict.cc
>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>    if (n)
>>>>      return n->optimize_for_size_p ();
>>>> -  return OPTIMIZE_SIZE_NO;
>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>
>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>> probably better since one can change optimize_size with optimization
>>> attribute.
>>
>> Good point, agree!
>>
>>> However I think in most cases we check for optimize_size early I think
>>> we are doing something wrong, since at that time htere is no profile
>>> available.  Why exactly PR105818 hits the flag change issue?
>>
>> For PR105818, the reason why the flag changs is that:
>>
>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>> of rs6000_isa_flags_explicit, it's set as below:
>>
>> /* If we can shrink-wrap the TOC register save separately, then use
>>    -msave-toc-indirect unless explicitly disabled.  */
>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>     && flag_shrink_wrap_separate
>>     && optimize_function_for_speed_p (cfun))
>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>> Initially, rs6000 initialize target_option_default_node with
>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>> and optimize_size is true.
>>
>> Later, when c parser handling function foo, it builds target
>> option node as target_option_default_node in function
>> handle_optimize_attribute, it does global option saving and
>> verifying there as well, at that time the cfun is NULL, no
>> issue is found.  And function store_parm_decls allocates
>> struct_function for foo then, cfun becomes function struct
>> for foo, when c parser continues to handle the decl bar in
>> foo, function handle_optimize_attribute works as before,
>> tries to restore the target options at the end, it calls
>> targetm.target_option.restore (rs6000_function_specific_restore)
>> which calls function rs6000_option_override_internal again,
>> at this time the cfun is not NULL while there is no cgraph
>> node for its decl, optimize_function_for_speed_p returns true
>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>> as the one saved previously.
>>
>> IMHO, both contexts of global and function decl foo here hold
>> optimize_size, function optimize_function_for_speed_p should
>> not return true anyway.
>>
>> btw, the aarch64 failed case also gets the unexpected
>> result for optimize_function_for_speed_p during c parsing
>> (fold_range_test <- ... <- c_parser_condition).
>>
>> IIUC, in parsing time we don't have the profile information
>> available.
>>
> 
> Hi Honza,
> 
> Does the above explanation sound reasonable to you?
> 

Hi Honza,

Gentle ping ...

BR,
Kewen
  
Kewen.Lin Aug. 29, 2022, 6:35 a.m. UTC | #7
on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote:
> on 2022/7/11 11:42, Kewen.Lin wrote:
>> on 2022/6/15 14:20, Kewen.Lin wrote:
>>> Hi Honza,
>>>
>>> Thanks for the comments!  Some replies are inlined below.
>>>
>>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>>> Hi,
>>>>>
>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>>> if func->decl is not null but no cgraph node is available for it.
>>>>> As PR105818 shows, this could give unexpected result.  For the
>>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>>> is a function structure for foo, for which there is none cgraph
>>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>>> the context is to optimize for size, the flag optimize_size is
>>>>> true.
>>>>>
>>>>> The patch is to make optimize_function_for_size_p to check
>>>>> optimize_size as what it does when func->decl is unavailable.
>>>>>
>>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>>
>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>>
>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>>> used in function fold_range_test during c parsing, it uses
>>>>> optimize_function_for_speed_p which is equal to the invertion
>>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>>> but no cgraph node for it, w/o this patch function
>>>>> optimize_function_for_speed_p returns true eventually, while it
>>>>> returns false with this patch.  Since the command line option -Os
>>>>> is specified, there is no reason to interpret it as "for speed".
>>>>> I think this failure is expected and adjust the test case
>>>>> accordingly.
>>>>>
>>>>> Is it ok for trunk?
>>>>>
>>>>> BR,
>>>>> Kewen
>>>>> -----
>>>>>
>>>>> 	PR target/105818
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>>> ---
>>>>>  gcc/predict.cc                              | 2 +-
>>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>>
>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>>> index 5734e4c8516..6c60a973236 100644
>>>>> --- a/gcc/predict.cc
>>>>> +++ b/gcc/predict.cc
>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>>    if (n)
>>>>>      return n->optimize_for_size_p ();
>>>>> -  return OPTIMIZE_SIZE_NO;
>>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>>
>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>>> probably better since one can change optimize_size with optimization
>>>> attribute.
>>>
>>> Good point, agree!
>>>
>>>> However I think in most cases we check for optimize_size early I think
>>>> we are doing something wrong, since at that time htere is no profile
>>>> available.  Why exactly PR105818 hits the flag change issue?
>>>
>>> For PR105818, the reason why the flag changs is that:
>>>
>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>>> of rs6000_isa_flags_explicit, it's set as below:
>>>
>>> /* If we can shrink-wrap the TOC register save separately, then use
>>>    -msave-toc-indirect unless explicitly disabled.  */
>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>>     && flag_shrink_wrap_separate
>>>     && optimize_function_for_speed_p (cfun))
>>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>>
>>> Initially, rs6000 initialize target_option_default_node with
>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>>> and optimize_size is true.
>>>
>>> Later, when c parser handling function foo, it builds target
>>> option node as target_option_default_node in function
>>> handle_optimize_attribute, it does global option saving and
>>> verifying there as well, at that time the cfun is NULL, no
>>> issue is found.  And function store_parm_decls allocates
>>> struct_function for foo then, cfun becomes function struct
>>> for foo, when c parser continues to handle the decl bar in
>>> foo, function handle_optimize_attribute works as before,
>>> tries to restore the target options at the end, it calls
>>> targetm.target_option.restore (rs6000_function_specific_restore)
>>> which calls function rs6000_option_override_internal again,
>>> at this time the cfun is not NULL while there is no cgraph
>>> node for its decl, optimize_function_for_speed_p returns true
>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>>> as the one saved previously.
>>>
>>> IMHO, both contexts of global and function decl foo here hold
>>> optimize_size, function optimize_function_for_speed_p should
>>> not return true anyway.
>>>
>>> btw, the aarch64 failed case also gets the unexpected
>>> result for optimize_function_for_speed_p during c parsing
>>> (fold_range_test <- ... <- c_parser_condition).
>>>
>>> IIUC, in parsing time we don't have the profile information
>>> available.
>>>
>>
>> Hi Honza,
>>
>> Does the above explanation sound reasonable to you?
>>
> 

Hi Honza,

Gentle ping again ...

BR,
Kewen
  
Kewen.Lin Sept. 28, 2022, 5:46 a.m. UTC | #8
on 2022/8/29 14:35, Kewen.Lin via Gcc-patches wrote:
> on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote:
>> on 2022/7/11 11:42, Kewen.Lin wrote:
>>> on 2022/6/15 14:20, Kewen.Lin wrote:
>>>> Hi Honza,
>>>>
>>>> Thanks for the comments!  Some replies are inlined below.
>>>>
>>>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>>>> if func->decl is not null but no cgraph node is available for it.
>>>>>> As PR105818 shows, this could give unexpected result.  For the
>>>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>>>> is a function structure for foo, for which there is none cgraph
>>>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>>>> the context is to optimize for size, the flag optimize_size is
>>>>>> true.
>>>>>>
>>>>>> The patch is to make optimize_function_for_size_p to check
>>>>>> optimize_size as what it does when func->decl is unavailable.
>>>>>>
>>>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>>>
>>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>>>
>>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>>>> used in function fold_range_test during c parsing, it uses
>>>>>> optimize_function_for_speed_p which is equal to the invertion
>>>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>>>> but no cgraph node for it, w/o this patch function
>>>>>> optimize_function_for_speed_p returns true eventually, while it
>>>>>> returns false with this patch.  Since the command line option -Os
>>>>>> is specified, there is no reason to interpret it as "for speed".
>>>>>> I think this failure is expected and adjust the test case
>>>>>> accordingly.
>>>>>>
>>>>>> Is it ok for trunk?
>>>>>>
>>>>>> BR,
>>>>>> Kewen
>>>>>> -----
>>>>>>
>>>>>> 	PR target/105818
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>>>> ---
>>>>>>  gcc/predict.cc                              | 2 +-
>>>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>>>
>>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>>>> index 5734e4c8516..6c60a973236 100644
>>>>>> --- a/gcc/predict.cc
>>>>>> +++ b/gcc/predict.cc
>>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>>>    if (n)
>>>>>>      return n->optimize_for_size_p ();
>>>>>> -  return OPTIMIZE_SIZE_NO;
>>>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>>>
>>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>>>> probably better since one can change optimize_size with optimization
>>>>> attribute.
>>>>
>>>> Good point, agree!
>>>>
>>>>> However I think in most cases we check for optimize_size early I think
>>>>> we are doing something wrong, since at that time htere is no profile
>>>>> available.  Why exactly PR105818 hits the flag change issue?
>>>>
>>>> For PR105818, the reason why the flag changs is that:
>>>>
>>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>>>> of rs6000_isa_flags_explicit, it's set as below:
>>>>
>>>> /* If we can shrink-wrap the TOC register save separately, then use
>>>>    -msave-toc-indirect unless explicitly disabled.  */
>>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>>>     && flag_shrink_wrap_separate
>>>>     && optimize_function_for_speed_p (cfun))
>>>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>>>
>>>> Initially, rs6000 initialize target_option_default_node with
>>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>>>> and optimize_size is true.
>>>>
>>>> Later, when c parser handling function foo, it builds target
>>>> option node as target_option_default_node in function
>>>> handle_optimize_attribute, it does global option saving and
>>>> verifying there as well, at that time the cfun is NULL, no
>>>> issue is found.  And function store_parm_decls allocates
>>>> struct_function for foo then, cfun becomes function struct
>>>> for foo, when c parser continues to handle the decl bar in
>>>> foo, function handle_optimize_attribute works as before,
>>>> tries to restore the target options at the end, it calls
>>>> targetm.target_option.restore (rs6000_function_specific_restore)
>>>> which calls function rs6000_option_override_internal again,
>>>> at this time the cfun is not NULL while there is no cgraph
>>>> node for its decl, optimize_function_for_speed_p returns true
>>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>>>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>>>> as the one saved previously.
>>>>
>>>> IMHO, both contexts of global and function decl foo here hold
>>>> optimize_size, function optimize_function_for_speed_p should
>>>> not return true anyway.
>>>>
>>>> btw, the aarch64 failed case also gets the unexpected
>>>> result for optimize_function_for_speed_p during c parsing
>>>> (fold_range_test <- ... <- c_parser_condition).
>>>>
>>>> IIUC, in parsing time we don't have the profile information
>>>> available.
>>>>
>>>
>>> Hi Honza,
>>>
>>> Does the above explanation sound reasonable to you?
>>>

Hi Honza,

Gentle ping^4 ...

BR,
Kewen
  

Patch

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5734e4c8516..6c60a973236 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -268,7 +268,7 @@  optimize_function_for_size_p (struct function *fun)
   cgraph_node *n = cgraph_node::get (fun->decl);
   if (n)
     return n->optimize_for_size_p ();
-  return OPTIMIZE_SIZE_NO;
+  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
 }

 /* Return true if function FUN should always be optimized for speed.  */
diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
index 68aa6c63d71..14ca94ad37d 100644
--- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
+++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
@@ -17,7 +17,7 @@  foo (int x, int y, int z)
   int i = 0;
   while (x > 3 && y > 3 && z > 3)
     {		/* { dg-final { gdb-test .+2 "i" "v + 1" } } */
-		/* { dg-final { gdb-test .+1 "x" "10 - i" } } */
+		/* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */
       bar (i);	/* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
 		/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
       i++, x--, y -= 2, z -= 3;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c
new file mode 100644
index 00000000000..18781edbf9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
@@ -0,0 +1,9 @@ 
+/* { dg-options "-Os -fno-tree-vectorize" } */
+
+#pragma GCC optimize "-fno-tree-vectorize"
+
+void
+foo (void)
+{
+  void bar (void);
+}