middle-end/105604 - snprintf diagnostics and non-constant sizes/offsets

Message ID 20220516091616.7D3DF13ADC@imap2.suse-dmz.suse.de
State New
Series middle-end/105604 - snprintf diagnostics and non-constant sizes/offsets

Commit Message

Richard Biener May 16, 2022, 9:16 a.m. UTC
  The following tries to correct get_origin_and_offset_r not handling
non-constant sizes of array elements in ARRAY_REFs and non-constant
offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
should be treated in this API and existing handling isn't consistent
here either.  The following applies two different variants, treating
non-constant array sizes like non-constant array indices and
treating non-constant offsets of COMPONENT_REFs by terminating
the recursion (not sure what that means to the callers).

Basically the code failed to use component_ref_field_offset and
array_ref_element_size and instead relies on inappropriate
helpers (that shouldn't exist in the first place ...).  The code
is also not safe-guarded against overflows in the final offset/size
computations but I'm not trying to rectify that.
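
As a rough illustration of the ARRAY_REF variant described above, here is
a standalone C sketch of the patched branch for non-integer element types.
It is not GCC code: int64_t stands in for HOST_WIDE_INT, UNKNOWN_IDX for
HOST_WIDE_INT_MAX, the elsz_known flag for tree_fits_shwi_p (elsz), and
add_array_ref_offset is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for HOST_WIDE_INT_MAX, meaning "index is not a known constant".  */
#define UNKNOWN_IDX INT64_MAX

/* Hypothetical model of the non-INTEGER_TYPE ARRAY_REF branch after the
   patch.  ELSZ_KNOWN models tree_fits_shwi_p (elsz).  */
static void
add_array_ref_offset (int64_t *fldoff, int64_t idx,
		      bool elsz_known, int64_t elsz)
{
  assert (fldoff != NULL);
  if (idx < UNKNOWN_IDX && elsz_known)
    /* Both constant: accumulate.  Like the patch, this multiplication
       is not guarded against overflow.  */
    *fldoff += idx * elsz;
  else
    /* Non-constant element size is treated like a non-constant index:
       the accumulated offset is overwritten with IDX.  */
    *fldoff = idx;
}
```

With an incoming offset of 8, a known index of 3 and a constant element
size of 4, the offset becomes 20.  When the index is unknown the offset
becomes UNKNOWN_IDX, and when only the element size is unknown it is
overwritten with the index itself.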

Martin - can you comment on how the API should handle such
situations?

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk and branches?

Thanks,
Richard.

2022-05-16  Richard Biener  <rguenther@suse.de>

	PR middle-end/105604
	* gimple-ssa-sprintf.cc (get_origin_and_offset_r):
	Handle non-constant ARRAY_REF element size and non-constant
	COMPONENT_REF field offset.

	* gcc.dg/torture/pr105604.c: New testcase.
---
 gcc/gimple-ssa-sprintf.cc               | 14 +++++++++++---
 gcc/testsuite/gcc.dg/torture/pr105604.c | 24 ++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr105604.c
  

Comments

Martin Sebor May 17, 2022, 10:04 p.m. UTC | #1
On 5/16/22 03:16, Richard Biener wrote:
> The following tries to correct get_origin_and_offset_r not handling
> non-constant sizes of array elements in ARRAY_REFs and non-constant
> offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
> should be treated in this API and existing handling isn't consistent
> here either.  The following applies two different variants, treating
> non-constant array sizes like non-constant array indices and
> treating non-constant offsets of COMPONENT_REFs by terminating
> the recursion (not sure what that means to the callers).
> 
> Basically the code failed to use component_ref_field_offset and
> array_ref_element_size and instead relies on inappropriate
> helpers (that shouldn't exist in the first place ...).  The code
> is also not safe-guarded against overflows in the final offset/size
> computations but I'm not trying to rectify that.
> 
> Martin - can you comment on how the API should handle such
> situations?

It looks like the -Wrestrict warning here ignores offsets equal to
HOST_WIDE_INT_MIN so presumably setting dst_offset (via *fldoff) to
that should avoid it.  Or maybe to HWI_MAX as it does for variable
offsets.

It also looks like the function only handles constant offsets and
sizes, and I have a vague recollection of enhancing it to work with
ranges.  That should avoid the overflow problem too.
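
To make the range idea concrete, here is a minimal C sketch of tracking
an offset as a closed [min, max] range with saturating addition, so an
unknown component widens the range instead of overflowing.  The off_range
type and the helper names are assumptions made for this example; they do
not describe how gimple-ssa-sprintf.cc represents offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closed range [min, max] of possible byte offsets.  */
typedef struct { int64_t min, max; } off_range;

/* Add two offsets, clamping to the int64_t limits instead of
   overflowing.  */
static int64_t
sat_add (int64_t a, int64_t b)
{
  if (b > 0 && a > INT64_MAX - b)
    return INT64_MAX;
  if (b < 0 && a < INT64_MIN - b)
    return INT64_MIN;
  return a + b;
}

/* Add two ranges bound by bound.  */
static off_range
range_add (off_range a, off_range b)
{
  assert (a.min <= a.max && b.min <= b.max);
  off_range r = { sat_add (a.min, b.min), sat_add (a.max, b.max) };
  return r;
}

/* True if the range denotes a single known offset.  */
static bool
range_is_exact (off_range r)
{
  return r.min == r.max;
}
```

Adding the exact ranges [8, 8] and [4, 4] gives [12, 12], while adding
[8, 8] to a fully unknown nonnegative offset [0, INT64_MAX] gives
[8, INT64_MAX], which a caller can recognize as inexact.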

Martin

  
Richard Biener May 18, 2022, 6:26 a.m. UTC | #2
On Tue, 17 May 2022, Martin Sebor wrote:

> On 5/16/22 03:16, Richard Biener wrote:
> > The following tries to correct get_origin_and_offset_r not handling
> > non-constant sizes of array elements in ARRAY_REFs and non-constant
> > offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
> > should be treated in this API and existing handling isn't consistent
> > here either.  The following applies two different variants, treating
> > non-constant array sizes like non-constant array indices and
> > treating non-constant offsets of COMPONENT_REFs by terminating
> > the recursion (not sure what that means to the callers).
> > 
> > Basically the code failed to use component_ref_field_offset and
> > array_ref_element_size and instead relies on inappropriate
> > helpers (that shouldn't exist in the first place ...).  The code
> > is also not safe-guarded against overflows in the final offset/size
> > computations but I'm not trying to rectify that.
> > 
> > Martin - can you comment on how the API should handle such
> > situations?
> 
> It looks like the -Wrestrict warning here ignores offsets equal to
> HOST_WIDE_INT_MIN so presumably setting dst_offset (via *fldoff) to
> that should avoid it.  Or maybe to HWI_MAX as it does for variable
> offsets.

Can you suggest wording for the function comment as to how it handles
the case when offset or size cannot be determined exactly?   The
comment currently only suggests that the caller possibly cannot
trust fldsize or off when the function returns NULL but the actual
implementation differs from that.

> It also looks like the function only handles constant offsets and
> sizes, and I have a vague recollection of enhancing it to work with
> ranges.  That should avoid the overflow problem too.

So the correct thing is to return NULL?

Is the patch OK as-is?  As said, I'm not sure how the caller interprets
the result and how it can distinguish the exact vs. non-exact cases
or what a "conservative" inexact answer would be.

Please help properly documenting this API.

Thanks,
Richard.

  
Martin Sebor May 18, 2022, 10:42 p.m. UTC | #3
On 5/18/22 00:26, Richard Biener wrote:
> On Tue, 17 May 2022, Martin Sebor wrote:
> 
>> On 5/16/22 03:16, Richard Biener wrote:
>>> The following tries to correct get_origin_and_offset_r not handling
>>> non-constant sizes of array elements in ARRAY_REFs and non-constant
>>> offsets of COMPONENT_REFs.  It isn't exactly clear how such failures
>>> should be treated in this API and existing handling isn't consistent
>>> here either.  The following applies two different variants, treating
>>> non-constant array sizes like non-constant array indices and
>>> treating non-constant offsets of COMPONENT_REFs by terminating
>>> the recursion (not sure what that means to the callers).
>>>
>>> Basically the code failed to use component_ref_field_offset and
>>> array_ref_element_size and instead relies on inappropriate
>>> helpers (that shouldn't exist in the first place ...).  The code
>>> is also not safe-guarded against overflows in the final offset/size
>>> computations but I'm not trying to rectify that.
>>>
>>> Martin - can you comment on how the API should handle such
>>> situations?
>>
>> It looks like the -Wrestrict warning here ignores offsets equal to
>> HOST_WIDE_INT_MIN so presumably setting dst_offset (via *fldoff) to
>> that should avoid it.  Or maybe to HWI_MAX as it does for variable
>> offsets.
> 
> Can you suggest wording for the function comment as to how it handles
> the case when offset or size cannot be determined exactly?   The
> comment currently only suggests that the caller possibly cannot
> trust fldsize or off when the function returns NULL but the actual
> implementation differs from that.



> 
>> It also looks like the function only handles constant offsets and
>> sizes, and I have a vague recollection of enhancing it to work with
>> ranges.  That should avoid the overflow problem too.
> 
> So the correct thing is to return NULL?

No, I don't think so.  The recursive get_origin_and_offset_r() assumes
its own invocations never return null (the one place it does that should
probably be moved to the nonrecursive caller).
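
The structure I have in mind can be sketched in standalone C as below.
This is a toy model, not the real function: struct ref, origin_r and
origin are hypothetical, and which condition should map to a null result
in the real code is not settled here, so the sketch picks "offset not
exact" purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy reference chain: a field access points at its base object; a
   declaration has base == NULL.  */
struct ref
{
  const struct ref *base;
  int64_t byte_off;
  bool off_known;
};

/* Recursive helper.  Always returns the non-null outermost base, so
   recursive calls never have to check for null.  A failure to compute
   the offset is recorded in *EXACT instead.  */
static const struct ref *
origin_r (const struct ref *x, int64_t *off, bool *exact)
{
  assert (x != NULL && off != NULL && exact != NULL);
  if (x->base == NULL)
    return x;
  if (x->off_known)
    *off += x->byte_off;
  else
    *exact = false;
  return origin_r (x->base, off, exact);
}

/* Nonrecursive entry point: the only place that may return null.  */
static const struct ref *
origin (const struct ref *x, int64_t *off)
{
  bool exact = true;
  *off = 0;
  const struct ref *base = origin_r (x, off, &exact);
  return exact ? base : NULL;
}
```

For a chain of two fields at known offsets 8 and 4, origin returns the
declaration with an offset of 12.  If any link has an unknown offset,
only the wrapper turns that into a null result.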

> 
> Is the patch OK as-is?

It's an improvement but it's not complete as the following also ICEs
(albeit somewhere else):

void* f (void);

void g (int n)
{
   struct {
     char a[n], b[];
   } *p = f ();

   __builtin_sprintf (p->b, "%s", p->a);
}

With the ICE fixed the warning triggers.  That's not ideal but it's
unavoidable given the IR (I believe I mentioned this caveat some time
back).  This is the same as for:

   struct {
     char a[8], b[8];
   } *p = f ();

   __builtin_sprintf (&p->b[n], "%s", p->a);

because the IR looks more or less the same for &p->a[n] as it is for
&p->b[n].
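
One way to see the ambiguity: once an access is reduced to a base pointer
plus a byte offset, the member that was named in the source cannot be
recovered from the offset alone.  The C sketch below assumes that
reduction; it is not a dump of the IR, and the helper names are made up
for the example.

```c
#include <assert.h>
#include <stddef.h>

struct pair { char a[8], b[8]; };

/* Byte offset of &p->a[n] from the start of *p.  */
static size_t
offset_in_a (size_t n)
{
  assert (n <= sizeof (struct pair));
  return offsetof (struct pair, a) + n;
}

/* Byte offset of &p->b[n] from the start of *p.  */
static size_t
offset_in_b (size_t n)
{
  assert (n <= sizeof (struct pair));
  return offsetof (struct pair, b) + n;
}
```

Here offset_in_a (8) and offset_in_b (0) are the same byte, so with a
variable n an offset-based analysis cannot rule out that an access
written against one member lands in the other.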

> As said, I'm not sure how the caller interprets
> the result and how it can distinguish the exact vs. non-exact cases
> or what a "conservative" inexact answer would be.

The warning triggers in both the certain cases and the inexact
ones like the one above when an overlap cannot be ruled out.  To
differentiate the two it's phrased as "may overlap".  The handling
is in maybe_warn_overlap().
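
As a sketch of the distinction between certain and possible overlap, the
C code below classifies two accesses whose start offsets are only known
as ranges.  It is an assumption-laden illustration, not the logic of
maybe_warn_overlap(): the types and the classify_overlap name are
hypothetical, and the arithmetic is not guarded against overflow.

```c
#include <assert.h>
#include <stdint.h>

enum overlap { OVERLAP_NONE, OVERLAP_MAYBE, OVERLAP_CERTAIN };

/* An access of SIZE bytes starting at some offset in [off_lo, off_hi].  */
struct access { int64_t off_lo, off_hi, size; };

static enum overlap
classify_overlap (struct access d, struct access s)
{
  assert (d.size > 0 && s.size > 0);
  assert (d.off_lo <= d.off_hi && s.off_lo <= s.off_hi);

  /* Every choice of start offsets makes the two accesses overlap.  */
  if (d.off_hi < s.off_lo + s.size && s.off_hi < d.off_lo + d.size)
    return OVERLAP_CERTAIN;

  /* At least one choice of start offsets makes them overlap.  */
  if (d.off_lo < s.off_hi + s.size && s.off_lo < d.off_hi + d.size)
    return OVERLAP_MAYBE;

  return OVERLAP_NONE;
}
```

A 4-byte write at exactly offset 4 against an 8-byte read at offset 0 is
a certain overlap.  The same write at an offset anywhere in [0, 16] is
only a possible one, which corresponds to the "may overlap" wording.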

> 
> Please help properly documenting this API.

I can spend some time in the next few days to page it all in, see
if I can clean it up a bit in addition to fixing the ICEs and
improve the comment.  Let me know if you have a different
preference.

Martin

  
Richard Biener May 19, 2022, 11:39 a.m. UTC | #4
On Wed, 18 May 2022, Martin Sebor wrote:

> I can spend some time in the next few days to page it all in, see
> if I can clean it up a bit in addition to fixing the ICEs and
> improve the comment.  Let me know if you have a different
> preference.

That works for me - thanks for taking it from here.

Richard.
  
Martin Sebor May 24, 2022, 12:58 a.m. UTC | #5
On 5/19/22 05:39, Richard Biener wrote:
> That works for me - thanks for taking it from here.

Attached is a slightly enhanced patch that fixes both of the ICEs,
improves the comments, and adds more tests.  I tested it on x86_64.
Let me know if there's something else you'd like me to do here.

Martin
  
Richard Biener May 24, 2022, 6:26 a.m. UTC | #6
On Mon, 23 May 2022, Martin Sebor wrote:

> Attached is a slightly enhanced patch that fixes both of the ICEs,
> improves the comments, and adds more tests.  I tested it on x86_64.
> Let me know if there's something else you'd like me to do here.

Looks good to me!

Thanks for fixing.
Richard.
  

Patch

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index c93f12f90b5..14e215ce69c 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -2312,14 +2312,16 @@  get_origin_and_offset_r (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *fldsize,
 	HOST_WIDE_INT idx = (tree_fits_uhwi_p (offset)
 			     ? tree_to_uhwi (offset) : HOST_WIDE_INT_MAX);
 
+	tree elsz = array_ref_element_size (x);
 	tree eltype = TREE_TYPE (x);
 	if (TREE_CODE (eltype) == INTEGER_TYPE)
 	  {
 	    if (off)
 	      *off = idx;
 	  }
-	else if (idx < HOST_WIDE_INT_MAX)
-	  *fldoff += idx * int_size_in_bytes (eltype);
+	else if (idx < HOST_WIDE_INT_MAX
+		 && tree_fits_shwi_p (elsz))
+	  *fldoff += idx * tree_to_shwi (elsz);
 	else
 	  *fldoff = idx;
 
@@ -2350,8 +2352,14 @@  get_origin_and_offset_r (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *fldsize,
 
     case COMPONENT_REF:
       {
+	tree foff = component_ref_field_offset (x);
 	tree fld = TREE_OPERAND (x, 1);
-	*fldoff += int_byte_position (fld);
+	if (!tree_fits_shwi_p (foff)
+	    || !tree_fits_shwi_p (DECL_FIELD_BIT_OFFSET (fld)))
+	  return x;
+	*fldoff += (tree_to_shwi (foff)
+		    + (tree_to_shwi (DECL_FIELD_BIT_OFFSET (fld))
+		       / BITS_PER_UNIT));
 
 	get_origin_and_offset_r (fld, fldoff, fldsize, off);
 	x = TREE_OPERAND (x, 0);
diff --git a/gcc/testsuite/gcc.dg/torture/pr105604.c b/gcc/testsuite/gcc.dg/torture/pr105604.c
new file mode 100644
index 00000000000..b002251df10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr105604.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall" } */
+
+struct {
+  long users;
+  long size;
+  char *data;
+} * main_trans;
+void *main___trans_tmp_1;
+int sprintf(char *, char *, ...);
+int main() {
+  int users = 0;
+  struct {
+    long users;
+    long size;
+    char *data;
+    int links[users];
+    char buf[];
+  } *trans = trans;
+  trans->data = trans->buf;
+  main___trans_tmp_1 = trans;
+  main_trans = main___trans_tmp_1;
+  sprintf(main_trans->data, "test");
+}