[4/7] middle-end: Fix mask length arg in call to vect_get_loop_mask [PR96342]

Message ID Z1BIRbih9N9feAO0@arm.com
State New
Headers
Series None |

Commit Message

Tamar Christina Dec. 4, 2024, 12:17 p.m. UTC
  Hi All,

When issuing multiple calls to a simdclone in a vectorized loop,
TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.

By passing `masktype' instead, we are able to get the correct number of
vector subparts and thu eliminate the ICE in the call to
`vect_get_loop_mask' when the data type for which we retrieve the mask
is wider than the one used when defining the mask at mask registration
time.

gcc/ChangeLog:

	PR target/96342
	* tree-vect-stmts.cc (vectorizable_simd_clone_call):
	s/vectype/masktype/ in call to vect_get_loop_mask.


Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

---




--
  

Comments

Richard Biener Dec. 4, 2024, 2:42 p.m. UTC | #1
On Wed, 4 Dec 2024, Tamar Christina wrote:

> Hi All,
> 
> When issuing multiple calls to a simdclone in a vectorized loop,
> TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
> to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
> from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.
> 
> By passing `masktype' instead, we are able to get the correct number of
> vector subparts and thu eliminate the ICE in the call to
> `vect_get_loop_mask' when the data type for which we retrieve the mask
> is wider than the one used when defining the mask at mask registration
> time.
>
> gcc/ChangeLog:
> 
> 	PR target/96342
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call):
> 	s/vectype/masktype/ in call to vect_get_loop_mask.
> 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?

Hmm, I think the corresponding vect_record_loop_mask is

            case SIMD_CLONE_ARG_TYPE_MASK:
              if (loop_vinfo
                  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
                {
                  unsigned nmasks
                    = exact_div (ncopies * bestn->simdclone->simdlen,
                                 TYPE_VECTOR_SUBPARTS 
(vectype)).to_constant ();
                  vect_record_loop_mask (loop_vinfo,
                                         &LOOP_VINFO_MASKS (loop_vinfo),
                                         nmasks, vectype, op);
                }

and that doesn't exactly match the bestn->simdclone->nargs - 1
mask use then unless bestn->simdclone->args[mask_i].vector_type
== vectype?

The patch looks OK, but I think we miss a corresponding 
vect_record_loop_mask case.  I also wonder how we support a SIMD clone
call where the result vector type subparts doesn't match the argument
or loop mask number of subparts?
 
> Thanks,
> Tamar
> 
> ---
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 497a31322accba8672b82dee00f5403b40dca22b..be1139a423c85608755f10750bb68e70223a84ea 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -4964,7 +4964,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		{
>  		  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
>  		  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> -					     ncopies, vectype, j);
> +					     ncopies, masktype, j);
>  		}
>  	      else
>  		mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
> 
> 
> 
> 
>
  
Tamar Christina Dec. 4, 2024, 3:24 p.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, December 4, 2024 2:43 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH 4/7]middle-end: Fix mask length arg in call to
> vect_get_loop_mask [PR96342]
> 
> On Wed, 4 Dec 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > When issuing multiple calls to a simdclone in a vectorized loop,
> > TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
> > to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
> > from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.
> >
> > By passing `masktype' instead, we are able to get the correct number of
> > vector subparts and thu eliminate the ICE in the call to
> > `vect_get_loop_mask' when the data type for which we retrieve the mask
> > is wider than the one used when defining the mask at mask registration
> > time.
> >
> > gcc/ChangeLog:
> >
> > 	PR target/96342
> > 	* tree-vect-stmts.cc (vectorizable_simd_clone_call):
> > 	s/vectype/masktype/ in call to vect_get_loop_mask.
> >
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> 
> Hmm, I think the corresponding vect_record_loop_mask is
> 
>             case SIMD_CLONE_ARG_TYPE_MASK:
>               if (loop_vinfo
>                   && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>                 {
>                   unsigned nmasks
>                     = exact_div (ncopies * bestn->simdclone->simdlen,
>                                  TYPE_VECTOR_SUBPARTS
> (vectype)).to_constant ();
>                   vect_record_loop_mask (loop_vinfo,
>                                          &LOOP_VINFO_MASKS (loop_vinfo),
>                                          nmasks, vectype, op);
>                 }
> 
> and that doesn't exactly match the bestn->simdclone->nargs - 1
> mask use then unless bestn->simdclone->args[mask_i].vector_type
> == vectype?
> 
> The patch looks OK, but I think we miss a corresponding
> vect_record_loop_mask case.  I also wonder how we support a SIMD clone
> call where the result vector type subparts doesn't match the argument
> or loop mask number of subparts?
> 

I think it's correct because of the difference in type is accounted for In the usage:

As you posted, when we register a mask we use nmasks, because it's
ncopies * bestn->simdclone->simdlen. So nmasks represents the widest input.

When using it we do:

          mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
                         ncopies, vectype, j);

My understanding is that the reason for this is that when registering the mask
we want to register the smaller type, and when using it, since you may have
decomposed in multiple vectors, vect_get_look_mask on the smaller type will
give you the unpacked mask.

The problem was before that we passed the wrong type here.  So we gave the input
type rather than the expected decomposed mask.

Thanks,
Tamar

> > Thanks,
> > Tamar
> >
> > ---
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 497a31322accba8672b82dee00f5403b40dca22b..be1139a423c85608755f107
> 50bb68e70223a84ea 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -4964,7 +4964,7 @@ vectorizable_simd_clone_call (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >  		{
> >  		  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS
> (loop_vinfo);
> >  		  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> > -					     ncopies, vectype, j);
> > +					     ncopies, masktype, j);
> >  		}
> >  	      else
> >  		mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  
Richard Biener Dec. 4, 2024, 4:19 p.m. UTC | #3
> Am 04.12.2024 um 16:26 schrieb Tamar Christina <tamar.christina@arm.com>:
> 
> 
>> 
>> -----Original Message-----
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Wednesday, December 4, 2024 2:43 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> Subject: Re: [PATCH 4/7]middle-end: Fix mask length arg in call to
>> vect_get_loop_mask [PR96342]
>> 
>>> On Wed, 4 Dec 2024, Tamar Christina wrote:
>>> 
>>> Hi All,
>>> 
>>> When issuing multiple calls to a simdclone in a vectorized loop,
>>> TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
>>> to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
>>> from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.
>>> 
>>> By passing `masktype' instead, we are able to get the correct number of
>>> vector subparts and thu eliminate the ICE in the call to
>>> `vect_get_loop_mask' when the data type for which we retrieve the mask
>>> is wider than the one used when defining the mask at mask registration
>>> time.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>    PR target/96342
>>>    * tree-vect-stmts.cc (vectorizable_simd_clone_call):
>>>    s/vectype/masktype/ in call to vect_get_loop_mask.
>>> 
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
>>> -m32, -m64 and no issues.
>>> 
>>> Ok for master?
>> 
>> Hmm, I think the corresponding vect_record_loop_mask is
>> 
>>            case SIMD_CLONE_ARG_TYPE_MASK:
>>              if (loop_vinfo
>>                  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>>                {
>>                  unsigned nmasks
>>                    = exact_div (ncopies * bestn->simdclone->simdlen,
>>                                 TYPE_VECTOR_SUBPARTS
>> (vectype)).to_constant ();
>>                  vect_record_loop_mask (loop_vinfo,
>>                                         &LOOP_VINFO_MASKS (loop_vinfo),
>>                                         nmasks, vectype, op);
>>                }
>> 
>> and that doesn't exactly match the bestn->simdclone->nargs - 1
>> mask use then unless bestn->simdclone->args[mask_i].vector_type
>> == vectype?
>> 
>> The patch looks OK, but I think we miss a corresponding
>> vect_record_loop_mask case.  I also wonder how we support a SIMD clone
>> call where the result vector type subparts doesn't match the argument
>> or loop mask number of subparts?
>> 
> 
> I think it's correct because of the difference in type is accounted for In the usage:
> 
> As you posted, when we register a mask we use nmasks, because it's
> ncopies * bestn->simdclone->simdlen. So nmasks represents the widest input.
> 
> When using it we do:
> 
>          mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
>                         ncopies, vectype, j);
> 
> My understanding is that the reason for this is that when registering the mask
> we want to register the smaller type, and when using it, since you may have
> decomposed in multiple vectors, vect_get_look_mask on the smaller type will
> give you the unpacked mask.
> 
> The problem was before that we passed the wrong type here.  So we gave the input
> type rather than the expected decomposed mask.

Yes, the Type Change definitely is OK.  I just Wondered About The registering…

Richard 

> 
> Thanks,
> Tamar
> 
>>> Thanks,
>>> Tamar
>>> 
>>> ---
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index
>> 497a31322accba8672b82dee00f5403b40dca22b..be1139a423c85608755f107
>> 50bb68e70223a84ea 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -4964,7 +4964,7 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>>>        {
>>>          vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS
>> (loop_vinfo);
>>>          mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
>>> -                         ncopies, vectype, j);
>>> +                         ncopies, masktype, j);
>>>        }
>>>          else
>>>        mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH,
>> Frankenstrasse 146, 90461 Nuernberg, Germany;
>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  

Patch

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 497a31322accba8672b82dee00f5403b40dca22b..be1139a423c85608755f10750bb68e70223a84ea 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4964,7 +4964,7 @@  vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		{
 		  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
 		  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-					     ncopies, vectype, j);
+					     ncopies, masktype, j);
 		}
 	      else
 		mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);