[vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

Message ID a18f3ba2-ed80-28d5-5bd9-5fd19cbaa227@arm.com
State New
Headers
Series [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only |

Commit Message

Andre Vieira (lists) Jan. 12, 2022, 10:13 a.m. UTC
  Hi,

This a fix for the regression caused by '[vect] Re-analyze all modes for 
epilogues'. The earlier patch assumed there was always at least one 
other mode than VOIDmode, but that does not need to be the case.
If we are dealing with a target that does not define more modes for 
'autovectorize_vector_modes', the behaviour before the patch would be to 
try to create an epilogue for the same autodetected_vector_mode, which 
unless the target supported partial vectors would always fail. So as a 
fix I suggest trying to vectorize the epilogue with the 
preferred_simd_mode for QI, mimicking autovectorize_vector_mode, which 
will be skipped if it is not a vector_mode (since that already should 
indicate partial vectors aren't possible) or if no partial vectors are 
supported and its pessimistic NUNITS is larger than the main loop's VF.

Currently bootstrapping and regression testing, otherwise OK for trunk? 
Can someone verify this fixes the issue for PR103971 on powerpc?

gcc/ChangeLog:

         * tree-vect-loop.c (vect-analyze-loop): Handle scenario where 
target
         does add autovectorize_vector_modes.
  

Comments

Richard Biener Jan. 12, 2022, 10:29 a.m. UTC | #1
On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> This a fix for the regression caused by '[vect] Re-analyze all modes for
> epilogues'. The earlier patch assumed there was always at least one other mode
> than VOIDmode, but that does not need to be the case.
> If we are dealing with a target that does not define more modes for
> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> to create an epilogue for the same autodetected_vector_mode, which unless the
> target supported partial vectors would always fail. So as a fix I suggest
> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> vector_mode (since that already should indicate partial vectors aren't
> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> larger than the main loop's VF.
> 
> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> someone verify this fixes the issue for PR103971 on powerpc?

Why not simply start at mode_i = 0 which means autodetecting the mode
to use for the epilogue?  That appears to be a much simpler solution to
me, including for targets where there are more than one element in the
vector.

Richard.
  
Richard Sandiford Jan. 12, 2022, 10:33 a.m. UTC | #2
Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> Hi,
>> 
>> This a fix for the regression caused by '[vect] Re-analyze all modes for
>> epilogues'. The earlier patch assumed there was always at least one other mode
>> than VOIDmode, but that does not need to be the case.
>> If we are dealing with a target that does not define more modes for
>> 'autovectorize_vector_modes', the behaviour before the patch would be to try
>> to create an epilogue for the same autodetected_vector_mode, which unless the
>> target supported partial vectors would always fail. So as a fix I suggest
>> trying to vectorize the epilogue with the preferred_simd_mode for QI,
>> mimicking autovectorize_vector_mode, which will be skipped if it is not a
>> vector_mode (since that already should indicate partial vectors aren't
>> possible) or if no partial vectors are supported and its pessimistic NUNITS is
>> larger than the main loop's VF.
>> 
>> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
>> someone verify this fixes the issue for PR103971 on powerpc?
>
> Why not simply start at mode_i = 0 which means autodetecting the mode
> to use for the epilogue?  That appears to be a much simpler solution to
> me, including for targets where there are more than one element in the
> vector.

VOIDmode doesn't tell us anything about what the autodetected mode
will be, so current short-circuit:

      /* If the target does not support partial vectors we can shorten the
	 number of modes to analyze for the epilogue as we know we can't pick a
	 mode that has at least as many NUNITS as the main loop's vectorization
	 factor, since that would imply the epilogue's vectorization factor
	 would be at least as high as the main loop's and we would be
	 vectorizing for more scalar iterations than there would be left.  */
      if (!supports_partial_vectors
	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
	{
	  mode_i++;
	  if (mode_i == vector_modes.length ())
	    break;
	  continue;
	}

wouldn't be effective.

Thanks,
Richard
  
Richard Biener Jan. 12, 2022, 10:40 a.m. UTC | #3
On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >> 
> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
> >> epilogues'. The earlier patch assumed there was always at least one other mode
> >> than VOIDmode, but that does not need to be the case.
> >> If we are dealing with a target that does not define more modes for
> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> >> to create an epilogue for the same autodetected_vector_mode, which unless the
> >> target supported partial vectors would always fail. So as a fix I suggest
> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> >> vector_mode (since that already should indicate partial vectors aren't
> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> >> larger than the main loop's VF.
> >> 
> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> >> someone verify this fixes the issue for PR103971 on powerpc?
> >
> > Why not simply start at mode_i = 0 which means autodetecting the mode
> > to use for the epilogue?  That appears to be a much simpler solution to
> > me, including for targets where there are more than one element in the
> > vector.
> 
> VOIDmode doesn't tell us anything about what the autodetected mode
> will be, so current short-circuit:
> 
>       /* If the target does not support partial vectors we can shorten the
> 	 number of modes to analyze for the epilogue as we know we can't pick a
> 	 mode that has at least as many NUNITS as the main loop's vectorization
> 	 factor, since that would imply the epilogue's vectorization factor
> 	 would be at least as high as the main loop's and we would be
> 	 vectorizing for more scalar iterations than there would be left.  */
>       if (!supports_partial_vectors
> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> 	{
> 	  mode_i++;
> 	  if (mode_i == vector_modes.length ())
> 	    break;
> 	  continue;
> 	}
> 
> wouldn't be effective.

Well, before this change we simply did

-  /* Handle the case that the original loop can use partial
-     vectorization, but want to only adopt it for the epilogue.
-     The retry should be in the same mode as original.  */
-  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
...
-  else
-    {
-      mode_i = first_loop_next_i;
-      if (mode_i == vector_modes.length ())
-       return first_loop_vinfo;
-    }

and thus didn't bother with epilogue vectorization.  I think we should
then just restore this behavior, not doing epilogue vectorization
if vector_modes.length () == 1?

Richard.
  
Richard Sandiford Jan. 12, 2022, 10:43 a.m. UTC | #4
Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jan 2022, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>> >
>> >> Hi,
>> >> 
>> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
>> >> epilogues'. The earlier patch assumed there was always at least one other mode
>> >> than VOIDmode, but that does not need to be the case.
>> >> If we are dealing with a target that does not define more modes for
>> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
>> >> to create an epilogue for the same autodetected_vector_mode, which unless the
>> >> target supported partial vectors would always fail. So as a fix I suggest
>> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
>> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
>> >> vector_mode (since that already should indicate partial vectors aren't
>> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
>> >> larger than the main loop's VF.
>> >> 
>> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
>> >> someone verify this fixes the issue for PR103971 on powerpc?
>> >
>> > Why not simply start at mode_i = 0 which means autodetecting the mode
>> > to use for the epilogue?  That appears to be a much simpler solution to
>> > me, including for targets where there are more than one element in the
>> > vector.
>> 
>> VOIDmode doesn't tell us anything about what the autodetected mode
>> will be, so current short-circuit:
>> 
>>       /* If the target does not support partial vectors we can shorten the
>> 	 number of modes to analyze for the epilogue as we know we can't pick a
>> 	 mode that has at least as many NUNITS as the main loop's vectorization
>> 	 factor, since that would imply the epilogue's vectorization factor
>> 	 would be at least as high as the main loop's and we would be
>> 	 vectorizing for more scalar iterations than there would be left.  */
>>       if (!supports_partial_vectors
>> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
>> 	{
>> 	  mode_i++;
>> 	  if (mode_i == vector_modes.length ())
>> 	    break;
>> 	  continue;
>> 	}
>> 
>> wouldn't be effective.
>
> Well, before this change we simply did
>
> -  /* Handle the case that the original loop can use partial
> -     vectorization, but want to only adopt it for the epilogue.
> -     The retry should be in the same mode as original.  */
> -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> ...
> -  else
> -    {
> -      mode_i = first_loop_next_i;
> -      if (mode_i == vector_modes.length ())
> -       return first_loop_vinfo;
> -    }
>
> and thus didn't bother with epilogue vectorization.  I think we should
> then just restore this behavior, not doing epilogue vectorization
> if vector_modes.length () == 1?

Yeah, but that case didn't need epilogue vectorisation before.  This
series is adding support for unrolling, and targets with a single vector
size will benefit from epilogues in that case.

Thanks,
Richard
  
Richard Biener Jan. 12, 2022, 10:52 a.m. UTC | #5
On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >> >
> >> >> Hi,
> >> >> 
> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
> >> >> than VOIDmode, but that does not need to be the case.
> >> >> If we are dealing with a target that does not define more modes for
> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
> >> >> target supported partial vectors would always fail. So as a fix I suggest
> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> >> >> vector_mode (since that already should indicate partial vectors aren't
> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> >> >> larger than the main loop's VF.
> >> >> 
> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> >> >> someone verify this fixes the issue for PR103971 on powerpc?
> >> >
> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
> >> > to use for the epilogue?  That appears to be a much simpler solution to
> >> > me, including for targets where there are more than one element in the
> >> > vector.
> >> 
> >> VOIDmode doesn't tell us anything about what the autodetected mode
> >> will be, so current short-circuit:
> >> 
> >>       /* If the target does not support partial vectors we can shorten the
> >> 	 number of modes to analyze for the epilogue as we know we can't pick a
> >> 	 mode that has at least as many NUNITS as the main loop's vectorization
> >> 	 factor, since that would imply the epilogue's vectorization factor
> >> 	 would be at least as high as the main loop's and we would be
> >> 	 vectorizing for more scalar iterations than there would be left.  */
> >>       if (!supports_partial_vectors
> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >> 	{
> >> 	  mode_i++;
> >> 	  if (mode_i == vector_modes.length ())
> >> 	    break;
> >> 	  continue;
> >> 	}
> >> 
> >> wouldn't be effective.
> >
> > Well, before this change we simply did
> >
> > -  /* Handle the case that the original loop can use partial
> > -     vectorization, but want to only adopt it for the epilogue.
> > -     The retry should be in the same mode as original.  */
> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> > ...
> > -  else
> > -    {
> > -      mode_i = first_loop_next_i;
> > -      if (mode_i == vector_modes.length ())
> > -       return first_loop_vinfo;
> > -    }
> >
> > and thus didn't bother with epilogue vectorization.  I think we should
> > then just restore this behavior, not doing epilogue vectorization
> > if vector_modes.length () == 1?
> 
> Yeah, but that case didn't need epilogue vectorisation before.  This
> series is adding support for unrolling, and targets with a single vector
> size will benefit from epilogues in that case.

But in that case (which we could detect), we could then just use
autodetected_vector_mode?  Like if we do before epilogue vect

 vector_modes[0] = autodetected_vector_mode;
 mode_i = 0;

thus replace VOIDmode with what we detected and then start at 0?
That is, the proposed patch looks very much like a hack to me.

I suppose the VECTOR_MODE_P check should be added to

      if (!supports_partial_vectors
          && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), 
first_vinfo_vf))
        {
          mode_i++;

instead.

> Thanks,
> Richard
>
  
Richard Sandiford Jan. 12, 2022, 11:02 a.m. UTC | #6
Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jan 2022, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
>> >
>> >> Richard Biener <rguenther@suse.de> writes:
>> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>> >> >
>> >> >> Hi,
>> >> >> 
>> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
>> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
>> >> >> than VOIDmode, but that does not need to be the case.
>> >> >> If we are dealing with a target that does not define more modes for
>> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
>> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
>> >> >> target supported partial vectors would always fail. So as a fix I suggest
>> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
>> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
>> >> >> vector_mode (since that already should indicate partial vectors aren't
>> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
>> >> >> larger than the main loop's VF.
>> >> >> 
>> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
>> >> >> someone verify this fixes the issue for PR103971 on powerpc?
>> >> >
>> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
>> >> > to use for the epilogue?  That appears to be a much simpler solution to
>> >> > me, including for targets where there are more than one element in the
>> >> > vector.
>> >> 
>> >> VOIDmode doesn't tell us anything about what the autodetected mode
>> >> will be, so current short-circuit:
>> >> 
>> >>       /* If the target does not support partial vectors we can shorten the
>> >> 	 number of modes to analyze for the epilogue as we know we can't pick a
>> >> 	 mode that has at least as many NUNITS as the main loop's vectorization
>> >> 	 factor, since that would imply the epilogue's vectorization factor
>> >> 	 would be at least as high as the main loop's and we would be
>> >> 	 vectorizing for more scalar iterations than there would be left.  */
>> >>       if (!supports_partial_vectors
>> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
>> >> 	{
>> >> 	  mode_i++;
>> >> 	  if (mode_i == vector_modes.length ())
>> >> 	    break;
>> >> 	  continue;
>> >> 	}
>> >> 
>> >> wouldn't be effective.
>> >
>> > Well, before this change we simply did
>> >
>> > -  /* Handle the case that the original loop can use partial
>> > -     vectorization, but want to only adopt it for the epilogue.
>> > -     The retry should be in the same mode as original.  */
>> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>> > ...
>> > -  else
>> > -    {
>> > -      mode_i = first_loop_next_i;
>> > -      if (mode_i == vector_modes.length ())
>> > -       return first_loop_vinfo;
>> > -    }
>> >
>> > and thus didn't bother with epilogue vectorization.  I think we should
>> > then just restore this behavior, not doing epilogue vectorization
>> > if vector_modes.length () == 1?
>> 
>> Yeah, but that case didn't need epilogue vectorisation before.  This
>> series is adding support for unrolling, and targets with a single vector
>> size will benefit from epilogues in that case.
>
> But in that case (which we could detect), we could then just use
> autodetected_vector_mode?  Like if we do before epilogue vect
>
>  vector_modes[0] = autodetected_vector_mode;
>  mode_i = 0;
>
> thus replace VOIDmode with what we detected and then start at 0?
> That is, the proposed patch looks very much like a hack to me.

You mean check whether the loop is unrolled?  If so, that's what feels
like a hack to me :-)  The question is whether there are enough elements
for epilogue vectorisation to make sense.  The VF is what tells us that.
Unrolling is just one of the things that influences that VF and I don't
think we should check for the individual influences.  It's just the end
result that matters.

> I suppose the VECTOR_MODE_P check should be added to
>
>       if (!supports_partial_vectors
>           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), 
> first_vinfo_vf))
>         {
>           mode_i++;
>
> instead.

You mean:

      if (!supports_partial_vectors
	  && VECTOR_MODE_P (vector_modes[mode_i])
	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
	{
	  mode_i++;

?  If so, the skip won't be effective the first time round.

Thanks,
Richard
  
Richard Biener Jan. 12, 2022, 11:28 a.m. UTC | #7
On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >> 
> >> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
> >> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
> >> >> >> than VOIDmode, but that does not need to be the case.
> >> >> >> If we are dealing with a target that does not define more modes for
> >> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> >> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
> >> >> >> target supported partial vectors would always fail. So as a fix I suggest
> >> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> >> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> >> >> >> vector_mode (since that already should indicate partial vectors aren't
> >> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> >> >> >> larger than the main loop's VF.
> >> >> >> 
> >> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> >> >> >> someone verify this fixes the issue for PR103971 on powerpc?
> >> >> >
> >> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
> >> >> > to use for the epilogue?  That appears to be a much simpler solution to
> >> >> > me, including for targets where there are more than one element in the
> >> >> > vector.
> >> >> 
> >> >> VOIDmode doesn't tell us anything about what the autodetected mode
> >> >> will be, so current short-circuit:
> >> >> 
> >> >>       /* If the target does not support partial vectors we can shorten the
> >> >> 	 number of modes to analyze for the epilogue as we know we can't pick a
> >> >> 	 mode that has at least as many NUNITS as the main loop's vectorization
> >> >> 	 factor, since that would imply the epilogue's vectorization factor
> >> >> 	 would be at least as high as the main loop's and we would be
> >> >> 	 vectorizing for more scalar iterations than there would be left.  */
> >> >>       if (!supports_partial_vectors
> >> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >> >> 	{
> >> >> 	  mode_i++;
> >> >> 	  if (mode_i == vector_modes.length ())
> >> >> 	    break;
> >> >> 	  continue;
> >> >> 	}
> >> >> 
> >> >> wouldn't be effective.
> >> >
> >> > Well, before this change we simply did
> >> >
> >> > -  /* Handle the case that the original loop can use partial
> >> > -     vectorization, but want to only adopt it for the epilogue.
> >> > -     The retry should be in the same mode as original.  */
> >> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >> > ...
> >> > -  else
> >> > -    {
> >> > -      mode_i = first_loop_next_i;
> >> > -      if (mode_i == vector_modes.length ())
> >> > -       return first_loop_vinfo;
> >> > -    }
> >> >
> >> > and thus didn't bother with epilogue vectorization.  I think we should
> >> > then just restore this behavior, not doing epilogue vectorization
> >> > if vector_modes.length () == 1?
> >> 
> >> Yeah, but that case didn't need epilogue vectorisation before.  This
> >> series is adding support for unrolling, and targets with a single vector
> >> size will benefit from epilogues in that case.
> >
> > But in that case (which we could detect), we could then just use
> > autodetected_vector_mode?  Like if we do before epilogue vect
> >
> >  vector_modes[0] = autodetected_vector_mode;
> >  mode_i = 0;
> >
> > thus replace VOIDmode with what we detected and then start at 0?
> > That is, the proposed patch looks very much like a hack to me.
> 
> You mean check whether the loop is unrolled?  If so, that's what feels
> like a hack to me :-)  The question is whether there are enough elements
> for epilogue vectorisation to make sense.  The VF is what tells us that.
> Unrolling is just one of the things that influences that VF and I don't
> think we should check for the individual influences.  It's just the end
> result that matters.
> 
> > I suppose the VECTOR_MODE_P check should be added to
> >
> >       if (!supports_partial_vectors
> >           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), 
> > first_vinfo_vf))
> >         {
> >           mode_i++;
> >
> > instead.
> 
> You mean:
> 
>       if (!supports_partial_vectors
> 	  && VECTOR_MODE_P (vector_modes[mode_i])
> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> 	{
> 	  mode_i++;
> 
> ?  If so, the skip won't be effective the first time round.

Why?  See above where I set vector_modes[0] to autodetected_vector_mode.

Richard.
  
Richard Sandiford Jan. 12, 2022, 11:44 a.m. UTC | #8
Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jan 2022, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
>> >
>> >> Richard Biener <rguenther@suse.de> writes:
>> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
>> >> >
>> >> >> Richard Biener <rguenther@suse.de> writes:
>> >> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
>> >> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
>> >> >> >> than VOIDmode, but that does not need to be the case.
>> >> >> >> If we are dealing with a target that does not define more modes for
>> >> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
>> >> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
>> >> >> >> target supported partial vectors would always fail. So as a fix I suggest
>> >> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
>> >> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
>> >> >> >> vector_mode (since that already should indicate partial vectors aren't
>> >> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
>> >> >> >> larger than the main loop's VF.
>> >> >> >> 
>> >> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
>> >> >> >> someone verify this fixes the issue for PR103971 on powerpc?
>> >> >> >
>> >> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
>> >> >> > to use for the epilogue?  That appears to be a much simpler solution to
>> >> >> > me, including for targets where there are more than one element in the
>> >> >> > vector.
>> >> >> 
>> >> >> VOIDmode doesn't tell us anything about what the autodetected mode
>> >> >> will be, so current short-circuit:
>> >> >> 
>> >> >>       /* If the target does not support partial vectors we can shorten the
>> >> >> 	 number of modes to analyze for the epilogue as we know we can't pick a
>> >> >> 	 mode that has at least as many NUNITS as the main loop's vectorization
>> >> >> 	 factor, since that would imply the epilogue's vectorization factor
>> >> >> 	 would be at least as high as the main loop's and we would be
>> >> >> 	 vectorizing for more scalar iterations than there would be left.  */
>> >> >>       if (!supports_partial_vectors
>> >> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
>> >> >> 	{
>> >> >> 	  mode_i++;
>> >> >> 	  if (mode_i == vector_modes.length ())
>> >> >> 	    break;
>> >> >> 	  continue;
>> >> >> 	}
>> >> >> 
>> >> >> wouldn't be effective.
>> >> >
>> >> > Well, before this change we simply did
>> >> >
>> >> > -  /* Handle the case that the original loop can use partial
>> >> > -     vectorization, but want to only adopt it for the epilogue.
>> >> > -     The retry should be in the same mode as original.  */
>> >> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>> >> > ...
>> >> > -  else
>> >> > -    {
>> >> > -      mode_i = first_loop_next_i;
>> >> > -      if (mode_i == vector_modes.length ())
>> >> > -       return first_loop_vinfo;
>> >> > -    }
>> >> >
>> >> > and thus didn't bother with epilogue vectorization.  I think we should
>> >> > then just restore this behavior, not doing epilogue vectorization
>> >> > if vector_modes.length () == 1?
>> >> 
>> >> Yeah, but that case didn't need epilogue vectorisation before.  This
>> >> series is adding support for unrolling, and targets with a single vector
>> >> size will benefit from epilogues in that case.
>> >
>> > But in that case (which we could detect), we could then just use
>> > autodetected_vector_mode?  Like if we do before epilogue vect
>> >
>> >  vector_modes[0] = autodetected_vector_mode;
>> >  mode_i = 0;
>> >
>> > thus replace VOIDmode with what we detected and then start at 0?
>> > That is, the proposed patch looks very much like a hack to me.
>> 
>> You mean check whether the loop is unrolled?  If so, that's what feels
>> like a hack to me :-)  The question is whether there are enough elements
>> for epilogue vectorisation to make sense.  The VF is what tells us that.
>> Unrolling is just one of the things that influences that VF and I don't
>> think we should check for the individual influences.  It's just the end
>> result that matters.
>> 
>> > I suppose the VECTOR_MODE_P check should be added to
>> >
>> >       if (!supports_partial_vectors
>> >           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), 
>> > first_vinfo_vf))
>> >         {
>> >           mode_i++;
>> >
>> > instead.
>> 
>> You mean:
>> 
>>       if (!supports_partial_vectors
>> 	  && VECTOR_MODE_P (vector_modes[mode_i])
>> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
>> 	{
>> 	  mode_i++;
>> 
>> ?  If so, the skip won't be effective the first time round.
>
> Why?  See above where I set vector_modes[0] to autodetected_vector_mode.

Ah, yeah, I guess that works, sorry.  It still feels odd to iterate
through N+1 modes when we don't need autodetection (and with the above,
don't use autodetection), but I can live with it. :-)

Another alternative would be to push autodetected_vector_mode when the
length is 1 and keep 1 as the starting point.

Richard
  
Andre Vieira (lists) Jan. 12, 2022, 11:56 a.m. UTC | #9
On 12/01/2022 11:44, Richard Sandiford wrote:
> Another alternative would be to push autodetected_vector_mode when the
> length is 1 and keep 1 as the starting point.
>
> Richard

I'm guessing we would still want to skip epilogue vectorization if 
!VECTOR_MODE_P (autodetected_vector_mode) in that case?
  
Richard Biener Jan. 12, 2022, 11:57 a.m. UTC | #10
On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >> >> >
> >> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >> 
> >> >> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
> >> >> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
> >> >> >> >> than VOIDmode, but that does not need to be the case.
> >> >> >> >> If we are dealing with a target that does not define more modes for
> >> >> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> >> >> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
> >> >> >> >> target supported partial vectors would always fail. So as a fix I suggest
> >> >> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> >> >> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> >> >> >> >> vector_mode (since that already should indicate partial vectors aren't
> >> >> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> >> >> >> >> larger than the main loop's VF.
> >> >> >> >> 
> >> >> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> >> >> >> >> someone verify this fixes the issue for PR103971 on powerpc?
> >> >> >> >
> >> >> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
> >> >> >> > to use for the epilogue?  That appears to be a much simpler solution to
> >> >> >> > me, including for targets where there are more than one element in the
> >> >> >> > vector.
> >> >> >> 
> >> >> >> VOIDmode doesn't tell us anything about what the autodetected mode
> >> >> >> will be, so current short-circuit:
> >> >> >> 
> >> >> >>       /* If the target does not support partial vectors we can shorten the
> >> >> >> 	 number of modes to analyze for the epilogue as we know we can't pick a
> >> >> >> 	 mode that has at least as many NUNITS as the main loop's vectorization
> >> >> >> 	 factor, since that would imply the epilogue's vectorization factor
> >> >> >> 	 would be at least as high as the main loop's and we would be
> >> >> >> 	 vectorizing for more scalar iterations than there would be left.  */
> >> >> >>       if (!supports_partial_vectors
> >> >> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >> >> >> 	{
> >> >> >> 	  mode_i++;
> >> >> >> 	  if (mode_i == vector_modes.length ())
> >> >> >> 	    break;
> >> >> >> 	  continue;
> >> >> >> 	}
> >> >> >> 
> >> >> >> wouldn't be effective.
> >> >> >
> >> >> > Well, before this change we simply did
> >> >> >
> >> >> > -  /* Handle the case that the original loop can use partial
> >> >> > -     vectorization, but want to only adopt it for the epilogue.
> >> >> > -     The retry should be in the same mode as original.  */
> >> >> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >> >> > ...
> >> >> > -  else
> >> >> > -    {
> >> >> > -      mode_i = first_loop_next_i;
> >> >> > -      if (mode_i == vector_modes.length ())
> >> >> > -       return first_loop_vinfo;
> >> >> > -    }
> >> >> >
> >> >> > and thus didn't bother with epilogue vectorization.  I think we should
> >> >> > then just restore this behavior, not doing epilogue vectorization
> >> >> > if vector_modes.length () == 1?
> >> >> 
> >> >> Yeah, but that case didn't need epilogue vectorisation before.  This
> >> >> series is adding support for unrolling, and targets with a single vector
> >> >> size will benefit from epilogues in that case.
> >> >
> >> > But in that case (which we could detect), we could then just use
> >> > autodetected_vector_mode?  Like if we do before epilogue vect
> >> >
> >> >  vector_modes[0] = autodetected_vector_mode;
> >> >  mode_i = 0;
> >> >
> >> > thus replace VOIDmode with what we detected and then start at 0?
> >> > That is, the proposed patch looks very much like a hack to me.
> >> 
> >> You mean check whether the loop is unrolled?  If so, that's what feels
> >> like a hack to me :-)  The question is whether there are enough elements
> >> for epilogue vectorisation to make sense.  The VF is what tells us that.
> >> Unrolling is just one of the things that influences that VF and I don't
> >> think we should check for the individual influences.  It's just the end
> >> result that matters.
> >> 
> >> > I suppose the VECTOR_MODE_P check should be added to
> >> >
> >> >       if (!supports_partial_vectors
> >> >           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), 
> >> > first_vinfo_vf))
> >> >         {
> >> >           mode_i++;
> >> >
> >> > instead.
> >> 
> >> You mean:
> >> 
> >>       if (!supports_partial_vectors
> >> 	  && VECTOR_MODE_P (vector_modes[mode_i])
> >> 	  && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >> 	{
> >> 	  mode_i++;
> >> 
> >> ?  If so, the skip won't be effective the first time round.
> >
> > Why?  See above where I set vector_modes[0] to autodetected_vector_mode.
> 
> Ah, yeah, I guess that works, sorry.  It still feels odd to iterate
> through N+1 modes when we don't need autodetection (and with the above,
> don't use autodetection), but I can live with it. :-)

Of course we do autodetection anyway when doing main loop vectorization,
and when we want to iterate over all modes for epilogues it feels
natural to try that in the same order.

> Another alternative would be to push autodetected_vector_mode when the
> length is 1 and keep 1 as the starting point.

I understood that this is what Andres patch tried to do - just inventing
some "other" autodetected mode.  Note my idea eventually allows us to
get rid of autodetected_vector_mode as separate variable and just
have it be a reference to vector_modes[0] (or use that explicitely).

Richard.
  
Richard Biener Jan. 12, 2022, 11:59 a.m. UTC | #11
On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:

> 
> On 12/01/2022 11:44, Richard Sandiford wrote:
> > Another alternative would be to push autodetected_vector_mode when the
> > length is 1 and keep 1 as the starting point.
> >
> > Richard
> 
> I'm guessing we would still want to skip epilogue vectorization if
> !VECTOR_MODE_P (autodetected_vector_mode) in that case?

Practically we currently only support fixed width word_mode there,
but eventually one could end up with 64bit DImode for the main loop
and 32bit V4QImode in the epilogue ... so not sure if it's worth
special-casing.  But I don't mind adding that skip.

Richard.
  
Andre Vieira (lists) Jan. 12, 2022, 12:41 p.m. UTC | #12
On 12/01/2022 11:59, Richard Biener wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>> Another alternative would be to push autodetected_vector_mode when the
>>> length is 1 and keep 1 as the starting point.
>>>
>>> Richard
>> I'm guessing we would still want to skip epilogue vectorization if
>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
> Practically we currently only support fixed width word_mode there,
> but eventually one could end up with 64bit DImode for the main loop
> and 32bit V4QImode in the epilogue ... so not sure if it's worth
> special-casing.  But I don't mind adding that skip.
>
> Richard.

I left out the skip, it shouldn't break anything as it would try that 
same mode before anyway.
Just to clarify what I meant though was to skip if 
autodetected_vector_mode wasn't a vector AND the target didn't define 
autovectorize_vector_modes, so in that scenario it wouldn't ever try  
V4QImode for the epilogue if the mainloop was autodetected DImode, I 
think...
Either way, this is less code, less complicated and doesn't analyze more 
than it did before the original patch, so I'm happy with that too.

Is this what you had in mind?

Andre
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6ed2b5f8724e5ebf27592f67d7f6bdfe1ebcf512..03459363afa48f0e2753bc7aa18cbf2771d2a4e5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3023,7 +3023,16 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
      array may contain length-agnostic and length-specific modes.  Their
      ordering is not guaranteed, so we could end up picking a mode for the main
      loop that is after the epilogue's optimal mode.  */
-  mode_i = 1;
+  if (vector_modes.length () == 1)
+    {
+      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see if
+	 an epilogue can be created with that mode.  */
+      vector_modes[0] = autodetected_vector_mode;
+      mode_i = 0;
+    }
+  else
+    mode_i = 1;
+
   bool supports_partial_vectors = partial_vectors_supported_p ();
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
  
Richard Biener Jan. 12, 2022, 12:57 p.m. UTC | #13
On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:

> 
> On 12/01/2022 11:59, Richard Biener wrote:
> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> On 12/01/2022 11:44, Richard Sandiford wrote:
> >>> Another alternative would be to push autodetected_vector_mode when the
> >>> length is 1 and keep 1 as the starting point.
> >>>
> >>> Richard
> >> I'm guessing we would still want to skip epilogue vectorization if
> >> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
> > Practically we currently only support fixed width word_mode there,
> > but eventually one could end up with 64bit DImode for the main loop
> > and 32bit V4QImode in the epilogue ... so not sure if it's worth
> > special-casing.  But I don't mind adding that skip.
> >
> > Richard.
> 
> I left out the skip, it shouldn't break anything as it would try that same
> mode before anyway.
> Just to clarify what I meant though was to skip if autodetected_vector_mode
> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
> that scenario it wouldn't ever try  V4QImode for the epilogue if the mainloop
> was autodetected DImode, I think...
> Either way, this is less code, less complicated and doesn't analyze more than
> it did before the original patch, so I'm happy with that too.
> 
> Is this what you had in mind?

-  mode_i = 1;
+  if (vector_modes.length () == 1)
+    {
+      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see
if
+        an epilogue can be created with that mode.  */
+      vector_modes[0] = autodetected_vector_mode;
+      mode_i = 0;
+    }
+  else
+    mode_i = 1;
+

I would have left out the condition and unconditionally do

  vector_modes[0] = autodetected_vector_mode;
  mode_i = 0;

but OK if you think it makes sense to special case length == 1.

Richard.
  
Li, Pan2 via Gcc-patches Jan. 12, 2022, 1:48 p.m. UTC | #14
I think we need a fix or a revert for this today, please.  Bootstrap has been broken
for a couple of days during the last week of stage 3, which is really problematic.

Thanks,
Bill

On 1/12/22 6:57 AM, Richard Biener via Gcc-patches wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:59, Richard Biener wrote:
>>> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>>>
>>>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>>>> Another alternative would be to push autodetected_vector_mode when the
>>>>> length is 1 and keep 1 as the starting point.
>>>>>
>>>>> Richard
>>>> I'm guessing we would still want to skip epilogue vectorization if
>>>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
>>> Practically we currently only support fixed width word_mode there,
>>> but eventually one could end up with 64bit DImode for the main loop
>>> and 32bit V4QImode in the epilogue ... so not sure if it's worth
>>> special-casing.  But I don't mind adding that skip.
>>>
>>> Richard.
>> I left out the skip, it shouldn't break anything as it would try that same
>> mode before anyway.
>> Just to clarify what I meant though was to skip if autodetected_vector_mode
>> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
>> that scenario it wouldn't ever try  V4QImode for the epilogue if the mainloop
>> was autodetected DImode, I think...
>> Either way, this is less code, less complicated and doesn't analyze more than
>> it did before the original patch, so I'm happy with that too.
>>
>> Is this what you had in mind?
> -  mode_i = 1;
> +  if (vector_modes.length () == 1)
> +    {
> +      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see
> if
> +        an epilogue can be created with that mode.  */
> +      vector_modes[0] = autodetected_vector_mode;
> +      mode_i = 0;
> +    }
> +  else
> +    mode_i = 1;
> +
>
> I would have left out the condition and unconditionally do
>
>   vector_modes[0] = autodetected_vector_mode;
>   mode_i = 0;
>
> but OK if you think it makes sense to special case length == 1.
>
> Richard.
  
Andre Vieira (lists) Jan. 12, 2022, 2:42 p.m. UTC | #15
On 12/01/2022 12:57, Richard Biener wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:59, Richard Biener wrote:
>>> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>>>
>>>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>>>> Another alternative would be to push autodetected_vector_mode when the
>>>>> length is 1 and keep 1 as the starting point.
>>>>>
>>>>> Richard
>>>> I'm guessing we would still want to skip epilogue vectorization if
>>>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
>>> Practically we currently only support fixed width word_mode there,
>>> but eventually one could end up with 64bit DImode for the main loop
>>> and 32bit V4QImode in the epilogue ... so not sure if it's worth
>>> special-casing.  But I don't mind adding that skip.
>>>
>>> Richard.
>> I left out the skip, it shouldn't break anything as it would try that same
>> mode before anyway.
>> Just to clarify what I meant though was to skip if autodetected_vector_mode
>> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
>> that scenario it wouldn't ever try  V4QImode for the epilogue if the mainloop
>> was autodetected DImode, I think...
>> Either way, this is less code, less complicated and doesn't analyze more than
>> it did before the original patch, so I'm happy with that too.
>>
>> Is this what you had in mind?
> -  mode_i = 1;
> +  if (vector_modes.length () == 1)
> +    {
> +      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see
> if
> +        an epilogue can be created with that mode.  */
> +      vector_modes[0] = autodetected_vector_mode;
> +      mode_i = 0;
> +    }
> +  else
> +    mode_i = 1;
> +
>
> I would have left out the condition and unconditionally do
>
>    vector_modes[0] = autodetected_vector_mode;
>    mode_i = 0;
>
> but OK if you think it makes sense to special case length == 1.
>
> Richard.

Tested without the special casing, all good, only have performance 
regressions left (which I'm working on), will commit this to fix the ICEs.

Thanks,
Andre
  

Patch

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6ed2b5f8724e5ebf27592f67d7f6bdfe1ebcf512..c81ebc411312e649f9cd954895244c60c928fee1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3024,6 +3024,18 @@  vect_analyze_loop (class loop *loop, vec_info_shared *shared)
      ordering is not guaranteed, so we could end up picking a mode for the main
      loop that is after the epilogue's optimal mode.  */
   mode_i = 1;
+  /* If we only had VOIDmode then push the AUTODETECTED_VECTOR_MODE to see if
+     an epilogue can be created with that mode.  */
+  if (vector_modes.length () == 1)
+    {
+      machine_mode preferred_mode
+	= targetm.vectorize.preferred_simd_mode (QImode);
+      /* If the preferred mode isn't a vector mode we will not be needing an
+	  epilogue.  */
+      if (!VECTOR_MODE_P (preferred_mode))
+	return first_loop_vinfo;
+      vector_modes.safe_push (preferred_mode);
+    }
   bool supports_partial_vectors = partial_vectors_supported_p ();
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);