Message ID | a18f3ba2-ed80-28d5-5bd9-5fd19cbaa227@arm.com |
---|---|

State | New |

Series | [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only |

On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:

> Hi,
>
> This is a fix for the regression caused by '[vect] Re-analyze all modes for
> epilogues'. The earlier patch assumed there was always at least one other mode
> than VOIDmode, but that does not need to be the case.
> If we are dealing with a target that does not define more modes for
> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> to create an epilogue for the same autodetected_vector_mode, which unless the
> target supported partial vectors would always fail. So as a fix I suggest
> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> vector_mode (since that already should indicate partial vectors aren't
> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> larger than the main loop's VF.
>
> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> someone verify this fixes the issue for PR103971 on powerpc?

Why not simply start at mode_i = 0 which means autodetecting the mode
to use for the epilogue?  That appears to be a much simpler solution to
me, including for targets where there is more than one element in the
vector.

Richard.

Richard Biener <rguenther@suse.de> writes:
> Why not simply start at mode_i = 0 which means autodetecting the mode
> to use for the epilogue?  That appears to be a much simpler solution to
> me, including for targets where there is more than one element in the
> vector.

VOIDmode doesn't tell us anything about what the autodetected mode
will be, so the current short-circuit:

      /* If the target does not support partial vectors we can shorten the
         number of modes to analyze for the epilogue as we know we can't pick a
         mode that has at least as many NUNITS as the main loop's vectorization
         factor, since that would imply the epilogue's vectorization factor
         would be at least as high as the main loop's and we would be
         vectorizing for more scalar iterations than there would be left.  */
      if (!supports_partial_vectors
          && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
        {
          mode_i++;
          if (mode_i == vector_modes.length ())
            break;
          continue;
        }

wouldn't be effective.

Thanks,
Richard
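
The point about VOIDmode can be illustrated with a small stand-alone model. This is only a sketch, not GCC code: `Mode` and `skip_for_epilogue` are hypothetical names, unit counts are plain integers rather than poly_ints (so `maybe_ge` becomes `>=`), and the placeholder entry is assumed to report zero units.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a machine mode; this is not GCC's machine_mode.
struct Mode {
  std::string name;
  bool is_vector;   // models VECTOR_MODE_P
  unsigned nunits;  // models GET_MODE_NUNITS, simplified to a constant
};

// Models the quoted short-circuit: without partial vectors, a mode with at
// least as many units as the main loop's VF is skipped for the epilogue.
bool skip_for_epilogue(bool supports_partial_vectors, const Mode &m,
                       unsigned first_vinfo_vf) {
  assert(first_vinfo_vf > 0);  // a vectorized loop has a VF of at least 1
  return !supports_partial_vectors && m.nunits >= first_vinfo_vf;
}
```

In this model a placeholder such as `Mode{"VOID", false, 0}` is never skipped, whatever the VF, even when the mode that autodetection later picks (say a 4-unit vector against a VF of 4) would have been skipped had it been listed explicitly.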

On Wed, 12 Jan 2022, Richard Sandiford wrote:

> VOIDmode doesn't tell us anything about what the autodetected mode
> will be, so current short-circuit:
> [...]
> wouldn't be effective.

Well, before this change we simply did

-  /* Handle the case that the original loop can use partial
-     vectorization, but want to only adopt it for the epilogue.
-     The retry should be in the same mode as original.  */
-  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
...
-  else
-    {
-      mode_i = first_loop_next_i;
-      if (mode_i == vector_modes.length ())
-       return first_loop_vinfo;
-    }

and thus didn't bother with epilogue vectorization.  I think we should
then just restore this behavior, not doing epilogue vectorization
if vector_modes.length () == 1?

Richard.

Richard Biener <rguenther@suse.de> writes:
> Well, before this change we simply did
> [...]
> and thus didn't bother with epilogue vectorization.  I think we should
> then just restore this behavior, not doing epilogue vectorization
> if vector_modes.length () == 1?

Yeah, but that case didn't need epilogue vectorisation before.  This
series is adding support for unrolling, and targets with a single vector
size will benefit from epilogues in that case.

Thanks,
Richard

On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Yeah, but that case didn't need epilogue vectorisation before.  This
> series is adding support for unrolling, and targets with a single vector
> size will benefit from epilogues in that case.

But in that case (which we could detect), we could then just use
autodetected_vector_mode?  Like if we do before epilogue vect

  vector_modes[0] = autodetected_vector_mode;
  mode_i = 0;

thus replace VOIDmode with what we detected and then start at 0?
That is, the proposed patch looks very much like a hack to me.

I suppose the VECTOR_MODE_P check should be added to

      if (!supports_partial_vectors
          && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]),
                       first_vinfo_vf))
        {
          mode_i++;

instead.
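
A rough model of this suggestion, for illustration only: the sketch below overwrites entry 0 with the autodetected mode, starts the walk at index 0, and applies the skip test with the extra VECTOR_MODE_P condition. The `Mode` struct and `epilogue_candidates` are hypothetical names, unit counts are plain integers, and the real loop in the vectorizer does considerably more per mode than collect a name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine mode; this is not GCC's machine_mode.
struct Mode {
  std::string name;
  bool is_vector;   // models VECTOR_MODE_P
  unsigned nunits;  // models GET_MODE_NUNITS, simplified to a constant
};

// Returns the names of the modes that would still be analysed for the epilogue.
std::vector<std::string> epilogue_candidates(std::vector<Mode> vector_modes,
                                             const Mode &autodetected,
                                             bool supports_partial_vectors,
                                             unsigned first_vinfo_vf) {
  assert(!vector_modes.empty());  // entry 0 is the autodetect placeholder
  vector_modes[0] = autodetected; // replace the placeholder, then start at 0
  std::vector<std::string> result;
  for (std::size_t mode_i = 0; mode_i < vector_modes.size(); ++mode_i) {
    const Mode &m = vector_modes[mode_i];
    // Skip vector modes that are at least as wide as the main loop's VF.
    if (!supports_partial_vectors && m.is_vector && m.nunits >= first_vinfo_vf)
      continue;
    result.push_back(m.name);
  }
  return result;
}
```

With a single-entry list and a 4-unit autodetected mode, this model yields one candidate when the main loop's VF is 8 (for example after unrolling) and none when the VF is 4.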

Richard Biener <rguenther@suse.de> writes:
> But in that case (which we could detect), we could then just use
> autodetected_vector_mode?  Like if we do before epilogue vect
>
>   vector_modes[0] = autodetected_vector_mode;
>   mode_i = 0;
>
> thus replace VOIDmode with what we detected and then start at 0?
> That is, the proposed patch looks very much like a hack to me.

You mean check whether the loop is unrolled?  If so, that's what feels
like a hack to me :-)  The question is whether there are enough elements
for epilogue vectorisation to make sense.  The VF is what tells us that.
Unrolling is just one of the things that influences that VF and I don't
think we should check for the individual influences.  It's just the end
result that matters.

> I suppose the VECTOR_MODE_P check should be added to
>
>       if (!supports_partial_vectors
>           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]),
>                        first_vinfo_vf))
>         {
>           mode_i++;
>
> instead.

You mean:

      if (!supports_partial_vectors
          && VECTOR_MODE_P (vector_modes[mode_i])
          && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
        {
          mode_i++;

?  If so, the skip won't be effective the first time round.

Thanks,
Richard
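
The argument that the VF alone decides whether an epilogue is worthwhile can be checked with simple arithmetic. The helpers below are a hypothetical sketch, not vectorizer code: they assume a known scalar iteration count and constant vectorization factors.

```cpp
#include <cassert>

// Scalar iterations left over once the main vector loop has finished.
unsigned leftover_iterations(unsigned niters, unsigned main_vf) {
  assert(main_vf > 0);
  return niters % main_vf;
}

// Number of leftover iterations that a vector epilogue with the given VF
// could handle; the rest would still run in a scalar loop.
unsigned covered_by_epilogue(unsigned niters, unsigned main_vf,
                             unsigned epilogue_vf) {
  assert(epilogue_vf > 0);
  unsigned left = leftover_iterations(niters, main_vf);
  return (left / epilogue_vf) * epilogue_vf;
}
```

For 23 iterations, a main loop with VF 8 (a 4-unit mode unrolled twice, say) leaves 7 iterations, of which a VF-4 epilogue can cover 4. With a main VF of 4 only 3 iterations remain and the same epilogue covers none. Whenever the epilogue's VF is at least the main loop's VF the result is 0, whatever produced the main VF.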

On Wed, 12 Jan 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > But in that case (which we could detect), we could then just use
> > autodetected_vector_mode?  Like if we do before epilogue vect
> >
> >   vector_modes[0] = autodetected_vector_mode;
> >   mode_i = 0;
> >
> > thus replace VOIDmode with what we detected and then start at 0?
>
> [...]
>
> You mean:
>
>       if (!supports_partial_vectors
>           && VECTOR_MODE_P (vector_modes[mode_i])
>           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
>         {
>           mode_i++;
>
> ?  If so, the skip won't be effective the first time round.

Why?  See above where I set vector_modes[0] to autodetected_vector_mode.

Richard.

Richard Biener <rguenther@suse.de> writes:
> Why?  See above where I set vector_modes[0] to autodetected_vector_mode.

Ah, yeah, I guess that works, sorry.  It still feels odd to iterate
through N+1 modes when we don't need autodetection (and with the above,
don't use autodetection), but I can live with it. :-)

Another alternative would be to push autodetected_vector_mode when the
length is 1 and keep 1 as the starting point.

Richard

On 12/01/2022 11:44, Richard Sandiford wrote:
> Another alternative would be to push autodetected_vector_mode when the
> length is 1 and keep 1 as the starting point.
>
> Richard

I'm guessing we would still want to skip epilogue vectorization if
!VECTOR_MODE_P (autodetected_vector_mode) in that case?
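
The alternative, together with the guard asked about here, could look roughly like the following. It is a sketch under assumptions: `Mode` and `epilogue_start_index` are hypothetical names, and "no epilogue" is signalled the same way as in the removed code quoted earlier in the thread, by a start index equal to the length of the mode list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine mode; this is not GCC's machine_mode.
struct Mode {
  std::string name;
  bool is_vector;   // models VECTOR_MODE_P
  unsigned nunits;  // models GET_MODE_NUNITS, simplified to a constant
};

// Returns the index at which epilogue analysis should start. If the result
// equals vector_modes.size(), there is nothing to try and the caller would
// keep only the main loop.
std::size_t epilogue_start_index(std::vector<Mode> &vector_modes,
                                 const Mode &autodetected) {
  assert(!vector_modes.empty());  // entry 0 is the autodetect placeholder
  // Only the placeholder is present: append the autodetected mode, but only
  // when it really is a vector mode.
  if (vector_modes.size() == 1 && autodetected.is_vector)
    vector_modes.push_back(autodetected);
  return 1;  // keep 1 as the starting point
}
```

When the autodetected mode is not a vector mode, nothing is pushed, the returned index equals the list length, and epilogue vectorization is skipped.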

On Wed, 12 Jan 2022, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> > On Wed, 12 Jan 2022, Richard Sandiford wrote:
> >> >> >
> >> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> >> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >>
> >> >> >> >> This a fix for the regression caused by '[vect] Re-analyze all modes for
> >> >> >> >> epilogues'. The earlier patch assumed there was always at least one other mode
> >> >> >> >> than VOIDmode, but that does not need to be the case.
> >> >> >> >> If we are dealing with a target that does not define more modes for
> >> >> >> >> 'autovectorize_vector_modes', the behaviour before the patch would be to try
> >> >> >> >> to create an epilogue for the same autodetected_vector_mode, which unless the
> >> >> >> >> target supported partial vectors would always fail. So as a fix I suggest
> >> >> >> >> trying to vectorize the epilogue with the preferred_simd_mode for QI,
> >> >> >> >> mimicking autovectorize_vector_mode, which will be skipped if it is not a
> >> >> >> >> vector_mode (since that already should indicate partial vectors aren't
> >> >> >> >> possible) or if no partial vectors are supported and its pessimistic NUNITS is
> >> >> >> >> larger than the main loop's VF.
> >> >> >> >>
> >> >> >> >> Currently bootstrapping and regression testing, otherwise OK for trunk? Can
> >> >> >> >> someone verify this fixes the issue for PR103971 on powerpc?
> >> >> >> >
> >> >> >> > Why not simply start at mode_i = 0 which means autodetecting the mode
> >> >> >> > to use for the epilogue? That appears to be a much simpler solution to
> >> >> >> > me, including for targets where there are more than one element in the
> >> >> >> > vector.
> >> >> >>
> >> >> >> VOIDmode doesn't tell us anything about what the autodetected mode
> >> >> >> will be, so current short-circuit:
> >> >> >>
> >> >> >>       /* If the target does not support partial vectors we can shorten the
> >> >> >>          number of modes to analyze for the epilogue as we know we can't pick a
> >> >> >>          mode that has at least as many NUNITS as the main loop's vectorization
> >> >> >>          factor, since that would imply the epilogue's vectorization factor
> >> >> >>          would be at least as high as the main loop's and we would be
> >> >> >>          vectorizing for more scalar iterations than there would be left. */
> >> >> >>       if (!supports_partial_vectors
> >> >> >>           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >> >> >>         {
> >> >> >>           mode_i++;
> >> >> >>           if (mode_i == vector_modes.length ())
> >> >> >>             break;
> >> >> >>           continue;
> >> >> >>         }
> >> >> >>
> >> >> >> wouldn't be effective.
> >> >> >
> >> >> > Well, before this change we simply did
> >> >> >
> >> >> > -  /* Handle the case that the original loop can use partial
> >> >> > -     vectorization, but want to only adopt it for the epilogue.
> >> >> > -     The retry should be in the same mode as original. */
> >> >> > -  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >> >> > ...
> >> >> > -  else
> >> >> > -    {
> >> >> > -      mode_i = first_loop_next_i;
> >> >> > -      if (mode_i == vector_modes.length ())
> >> >> > -        return first_loop_vinfo;
> >> >> > -    }
> >> >> >
> >> >> > and thus didn't bother with epilogue vectorization. I think we should
> >> >> > then just restore this behavior, not doing epilogue vectorization
> >> >> > if vector_modes.length () == 1?
> >> >>
> >> >> Yeah, but that case didn't need epilogue vectorisation before. This
> >> >> series is adding support for unrolling, and targets with a single vector
> >> >> size will benefit from epilogues in that case.
> >> >
> >> > But in that case (which we could detect), we could then just use
> >> > autodetected_vector_mode? Like if we do before epilogue vect
> >> >
> >> >   vector_modes[0] = autodetected_vector_mode;
> >> >   mode_i = 0;
> >> >
> >> > thus replace VOIDmode with what we detected and then start at 0?
> >> > That is, the proposed patch looks very much like a hack to me.
> >>
> >> You mean check whether the loop is unrolled? If so, that's what feels
> >> like a hack to me :-) The question is whether there are enough elements
> >> for epilogue vectorisation to make sense. The VF is what tells us that.
> >> Unrolling is just one of the things that influences that VF and I don't
> >> think we should check for the individual influences. It's just the end
> >> result that matters.
> >>
> >> > I suppose the VECTOR_MODE_P check should be added to
> >> >
> >> >       if (!supports_partial_vectors
> >> >           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]),
> >> >                        first_vinfo_vf))
> >> >         {
> >> >           mode_i++;
> >> >
> >> > instead.
> >>
> >> You mean:
> >>
> >>       if (!supports_partial_vectors
> >>           && VECTOR_MODE_P (vector_modes[mode_i])
> >>           && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
> >>         {
> >>           mode_i++;
> >>
> >> ? If so, the skip won't be effective the first time round.
> >
> > Why? See above where I set vector_modes[0] to autodetected_vector_mode.
>
> Ah, yeah, I guess that works, sorry. It still feels odd to iterate
> through N+1 modes when we don't need autodetection (and with the above,
> don't use autodetection), but I can live with it. :-)

Of course we do autodetection anyway when doing main loop vectorization,
and when we want to iterate over all modes for epilogues it feels
natural to try that in the same order.

> Another alternative would be to push autodetected_vector_mode when the
> length is 1 and keep 1 as the starting point.

I understood that this is what Andre's patch tried to do - just inventing
some "other" autodetected mode.
Note my idea eventually allows us to get rid of autodetected_vector_mode
as a separate variable and just have it be a reference to vector_modes[0]
(or use that explicitly).

Richard.
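
The disagreement above is easier to follow with a toy model of the epilogue mode walk. The sketch below is not GCC code: `ToyMode`, `epilogue_candidates` and the mode name used in the usage note are hypothetical stand-ins, and a unit count of 0 is simply this sketch's way of modelling a VOIDmode placeholder whose size is not yet known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GCC machine_mode.  nunits == 0 plays the
// role of VOIDmode here, i.e. "autodetect later, size unknown".
struct ToyMode
{
  const char *name;
  unsigned nunits;
};

// Walk MODES from START and return the indices that would still be
// analysed for the epilogue.  The skip mirrors the quoted check: without
// partial vectors, a mode with at least MAIN_VF units cannot be used.
std::vector<std::size_t>
epilogue_candidates (const std::vector<ToyMode> &modes, std::size_t start,
                     unsigned main_vf, bool partial_vectors)
{
  assert (start <= modes.size ());
  std::vector<std::size_t> picked;
  for (std::size_t i = start; i < modes.size (); ++i)
    {
      if (!partial_vectors && modes[i].nunits >= main_vf)
        continue;
      picked.push_back (i);
    }
  return picked;
}
```

In this model, a list holding only the placeholder is never skipped, which is the "wouldn't be effective" objection. Once entry 0 is overwritten with a concrete mode (say a 4-unit `V4SI`, an illustrative name), the walk skips it when the main loop's VF is 4 and keeps it when the VF is 8, which matches the point that the VF, not unrolling as such, decides whether an epilogue makes sense.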

On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
> On 12/01/2022 11:44, Richard Sandiford wrote:
> > Another alternative would be to push autodetected_vector_mode when the
> > length is 1 and keep 1 as the starting point.
> >
> > Richard
>
> I'm guessing we would still want to skip epilogue vectorization if
> !VECTOR_MODE_P (autodetected_vector_mode) in that case?

Practically we currently only support fixed width word_mode there,
but eventually one could end up with 64bit DImode for the main loop
and 32bit V4QImode in the epilogue ... so not sure if it's worth
special-casing. But I don't mind adding that skip.

Richard.

On 12/01/2022 11:59, Richard Biener wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>> Another alternative would be to push autodetected_vector_mode when the
>>> length is 1 and keep 1 as the starting point.
>>>
>>> Richard
>> I'm guessing we would still want to skip epilogue vectorization if
>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
> Practically we currently only support fixed width word_mode there,
> but eventually one could end up with 64bit DImode for the main loop
> and 32bit V4QImode in the epilogue ... so not sure if it's worth
> special-casing. But I don't mind adding that skip.
>
> Richard.

I left out the skip, it shouldn't break anything as it would try that same
mode before anyway.
Just to clarify what I meant though was to skip if autodetected_vector_mode
wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
that scenario it wouldn't ever try V4QImode for the epilogue if the mainloop
was autodetected DImode, I think...
Either way, this is less code, less complicated and doesn't analyze more than
it did before the original patch, so I'm happy with that too.

Is this what you had in mind?

Andre

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6ed2b5f8724e5ebf27592f67d7f6bdfe1ebcf512..03459363afa48f0e2753bc7aa18cbf2771d2a4e5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3023,7 +3023,16 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
      array may contain length-agnostic and length-specific modes. Their
      ordering is not guaranteed, so we could end up picking a mode for the main
      loop that is after the epilogue's optimal mode. */
-  mode_i = 1;
+  if (vector_modes.length () == 1)
+    {
+      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see if
+         an epilogue can be created with that mode. */
+      vector_modes[0] = autodetected_vector_mode;
+      mode_i = 0;
+    }
+  else
+    mode_i = 1;
+
   bool supports_partial_vectors = partial_vectors_supported_p ();
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);

On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
> On 12/01/2022 11:59, Richard Biener wrote:
> > On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> On 12/01/2022 11:44, Richard Sandiford wrote:
> >>> Another alternative would be to push autodetected_vector_mode when the
> >>> length is 1 and keep 1 as the starting point.
> >>>
> >>> Richard
> >> I'm guessing we would still want to skip epilogue vectorization if
> >> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
> > Practically we currently only support fixed width word_mode there,
> > but eventually one could end up with 64bit DImode for the main loop
> > and 32bit V4QImode in the epilogue ... so not sure if it's worth
> > special-casing. But I don't mind adding that skip.
> >
> > Richard.
>
> I left out the skip, it shouldn't break anything as it would try that same
> mode before anyway.
> Just to clarify what I meant though was to skip if autodetected_vector_mode
> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
> that scenario it wouldn't ever try V4QImode for the epilogue if the mainloop
> was autodetected DImode, I think...
> Either way, this is less code, less complicated and doesn't analyze more than
> it did before the original patch, so I'm happy with that too.
>
> Is this what you had in mind?

-  mode_i = 1;
+  if (vector_modes.length () == 1)
+    {
+      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see if
+         an epilogue can be created with that mode. */
+      vector_modes[0] = autodetected_vector_mode;
+      mode_i = 0;
+    }
+  else
+    mode_i = 1;
+

I would have left out the condition and unconditionally do

  vector_modes[0] = autodetected_vector_mode;
  mode_i = 0;

but OK if you think it makes sense to special case length == 1.

Richard.
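
The two ways of choosing the starting index discussed in this reply differ only when the target lists more than one mode. A minimal sketch of that difference follows, using plain strings in place of machine modes; `EpilogueStart`, `start_special_cased`, `start_unconditional` and the mode names in the usage note are hypothetical and only model the two snippets quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the (possibly updated) mode list plus the
// index at which the epilogue walk starts.
struct EpilogueStart
{
  std::vector<std::string> modes;
  std::size_t mode_i;
};

// Variant from the posted diff: only rewrite entry 0 when the list holds
// nothing but the autodetect placeholder, otherwise start at index 1.
EpilogueStart
start_special_cased (std::vector<std::string> modes,
                     const std::string &autodetected)
{
  assert (!modes.empty ());
  if (modes.size () == 1)
    {
      modes[0] = autodetected;
      return { modes, 0 };
    }
  return { modes, 1 };
}

// Variant suggested in the reply: always rewrite entry 0 and start there.
EpilogueStart
start_unconditional (std::vector<std::string> modes,
                     const std::string &autodetected)
{
  assert (!modes.empty ());
  modes[0] = autodetected;
  return { modes, 0 };
}
```

For a single-entry list both helpers return the same list and index 0. For a list such as `{ "VOID", "V8QI" }` with an autodetected `"V16QI"` (illustrative names), the special-cased helper leaves the placeholder in place and starts at index 1, while the unconditional one revisits the autodetected mode at index 0 and relies on the NUNITS skip to drop it when it cannot help.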

I think we need a fix or a revert for this today, please. Bootstrap has been
broken for a couple of days during the last week of stage 3, which is really
problematic.

Thanks,
Bill

On 1/12/22 6:57 AM, Richard Biener via Gcc-patches wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:59, Richard Biener wrote:
>>> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>>>
>>>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>>>> Another alternative would be to push autodetected_vector_mode when the
>>>>> length is 1 and keep 1 as the starting point.
>>>>>
>>>>> Richard
>>>> I'm guessing we would still want to skip epilogue vectorization if
>>>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
>>> Practically we currently only support fixed width word_mode there,
>>> but eventually one could end up with 64bit DImode for the main loop
>>> and 32bit V4QImode in the epilogue ... so not sure if it's worth
>>> special-casing. But I don't mind adding that skip.
>>>
>>> Richard.
>> I left out the skip, it shouldn't break anything as it would try that same
>> mode before anyway.
>> Just to clarify what I meant though was to skip if autodetected_vector_mode
>> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
>> that scenario it wouldn't ever try V4QImode for the epilogue if the mainloop
>> was autodetected DImode, I think...
>> Either way, this is less code, less complicated and doesn't analyze more than
>> it did before the original patch, so I'm happy with that too.
>>
>> Is this what you had in mind?
> -  mode_i = 1;
> +  if (vector_modes.length () == 1)
> +    {
> +      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see
> if
> +         an epilogue can be created with that mode. */
> +      vector_modes[0] = autodetected_vector_mode;
> +      mode_i = 0;
> +    }
> +  else
> +    mode_i = 1;
> +
>
> I would have left out the condition and unconditionally do
>
>   vector_modes[0] = autodetected_vector_mode;
>   mode_i = 0;
>
> but OK if you think it makes sense to special case length == 1.
>
> Richard.

On 12/01/2022 12:57, Richard Biener wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:59, Richard Biener wrote:
>>> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>>>
>>>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>>>> Another alternative would be to push autodetected_vector_mode when the
>>>>> length is 1 and keep 1 as the starting point.
>>>>>
>>>>> Richard
>>>> I'm guessing we would still want to skip epilogue vectorization if
>>>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
>>> Practically we currently only support fixed width word_mode there,
>>> but eventually one could end up with 64bit DImode for the main loop
>>> and 32bit V4QImode in the epilogue ... so not sure if it's worth
>>> special-casing. But I don't mind adding that skip.
>>>
>>> Richard.
>> I left out the skip, it shouldn't break anything as it would try that same
>> mode before anyway.
>> Just to clarify what I meant though was to skip if autodetected_vector_mode
>> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
>> that scenario it wouldn't ever try V4QImode for the epilogue if the mainloop
>> was autodetected DImode, I think...
>> Either way, this is less code, less complicated and doesn't analyze more than
>> it did before the original patch, so I'm happy with that too.
>>
>> Is this what you had in mind?
> -  mode_i = 1;
> +  if (vector_modes.length () == 1)
> +    {
> +      /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see
> if
> +         an epilogue can be created with that mode. */
> +      vector_modes[0] = autodetected_vector_mode;
> +      mode_i = 0;
> +    }
> +  else
> +    mode_i = 1;
> +
>
> I would have left out the condition and unconditionally do
>
>   vector_modes[0] = autodetected_vector_mode;
>   mode_i = 0;
>
> but OK if you think it makes sense to special case length == 1.
>
> Richard.

Tested without the special casing, all good, only have performance
regressions left (which I'm working on), will commit this to fix the ICEs.

Thanks,
Andre

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6ed2b5f8724e5ebf27592f67d7f6bdfe1ebcf512..c81ebc411312e649f9cd954895244c60c928fee1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3024,6 +3024,18 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
      ordering is not guaranteed, so we could end up picking a mode for the main
      loop that is after the epilogue's optimal mode. */
   mode_i = 1;
+  /* If we only had VOIDmode then push the AUTODETECTED_VECTOR_MODE to see if
+     an epilogue can be created with that mode. */
+  if (vector_modes.length () == 1)
+    {
+      machine_mode preferred_mode
+        = targetm.vectorize.preferred_simd_mode (QImode);
+      /* If the preferred mode isn't a vector mode we will not be needing an
+         epilogue. */
+      if (!VECTOR_MODE_P (preferred_mode))
+        return first_loop_vinfo;
+      vector_modes.safe_push (preferred_mode);
+    }
   bool supports_partial_vectors = partial_vectors_supported_p ();
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);