GCN: '--param=gcn-preferred-vectorization-factor=[default,32,64]' (was: GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]')
Hi!
On 2024-04-08T13:24:06+0100, Andrew Stubbs <ams@baylibre.com> wrote:
> On 08/04/2024 11:45, Thomas Schwinge wrote:
>> On 2024-03-28T08:00:50+0100, I wrote:
>>> On 2024-03-22T15:54:48+0000, Andrew Stubbs <ams@baylibre.com> wrote:
>>>> This patch alters the default (preferred) vector size to 32 on RDNA devices to
>>>> better match the actual hardware. 64-lane vectors will continue to be
>>>> used where they are hard-coded (such as function prologues).
>>>>
>>>> We run these devices in wavefrontsize64 for compatibility, but they actually
>>>> only have 32-lane vectors, natively. If the upper part of a V64 is masked
>>>> off (as it is in V32) then RDNA devices will skip execution of the upper part
>>>> for most operations, so this adjustment shouldn't leave too much performance on
>>>> the table. One exception is memory instructions, so full wavefrontsize32
>>>> support would be better.
>>>>
>>>> The advantage is that we avoid the missing V64 operations (such as permute and
>>>> vec_extract).
>>>>
>>>> Committed to mainline.
>>>
>>> In my GCN target '-march=gfx1100' testing, this commit
>>> "amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
>>> number of execution test FAILs (that is, regressions compared to earlier
>>> '-march=gfx90a' etc. testing).
>>>
>>> This commit also resolves (for my '-march=gfx1100' testing) one
>>> pre-existing FAIL (that is, already seen in '-march=gfx90a' earlier
>>> etc. testing):
>>>
>>> PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
>>> [-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop niter:\tNo-overflow"
>>>
>>> That means, this test case specifically (or, just its 'scan-tree-dump'?)
>>> needs to be adjusted for GCN V64 testing?
>>>
>>> This commit, as you'd also mentioned elsewhere, however also causes a
>>> number of regressions in 'gcc.target/gcn/gcn.exp', see list below.
>>>
>>> Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
>>> similar) in the affected test cases (let me know if you'd like me to
>>> 'git push' that), but I suppose something more elaborate may be in order?
>>> (Conditionalize those on 'target { ! gcn_rdna }', and add respective
>>> scanning for 'target gcn_rdna'? I can help with effective-target
>>> 'gcn_rdna' (or similar), if you'd like me to.)
>>>
>>> And/or, have a '-mpreferred-simd-mode=v64' (or similar) to be used for
>>> such test cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
>>> 'gcn_vectorize_preferred_simd_mode'?
>>
>> The latter I have quickly implemented, see attached
>> "GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'". OK to
>> push to trunk branch?
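
For illustration only, and not the code from the attached patch: a minimal
sketch of how such a three-valued override might sit in front of the
'if (TARGET_RDNA2_PLUS)' default. The enum, the function name
'preferred_lanes', and the 'is_rdna' flag are hypothetical stand-ins; the
real logic lives in 'gcn_vectorize_preferred_simd_mode' and returns a
machine mode rather than a lane count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three-valued '--param' (default, 32, 64).  */
enum preferred_vf { VF_DEFAULT, VF_32, VF_64 };

/* Return the preferred number of vector lanes.  'is_rdna' is an assumed
   stand-in for the 'TARGET_RDNA2_PLUS' check.  */
static int
preferred_lanes (enum preferred_vf param, bool is_rdna)
{
  switch (param)
    {
    case VF_32:
      return 32;	/* Explicit override: always V32.  */
    case VF_64:
      return 64;	/* Explicit override: always V64.  */
    case VF_DEFAULT:
    default:
      /* No override: prefer V32 on RDNA devices, V64 elsewhere.  */
      return is_rdna ? 32 : 64;
    }
}
```

With this shape, an explicit 32 or 64 wins on every device, and only the
'default' setting consults the device type.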
>>
>> (This '--param' will also be useful for another bug/regression I'm about
>> to file.)
>>
>>> Best, probably, both these things, to properly test both V32 and V64?
>>
>> That part remains to be done, but is best done by someone who actually
>> knows "GCN" assembly/GCC back end -- that is, not me.
>
> I'm not sure that this is the *best* solution to the problem (in general,
> it's probably best to test the actual code that will be generated in
> practice), but I think this option will be useful for testing
> performance in each configuration and other correctness issues, and
> these tests are not testing that feature.
ACK.
> However, "vector lane width" sounds like it's configuring the number of
> bits in each lane. I think "vectorization factor" is unambiguous.
>
> OK to commit, with the name change.
Thanks, changed, and pushed v2 version to trunk branch in
commit df7625c3af004a81c13d54bb8810e03932eeb59a
"GCN: '--param=gcn-preferred-vectorization-factor=[default,32,64]'", see
attached.
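
As a rough sketch of what the test suite side looks like (a made-up test
case, not one of the files touched by the commit): a 'gcc.target/gcn' test
whose scans are specific to 64-lane vectors can pin the vectorization factor
with 'dg-additional-options'. The function 'vec_max', the array size 'N',
and the 'dg-options' line are assumptions for illustration; the exact
directive spelling used in the pushed patch may differ.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
/* Assumed usage: request 64-lane vectors, so that V64-specific
   'scan-assembler' directives keep matching on RDNA devices too.  */
/* { dg-additional-options "--param=gcn-preferred-vectorization-factor=64" } */

#define N 1024

/* Element-wise maximum; a simple loop the vectorizer can pick up.  */
void
vec_max (float *restrict r, const float *restrict a, const float *restrict b)
{
  for (int i = 0; i < N; ++i)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Any V64-specific 'dg-final' scan directives would follow here.  */
```

Outside of the DejaGnu harness the directives are plain comments, so the
file still compiles as ordinary C.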
Grüße
Thomas
>>> PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_2.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_2_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_2_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_3.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times movv64df_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times movv64sf_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times smaxv64sf3 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times smaxv64sf3 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_3_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_3_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_4.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times movv64df_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times movv64sf_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times smaxv64sf3 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times smaxv64sf3 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_4_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_4_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_5.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_5_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_5_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_6.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_6_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_6_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_7.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_7_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_7_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fmaxnm_8.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times smaxv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times smaxv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fmaxnm_8_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fmaxnm_8_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_2.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_2_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_2_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_3.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times movv64df_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times movv64sf_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times sminv64sf3 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times sminv64sf3 3
>>> PASS: gcc.target/gcn/cond_fminnm_3_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_3_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_4.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times movv64df_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times movv64sf_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times sminv64sf3 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times sminv64sf3 3
>>> PASS: gcc.target/gcn/cond_fminnm_4_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_4_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_5.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_5_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_5_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_6.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_6_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_6_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_7.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_7_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_7_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_fminnm_8.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times sminv64df3_exec 3
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times sminv64sf3_exec 3
>>> PASS: gcc.target/gcn/cond_fminnm_8_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_fminnm_8_run.c execution test
>>>
>>> @@ -124634,12 +124634,12 @@ PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-not movv64di_exec/2
>>> PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-not v_cndmask_b32
>>> PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times \\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
>>> PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times \\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashlv64di3_exec 2
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashlv64si3_exec 18
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashrv64si3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vlshrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vlshrv64si3_exec 1
>>> PASS: gcc.target/gcn/cond_shift_3_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_3_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_shift_4.c (test for excess errors)
>>> @@ -124647,77 +124647,77 @@ PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-not movv64di_exec/2
>>> PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-not v_cndmask_b32
>>> PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times \\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
>>> PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times \\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashlv64di3_exec 2
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashlv64si3_exec 18
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashrv64si3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vlshrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vlshrv64si3_exec 1
>>> PASS: gcc.target/gcn/cond_shift_4_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_4_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_shift_8.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64di_exec/0
>>> PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64si_exec/0
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashlv64di3_exec 2
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashlv64si3_exec 18
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashrv64si3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vlshrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vlshrv64si3_exec 1
>>> PASS: gcc.target/gcn/cond_shift_8_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_8_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_shift_9.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64di_exec/1
>>> PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64si_exec/2
>>> PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not v_cndmask_b32
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashlv64di3_exec 2
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashlv64si3_exec 18
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashrv64si3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vlshrv64di3_exec 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vlshrv64si3_exec 1
>>> PASS: gcc.target/gcn/cond_shift_9_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_shift_9_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_smax_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not smaxv64si3/0
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
>>> PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-times smaxv64si3_exec 30
>>> PASS: gcc.target/gcn/cond_smax_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_smax_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_smin_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not sminv64si3/0
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
>>> PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-times sminv64si3_exec 30
>>> PASS: gcc.target/gcn/cond_smin_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_smin_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_umax_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
>>> PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
>>> PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not umaxv64si3/0
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
>>> PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
>>> PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times umaxv64si3_exec 20
>>> PASS: gcc.target/gcn/cond_umax_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_umax_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/cond_umin_1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
>>> PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
>>> PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not uminv64si3/0
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
>>> PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
>>> PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times uminv64si3_exec 20
>>> PASS: gcc.target/gcn/cond_umin_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/cond_umin_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/simd-math-1.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_acos"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_acosh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_asin"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_asinh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atan"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atan2"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atanh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_copysign"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_cos"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_cosh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_erf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_exp"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_exp2"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_fmod"
>>> XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_gamma"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_hypot"
>>> XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_lgamma"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log10"
>>> XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log2"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_pow"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_remainder"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_rint"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_scalb"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_significand"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sin"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sinh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sqrt"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tan"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tanh"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tgamma"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_acosf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_acoshf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_asinf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_asinhf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atan2f"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atanf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atanhf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_copysignf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_cosf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_coshf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_erff"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_exp2f"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_expf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_fmodf"
>>> XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_gammaf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_hypotf"
>>> XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_lgammaf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_log10f"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_log2f"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_logf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_powf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_remainderf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_rintf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_scalbf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_significandf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sinf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sinhf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sqrtf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tanf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tanhf"
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tgammaf"
>>>
>>> @@ -125130,7 +125130,7 @@ PASS: gcc.target/gcn/simd-math-5-char-run.c (test for excess errors)
>>> PASS: gcc.target/gcn/simd-math-5-char-run.c execution test
>>> PASS: gcc.target/gcn/simd-math-5-char.c (test for excess errors)
>>> XFAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divmodv64si4@rel32@lo 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divv64hi3@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divv64qi3@rel32@lo 0
>>> FAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __modv64qi3@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __udivv64qi3@rel32@lo 0
>>>
>>> @@ -125171,8 +125171,8 @@ PASS: gcc.target/gcn/simd-math-5-long-run.c (test for excess errors)
>>> PASS: gcc.target/gcn/simd-math-5-long-run.c execution test
>>> PASS: gcc.target/gcn/simd-math-5-long.c (test for excess errors)
>>> XFAIL: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __divmodv64di4@rel32@lo 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times __divv64di3@rel32@lo 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times __modv64di3@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __udivv64di3@rel32@lo 0
>>> PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __umodv64di3@rel32@lo 0
>>>
>>> PASS: gcc.target/gcn/simd-math-5-short.c (test for excess errors)
>>> XFAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divmodv64si4@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divv64hi3@rel32@lo 0
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divv64si3@rel32@lo 1
>>> FAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __modv64hi3@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __udivv64hi3@rel32@lo 0
>>> PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __umodv64hi3@rel32@lo 0
>>>
>>> PASS: gcc.target/gcn/simd-math-5.c (test for excess errors)
>>> XFAIL: gcc.target/gcn/simd-math-5.c scan-assembler-times __divmodv64si4@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __divsi3@rel32@lo 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __divv64si3@rel32@lo 1
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __modv64si3@rel32@lo 1
>>> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivmodv64si4@rel32@lo 0
>>> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivsi3@rel32@lo 0
>>> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivv64si3@rel32@lo 0
>>> @@ -125242,13 +125242,13 @@ PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __umodv64si3@rel32@lo 0
>>>
>>> PASS: gcc.target/gcn/smax_1.c (test for excess errors)
>>> PASS: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
>>> FAIL: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/smax_1.c scan-assembler-times vec_cmpv64didi 10
>>> PASS: gcc.target/gcn/smax_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/smax_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/smin_1.c (test for excess errors)
>>> PASS: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
>>> FAIL: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/smin_1.c scan-assembler-times vec_cmpv64didi 10
>>> PASS: gcc.target/gcn/smin_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/smin_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
>>>
>>> PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
>>>
>>> PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
>>>
>>> PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
>>>
>>> PASS: gcc.target/gcn/umax_1.c (test for excess errors)
>>> PASS: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
>>> FAIL: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/umax_1.c scan-assembler-times vec_cmpv64didi 8
>>> PASS: gcc.target/gcn/umax_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/umax_1_run.c execution test
>>>
>>> PASS: gcc.target/gcn/umin_1.c (test for excess errors)
>>> PASS: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
>>> FAIL: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
>>> [-PASS:-]{+FAIL:+} gcc.target/gcn/umin_1.c scan-assembler-times vec_cmpv64didi 8
>>> PASS: gcc.target/gcn/umin_1_run.c (test for excess errors)
>>> PASS: gcc.target/gcn/umin_1_run.c execution test
>>>
>>>
>>> Grüße
>>> Thomas
From df7625c3af004a81c13d54bb8810e03932eeb59a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Sat, 24 Feb 2024 00:29:14 +0100
Subject: [PATCH] GCN:
'--param=gcn-preferred-vectorization-factor=[default,32,64]'
..., and specify '--param=gcn-preferred-vectorization-factor=64' for
'gcc.target/gcn/[...]' test cases with 'scan-assembler' directives that
are specific to 64-lane vectors. This resolves regressions introduced
in commit 6dedafe166cc02ae87b6a0699ad61ce3ffc46803
"amdgcn: Prefer V32 on RDNA devices".
gcc/
* config/gcn/gcn.opt (--param=gcn-preferred-vectorization-factor):
New.
* config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Use it.
* doc/invoke.texi (Optimize Options): Document it.
gcc/testsuite/
* gcc.target/gcn/cond_fmaxnm_1.c: Specify
'--param=gcn-preferred-vectorization-factor=64'.
* gcc.target/gcn/cond_fmaxnm_2.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_3.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_4.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_5.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_6.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_7.c: Likewise.
* gcc.target/gcn/cond_fmaxnm_8.c: Likewise.
* gcc.target/gcn/cond_fminnm_1.c: Likewise.
* gcc.target/gcn/cond_fminnm_2.c: Likewise.
* gcc.target/gcn/cond_fminnm_3.c: Likewise.
* gcc.target/gcn/cond_fminnm_4.c: Likewise.
* gcc.target/gcn/cond_fminnm_5.c: Likewise.
* gcc.target/gcn/cond_fminnm_6.c: Likewise.
* gcc.target/gcn/cond_fminnm_7.c: Likewise.
* gcc.target/gcn/cond_fminnm_8.c: Likewise.
* gcc.target/gcn/cond_shift_3.c: Likewise.
* gcc.target/gcn/cond_shift_4.c: Likewise.
* gcc.target/gcn/cond_shift_8.c: Likewise.
* gcc.target/gcn/cond_shift_9.c: Likewise.
* gcc.target/gcn/cond_smax_1.c: Likewise.
* gcc.target/gcn/cond_smin_1.c: Likewise.
* gcc.target/gcn/cond_umax_1.c: Likewise.
* gcc.target/gcn/cond_umin_1.c: Likewise.
* gcc.target/gcn/simd-math-1.c: Likewise.
* gcc.target/gcn/simd-math-5-char.c: Likewise.
* gcc.target/gcn/simd-math-5-long.c: Likewise.
* gcc.target/gcn/simd-math-5-short.c: Likewise.
* gcc.target/gcn/simd-math-5.c: Likewise.
* gcc.target/gcn/smax_1.c: Likewise.
* gcc.target/gcn/smin_1.c: Likewise.
* gcc.target/gcn/umax_1.c: Likewise.
* gcc.target/gcn/umin_1.c: Likewise.
---
gcc/config/gcn/gcn.cc | 14 +++++++++++++-
gcc/config/gcn/gcn.opt | 16 ++++++++++++++++
gcc/doc/invoke.texi | 8 ++++++++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_2.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_3.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_4.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_5.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_6.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_7.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fmaxnm_8.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_2.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_3.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_4.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_5.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_6.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_7.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_fminnm_8.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_shift_3.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_shift_4.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_shift_8.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_shift_9.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_smax_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_smin_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_umax_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/cond_umin_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/simd-math-1.c | 3 ++-
gcc/testsuite/gcc.target/gcn/simd-math-5-char.c | 3 +++
gcc/testsuite/gcc.target/gcn/simd-math-5-long.c | 3 +++
gcc/testsuite/gcc.target/gcn/simd-math-5-short.c | 3 +++
gcc/testsuite/gcc.target/gcn/simd-math-5.c | 3 +++
gcc/testsuite/gcc.target/gcn/smax_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/smin_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/umax_1.c | 2 ++
gcc/testsuite/gcc.target/gcn/umin_1.c | 2 ++
36 files changed, 107 insertions(+), 2 deletions(-)
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5231,6 +5231,14 @@ gcn_vector_mode_supported_p (machine_mode mode)
static machine_mode
gcn_vectorize_preferred_simd_mode (scalar_mode mode)
{
+ bool v32;
+ if (gcn_preferred_vectorization_factor == 32)
+ v32 = true;
+ else if (gcn_preferred_vectorization_factor == 64)
+ v32 = false;
+ else if (gcn_preferred_vectorization_factor != -1)
+ gcc_unreachable ();
+ else if (TARGET_RDNA2_PLUS)
/* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
(in particular, permute operations are only available for cases that don't
span the 32-lane boundary).
@@ -5238,7 +5246,11 @@ gcn_vectorize_preferred_simd_mode (scalar_mode mode)
From the RDNA3 manual: "Hardware may choose to skip either half if the
EXEC mask for that half is all zeros...". This means that preferring
32-lanes is a good stop-gap until we have proper wave32 support. */
- if (TARGET_RDNA2_PLUS)
+ v32 = true;
+ else
+ v32 = false;
+
+ if (v32)
switch (mode)
{
case E_QImode:
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -116,3 +116,19 @@ Compile for devices requiring XNACK enabled. Default \"any\" if USM is supported
msram-ecc=
Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY)
Compile for devices with the SRAM ECC feature enabled, or not. Default \"any\".
+
+-param=gcn-preferred-vectorization-factor=
+Target Joined Enum(gcn_preferred_vectorization_factor) Var(gcn_preferred_vectorization_factor) Init(-1) Param
+--param=gcn-preferred-vectorization-factor=[default,32,64] Preferred vectorization factor.
+
+Enum
+Name(gcn_preferred_vectorization_factor) Type(int)
+
+EnumValue
+Enum(gcn_preferred_vectorization_factor) String(default) Value(-1)
+
+EnumValue
+Enum(gcn_preferred_vectorization_factor) String(32) Value(32)
+
+EnumValue
+Enum(gcn_preferred_vectorization_factor) String(64) Value(64)
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17017,6 +17017,14 @@ loop. The default value is four.
@end table
+The following choices of @var{name} are available on GCN targets:
+
+@table @gcctabopt
+@item gcn-preferred-vectorization-factor
+Preferred vectorization factor: @samp{default}, @samp{32}, @samp{64}.
+
+@end table
+
The following choices of @var{name} are available on i386 and x86_64 targets:
@table @gcctabopt
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_2.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_2.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_3.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_3.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_4.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_4.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_5.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_5.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include "cond_fmaxnm_1.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_6.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_6.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include "cond_fmaxnm_2.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_7.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_7.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include "cond_fmaxnm_3.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_8.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fmaxnm_8.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include "cond_fmaxnm_4.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_1.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_2.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_2.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_2.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_3.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_3.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_3.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_4.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_4.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_4.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_5.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_5.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_1.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_6.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_6.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_2.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_7.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_7.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_3.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_fminnm_8.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_fminnm_8.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#define FN(X) __builtin_fmin##X
#include "cond_fmaxnm_4.c"
--- a/gcc/testsuite/gcc.target/gcn/cond_shift_3.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_shift_3.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_shift_4.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_shift_4.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_shift_8.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_shift_8.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_shift_9.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_shift_9.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_smax_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_smax_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_smin_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_smin_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_umax_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_umax_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/cond_umin_1.c
+++ b/gcc/testsuite/gcc.target/gcn/cond_umin_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/simd-math-1.c
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
@@ -2,7 +2,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */
-
+/* The 'scan-tree-dump' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#undef PRINT_RESULT
#define VERBOSE 0
--- a/gcc/testsuite/gcc.target/gcn/simd-math-5-char.c
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char.c
@@ -1,3 +1,6 @@
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
+
#define TYPE char
#include "simd-math-5.c"
--- a/gcc/testsuite/gcc.target/gcn/simd-math-5-long.c
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long.c
@@ -1,3 +1,6 @@
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
+
#define TYPE long
#include "simd-math-5.c"
--- a/gcc/testsuite/gcc.target/gcn/simd-math-5-short.c
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short.c
@@ -1,3 +1,6 @@
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
+
#define TYPE short
#include "simd-math-5.c"
--- a/gcc/testsuite/gcc.target/gcn/simd-math-5.c
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5.c
@@ -1,6 +1,9 @@
/* Test that the auto-vectorizer uses the libgcc vectorized division and
modulus functions. */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
+
/* Setting it this way ensures the run tests use the same flag as the
compile tests. */
#pragma GCC optimize("O2")
--- a/gcc/testsuite/gcc.target/gcn/smax_1.c
+++ b/gcc/testsuite/gcc.target/gcn/smax_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/smin_1.c
+++ b/gcc/testsuite/gcc.target/gcn/smin_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/umax_1.c
+++ b/gcc/testsuite/gcc.target/gcn/umax_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--- a/gcc/testsuite/gcc.target/gcn/umin_1.c
+++ b/gcc/testsuite/gcc.target/gcn/umin_1.c
@@ -1,5 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -dp" } */
+/* The 'scan-assembler' directives are specific to 64-lane vectors.
+ { dg-additional-options --param=gcn-preferred-vectorization-factor=64 } */
#include <stdint.h>
--
2.34.1
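
The lane-width choice that the 'gcn.cc' hunk adds to 'gcn_vectorize_preferred_simd_mode' can be modeled in isolation. What follows is only a sketch in plain C, not the GCC implementation: 'prefers_v32', the 'FACTOR_*' names, and the 'rdna2_plus' argument (standing in for 'TARGET_RDNA2_PLUS') are hypothetical, and a plain 'assert' stands in for 'gcc_unreachable'.

```c
/* Hypothetical standalone model of the lane-width choice; not GCC code.  */
#include <assert.h>
#include <stdbool.h>

/* Mirrors the three values the param accepts: default (-1), 32, 64.  */
enum { FACTOR_DEFAULT = -1, FACTOR_32 = 32, FACTOR_64 = 64 };

/* Return true if 32-lane vector modes should be preferred.
   FACTOR models the value of the param; RDNA2_PLUS models the
   device check used when the param is left at its default.  */
static bool
prefers_v32 (int factor, bool rdna2_plus)
{
  if (factor == FACTOR_32)
    return true;   /* Explicit request for 32 lanes.  */
  if (factor == FACTOR_64)
    return false;  /* Explicit request for 64 lanes.  */

  /* Any other value should have been rejected by option parsing.  */
  assert (factor == FACTOR_DEFAULT);

  /* Default: follow the device, 32 lanes on RDNA, 64 otherwise.  */
  return rdna2_plus;
}
```

In this model an explicit 32 or 64 always wins and 'default' falls back to the device check, which is why the adjusted test cases can pin 64-lane code generation regardless of the '-march' under test.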