middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

Message ID 20220818104608.259204-1-juzhe.zhong@rivai.ai
State New
Headers
Series middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) |

Commit Message

juzhe.zhong@rivai.ai Aug. 18, 2022, 10:46 a.m. UTC
  From: zhongjuzhe <juzhe.zhong@rivai.ai>

Hello. This patch is preparing for following RVV support.

Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.

So I define the machine_mode as follows:
VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
ADJUST_NUNITS (MODE, riscv_vector_chunks);
The riscv_vector_chunks = poly_uint16 (1, 1)

The compilation is failed for the stepped vector test:
(const_vector:VNx1DI repeat [
        (const_int 8 [0x8])
        (const_int 7 [0x7])
    ])

I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).

machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!

  

gcc/ChangeLog:

        * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).

---
 gcc/simplify-rtx.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Richard Sandiford Aug. 19, 2022, 8:03 a.m. UTC | #1
juzhe.zhong@rivai.ai writes:
> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
>         (const_int 8 [0x8])
>         (const_int 7 [0x7])
>     ])
>
> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>    rtx x = builder.build ();
>  
>    test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>      test_vector_subregs_modes (x, nunits - min_nunits, count);

I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)

This corresponds to the minimum of 3 elements for the stepped tests:

	  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
	      && maybe_gt (GET_MODE_NUNITS (mode), 2))
	    {
	      test_vector_ops_series (mode, scalar_reg);
	      test_vector_subregs (mode);
	    }

Thanks,
Richard
  
juzhe.zhong@rivai.ai Aug. 19, 2022, 8:19 a.m. UTC | #2
Thank you so much. Address your comment. I think "maybe_gt (nunits, 1)" is a more solid solution than I do.
I will send a patch to fix this.



juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 16:03
To: juzhe.zhong
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
juzhe.zhong@rivai.ai writes:
> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
>         (const_int 8 [0x8])
>         (const_int 7 [0x7])
>     ])
>
> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>    rtx x = builder.build ();
>  
>    test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>      test_vector_subregs_modes (x, nunits - min_nunits, count);
 
I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)
 
This corresponds to the minimum of 3 elements for the stepped tests:
 
  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
      && maybe_gt (GET_MODE_NUNITS (mode), 2))
    {
      test_vector_ops_series (mode, scalar_reg);
      test_vector_subregs (mode);
    }
 
Thanks,
Richard
  
juzhe.zhong@rivai.ai Aug. 19, 2022, 9:06 a.m. UTC | #3
Hi, Richard. I tried the codes:
if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);

It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
But I tried:
if (!nunits.is_constant () && known_gt (nunits, 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);
It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.

Finally, I tried:
if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);
It passed with no warning.

Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
Thanks!


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 16:03
To: juzhe.zhong
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
juzhe.zhong@rivai.ai writes:
> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
>         (const_int 8 [0x8])
>         (const_int 7 [0x7])
>     ])
>
> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>    rtx x = builder.build ();
>  
>    test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>      test_vector_subregs_modes (x, nunits - min_nunits, count);
 
I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)
 
This corresponds to the minimum of 3 elements for the stepped tests:
 
  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
      && maybe_gt (GET_MODE_NUNITS (mode), 2))
    {
      test_vector_ops_series (mode, scalar_reg);
      test_vector_subregs (mode);
    }
 
Thanks,
Richard
  
Richard Sandiford Aug. 19, 2022, 9:35 a.m. UTC | #4
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Hi, Richard. I tried the codes:
> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
>
> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.

Ah, right, sorry for the bogus suggestion.

In that case, what specifically goes wrong?  Which test in
test_vector_subregs_modes fails, and what does the ICE look like?

Thanks,
Richard

> But I tried:
> if (!nunits.is_constant () && known_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>
> Finally, I tried:
> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It passed with no warning.
>
> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
> Thanks!
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 16:03
> To: juzhe.zhong
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> juzhe.zhong@rivai.ai writes:
>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>
>> Hello. This patch is preparing for following RVV support.
>>
>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>
>> So I define the machine_mode as follows:
>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>
>> The compilation is failed for the stepped vector test:
>> (const_vector:VNx1DI repeat [
>>         (const_int 8 [0x8])
>>         (const_int 7 [0x7])
>>     ])
>>
>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>
>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>
>>   
>>
>> gcc/ChangeLog:
>>
>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>
>> ---
>>  gcc/simplify-rtx.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index 7d09bf7103d..61e0dfa00d0 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>    rtx x = builder.build ();
>>  
>>    test_vector_subregs_modes (x);
>> -  if (!nunits.is_constant ())
>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>  
> I think instead we should use maybe_gt (nunits, 1), on the basis that
> the fore_back tests require vectors that have a minimum of 2 elements.
> Something like poly_uint16 (1, 2) would have the same problem as
> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
> principle.)
>  
> This corresponds to the minimum of 3 elements for the stepped tests:
>  
>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>     {
>       test_vector_ops_series (mode, scalar_reg);
>       test_vector_subregs (mode);
>     }
>  
> Thanks,
> Richard
>
  
juzhe.zhong@rivai.ai Aug. 19, 2022, 9:57 a.m. UTC | #5
> Ah, right, sorry for the bogus suggestion.
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?

Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
The fail ICE:
../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
  expected: (nil)

  actual: (const_int 0 [0])
cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
        ../../../riscv-gcc/gcc/selftest-rtl.cc:57
0x1332504 test_vector_subregs_modes
        ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
0x1332988 test_vector_subregs_fore_back
        ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
0x1332ae7 test_vector_subregs
        ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
0x1332c57 test_vector_ops
        ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
0x1332c7b selftest::simplify_rtx_cc_tests()
        ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
0x21318fa selftest::run_tests()
        ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
0x1362a76 toplev::run_self_tests()
        ../../../riscv-gcc/gcc/toplev.cc:2205

I analyzed the codes:
In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
So the assertion fails.

This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
I only enable auto-vectorization with mode larger than (2,2) and test it fully.
From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~



juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 17:35
To: juzhe.zhong\@rivai.ai
CC: rguenther; gcc-patches; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Hi, Richard. I tried the codes:
> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
>
> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
 
Ah, right, sorry for the bogus suggestion.
 
In that case, what specifically goes wrong?  Which test in
test_vector_subregs_modes fails, and what does the ICE look like?
 
Thanks,
Richard
 
> But I tried:
> if (!nunits.is_constant () && known_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>
> Finally, I tried:
> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It passed with no warning.
>
> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
> Thanks!
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 16:03
> To: juzhe.zhong
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> juzhe.zhong@rivai.ai writes:
>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>
>> Hello. This patch is preparing for following RVV support.
>>
>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>
>> So I define the machine_mode as follows:
>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>
>> The compilation is failed for the stepped vector test:
>> (const_vector:VNx1DI repeat [
>>         (const_int 8 [0x8])
>>         (const_int 7 [0x7])
>>     ])
>>
>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>
>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>
>>   
>>
>> gcc/ChangeLog:
>>
>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>
>> ---
>>  gcc/simplify-rtx.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index 7d09bf7103d..61e0dfa00d0 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>    rtx x = builder.build ();
>>  
>>    test_vector_subregs_modes (x);
>> -  if (!nunits.is_constant ())
>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>  
> I think instead we should use maybe_gt (nunits, 1), on the basis that
> the fore_back tests require vectors that have a minimum of 2 elements.
> Something like poly_uint16 (1, 2) would have the same problem as
> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
> principle.)
>  
> This corresponds to the minimum of 3 elements for the stepped tests:
>  
>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>     {
>       test_vector_ops_series (mode, scalar_reg);
>       test_vector_subregs (mode);
>     }
>  
> Thanks,
> Richard
>
  
Richard Sandiford Aug. 19, 2022, 12:52 p.m. UTC | #6
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
>         ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.

Hmm, ok, so the subreg operation is unexpected succeeding.

> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 

The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.

The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.

Why is (1,1) different though?  The test is constructing:

  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]

It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:

  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]

and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.

> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~

Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.

(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).

So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated.  But it shouldn't be necessary to do that for correctness.

Thanks,
Richard

> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>  
> Ah, right, sorry for the bogus suggestion.
>  
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>  
> Thanks,
> Richard
>  
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>>         (const_int 8 [0x8])
>>>         (const_int 7 [0x7])
>>>     ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>   
>>>
>>> gcc/ChangeLog:
>>>
>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>>  gcc/simplify-rtx.cc | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>    rtx x = builder.build ();
>>>  
>>>    test_vector_subregs_modes (x);
>>> -  if (!nunits.is_constant ())
>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>  
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>  
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>  
>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>     {
>>       test_vector_ops_series (mode, scalar_reg);
>>       test_vector_subregs (mode);
>>     }
>>  
>> Thanks,
>> Richard
>>  
>
  
juzhe.zhong@rivai.ai Aug. 19, 2022, 2:10 p.m. UTC | #7
As you mentioned. For poly_uint16 (1, 1), the pattern should be:
shape: nelts_per_pattern == 2, npatterns == 1
 elements: a, b[, b, b, b, b, ...]

I tried to print out the rtx that tests create to test:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

  test_vector_subregs_modes (x);
  print_rtl_single(stdout,x);
  if (!nunits.is_constant ())
    test_vector_subregs_modes (x, nunits - min_nunits, count);
}

the x value:
(const_vector:VNx1DI repeat [
        (const_int 0 [0])
    ])

It seems that it doesn't match the pattern you said. Am I understanding correctly? And would you mind taking a look at the codes which generate x:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

The nunits  = (1,1), min_nunits = 1, count = 1.  I print out these variable value may helpful for you to check. Thanks!




juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
>         ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
 
Hmm, ok, so the subreg operation is unexpected succeeding.
 
> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
 
The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
 
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
 
Why is (1,1) different though?  The test is constructing:
 
  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]
 
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
 
  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
 
and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
 
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
 
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
 
(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
 
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated.  But it shouldn't be necessary to do that for correctness.
 
Thanks,
Richard
 
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>  
> Ah, right, sorry for the bogus suggestion.
>  
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>  
> Thanks,
> Richard
>  
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>>         (const_int 8 [0x8])
>>>         (const_int 7 [0x7])
>>>     ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>   
>>>
>>> gcc/ChangeLog:
>>>
>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>>  gcc/simplify-rtx.cc | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>    rtx x = builder.build ();
>>>  
>>>    test_vector_subregs_modes (x);
>>> -  if (!nunits.is_constant ())
>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>  
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>  
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>  
>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>     {
>>       test_vector_ops_series (mode, scalar_reg);
>>       test_vector_subregs (mode);
>>     }
>>  
>> Thanks,
>> Richard
>>  
>
  
juzhe.zhong@rivai.ai Aug. 19, 2022, 2:34 p.m. UTC | #8
I rewrite test_vector_subregs_fore_back as follows:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  unsigned int nelts_per_pattern = count == 1 ? 2 : count;
  for (unsigned int i = 0; i < nelts_per_pattern; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

  test_vector_subregs_modes (x);
  if (!nunits.is_constant ())
    test_vector_subregs_modes (x, nunits - min_nunits, count);
}

I add the code: unsigned int nelts_per_pattern = count == 1 ? 2 : count;
then replace the "count" into "nelts_per_pattern " in the first loop.

It can pass now. And "x" value I print out seems to be correct:
(const_vector:VNx1DI [
        (const_int 0 [0])
        repeat [
            (const_int 1 [0x1])
        ]
    ])

Is this correct solution ? 

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
>         ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
 
Hmm, ok, so the subreg operation is unexpected succeeding.
 
> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
 
The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
 
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
 
Why is (1,1) different though?  The test is constructing:
 
  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]
 
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
 
  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
 
and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
 
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
 
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
 
(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
 
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated.  But it shouldn't be necessary to do that for correctness.
 
Thanks,
Richard
 
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>  
> Ah, right, sorry for the bogus suggestion.
>  
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>  
> Thanks,
> Richard
>  
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>>         (const_int 8 [0x8])
>>>         (const_int 7 [0x7])
>>>     ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>   
>>>
>>> gcc/ChangeLog:
>>>
>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>>  gcc/simplify-rtx.cc | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>    rtx x = builder.build ();
>>>  
>>>    test_vector_subregs_modes (x);
>>> -  if (!nunits.is_constant ())
>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>  
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>  
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>  
>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>     {
>>       test_vector_ops_series (mode, scalar_reg);
>>       test_vector_subregs (mode);
>>     }
>>  
>> Thanks,
>> Richard
>>  
>
  
juzhe.zhong@rivai.ai Aug. 20, 2022, 1:23 a.m. UTC | #9
After deep analysis and several tries. I have made a analysis and conclusion as follows.
I really appreciate if you could spend some time take a look at the following analysis that I made:

According to the codes in test_vector_subregs_fore_back:

~~~
 poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();
~~~~

Analyze the codes and tried several patterns:

For poly_uint16 (2,2):
x = (const_vector:VNx1DI [
        (const_int 0 [0])
        (const_int 1 [0x1])
        repeat [
            (const_int 0 [0])
            (const_int -1 [0xffffffffffffffff])
        ]
    ])

For poly_uint16 (4,4):
x = (const_vector:VNx1DI [
        (const_int 0 [0])
        (const_int 1 [0x1])
        (const_int 2 [0x2])
        (const_int 3 [0x3])
        repeat [
            (const_int 0 [0])
            (const_int -1 [0xffffffffffffffff])
            (const_int -2 [0xfffffffffffffffe])
            (const_int -3 [0xfffffffffffffffd])
        ]
    ])

So, I think I can conclude the pattern rule as follows:

poly_uint16 (n,n):
x = (const_vector:VNx1DI [
        (const_int 0 [0])
        (const_int 1 [0x1])
        (const_int 2 [0x2])
        (const_int 3 [0x3])
        ..................
        (const_int n [hex(n)])
        repeat [
            (const_int 0 [0])
            (const_int -1 [0xffffffffffffffff])
            (const_int -2 [0xfffffffffffffffe])
            (const_int -3 [0xfffffffffffffffd])
            .........
             (const_int -n [hex(-n)])
        ]
    ])
Am I understanding right?

Let's first take a look at the codes you write:

rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

So for poly_uint (1,1):
Ideally according to the codes, you want to generate the pattern like this:
x = (const_vector:VNx1DI [
        (const_int 0 [0])
        repeat [
            (const_int 0 [0])
        ]
    ])

In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
In this case, i = -i = 0.

So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:

x = (const_vector:VNx1DI repeat [
        (const_int 0 [0])
    ])

This is a const_vector duplicate vector.
This is not the correct pattern you want so it fails in the following check.

I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.

To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):

================
Original codes:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));
================

================
1st solution:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (-(int) i, int_mode));

x = (const_vector:VNx1DI [
        (const_int 0 [0])
        repeat [
            (const_int 1 [0x1])
        ]
    ])
================

================
2nd solution:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
    builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));

x= 
(const_vector:VNx1DI [
        (const_int 0 [0])
        repeat [
            (const_int -1 [0xffffffffffffffff])
        ]
    ])
================

Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
>         ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
 
Hmm, ok, so the subreg operation is unexpected succeeding.
 
> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
 
The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
 
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
 
Why is (1,1) different though?  The test is constructing:
 
  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]
 
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
 
  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
 
and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
 
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
 
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
 
(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
 
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated.  But it shouldn't be necessary to do that for correctness.
 
Thanks,
Richard
 
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>  
> Ah, right, sorry for the bogus suggestion.
>  
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>  
> Thanks,
> Richard
>  
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>>         (const_int 8 [0x8])
>>>         (const_int 7 [0x7])
>>>     ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>   
>>>
>>> gcc/ChangeLog:
>>>
>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>>  gcc/simplify-rtx.cc | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>    rtx x = builder.build ();
>>>  
>>>    test_vector_subregs_modes (x);
>>> -  if (!nunits.is_constant ())
>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>  
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>  
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>  
>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>     {
>>       test_vector_ops_series (mode, scalar_reg);
>>       test_vector_subregs (mode);
>>     }
>>  
>> Thanks,
>> Richard
>>  
>
  
Richard Sandiford Aug. 22, 2022, 8:31 a.m. UTC | #10
钟居哲 <juzhe.zhong@rivai.ai> writes:
> After deep analysis and several tries. I have made a analysis and conclusion as follows.
> I really appreciate if you could spend some time take a look at the following analysis that I made:
>
> According to the codes in test_vector_subregs_fore_back:
>
> ~~~
>  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>   unsigned int min_nunits = constant_lower_bound (nunits);
>   scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>   unsigned int count = gcd (min_nunits, 4);
>
>   rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
> ~~~~
>
> Analyze the codes and tried several patterns:
>
> For poly_uint16 (2,2):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
>
> For poly_uint16 (4,4):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>         ]
>     ])
>
> So, I think I can conclude the pattern rule as follows:
>
> poly_uint16 (n,n):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         ..................
>         (const_int n [hex(n)])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>             .........
>              (const_int -n [hex(-n)])
>         ]
>     ])
> Am I understanding right?
>
> Let's first take a look at the codes you write:
>
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
>
> So for poly_uint (1,1):
> Ideally according to the codes, you want to generate the pattern like this:
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 0 [0])
>         ]
>     ])

Ah, right, the two subsequences are supposed to be different.

> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
> In this case, i = -i = 0.
>
> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>
> x = (const_vector:VNx1DI repeat [
>         (const_int 0 [0])
>     ])
>
> This is a const_vector duplicate vector.
> This is not the correct pattern you want so it fails in the following check.
>
> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>
> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>
> ================
> Original codes:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
> ================
>
> ================
> 1st solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 1 [0x1])
>         ]
>     ])
> ================
>
> ================
> 2nd solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));

I don't think we need to single out count == 1.  It should be OK to use
-1 - (int) i unconditionally.

Thanks,
Richard

> x= 
> (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
> ================
>
> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 20:52
> To: juzhe.zhong\@rivai.ai
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Ah, right, sorry for the bogus suggestion.
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>> The fail ICE:
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>   expected: (nil)
>>
>>   actual: (const_int 0 [0])
>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>> 0x1332504 test_vector_subregs_modes
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>> 0x1332988 test_vector_subregs_fore_back
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>> 0x1332ae7 test_vector_subregs
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>> 0x1332c57 test_vector_ops
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>> 0x21318fa selftest::run_tests()
>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>> 0x1362a76 toplev::run_self_tests()
>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>
>> I analyzed the codes:
>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>> So the assertion fails.
>  
> Hmm, ok, so the subreg operation is unexpected succeeding.
>  
>> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
>  
> The stepped case is 3 elements per pattern rather than 2.  In a stepped
> pattern: a, b, b+n are represented explicitly, then the rest are
> implicitly b+n*2, b+n*3, etc.
>  
> The case being handled by this code is instead the 2-element case:
> a, b are represented explicitly, then the rest are implicitly all b.
>  
> Why is (1,1) different though?  The test is constructing:
>  
>   nunits: 1 + 1x
>   shape: nelts_per_pattern == 2, npatterns == 1
>   elements: a, b[, b, b, b, b, ...]
>  
> It then tests subregs starting at 0 + 1x (so starting at the first b).
> But for (2,2) we should have:
>  
>   nunits: 2 + 2x
>   shape: nelts_per_pattern == 2, npatterns == 2
>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>  
> and we should test subregs starting at 0 + 2x (so starting at the
> first b1).  The two cases should be very similar, it's just that the
> (2,2) case doubles the number of patterns.
>  
>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>  
> Following on from what I said above, it doesn't look like this particular
> case is related to stepped vectors.
>  
> (1,1) shouldn't (need to) be a special case though.  Any potentital
> problems that would occur for (1,1) with npatterns==1 would also occur
> for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
> would be problematic for (2,2).
>  
> So yeah, preventing a mode being used for autovectorisation would allow
> the target to have a bit more control over which constants are actually
> generated.  But it shouldn't be necessary to do that for correctness.
>  
> Thanks,
> Richard
>  
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 17:35
>> To: juzhe.zhong\@rivai.ai
>> CC: rguenther; gcc-patches; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Hi, Richard. I tried the codes:
>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>  
>> Ah, right, sorry for the bogus suggestion.
>>  
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>>  
>> Thanks,
>> Richard
>>  
>>> But I tried:
>>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>
>>> Finally, I tried:
>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It passed with no warning.
>>>
>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>> Thanks!
>>>
>>>
>>> juzhe.zhong@rivai.ai
>>>  
>>> From: Richard Sandiford
>>> Date: 2022-08-19 16:03
>>> To: juzhe.zhong
>>> CC: gcc-patches; rguenther; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> juzhe.zhong@rivai.ai writes:
>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>
>>>> Hello. This patch is preparing for following RVV support.
>>>>
>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>
>>>> So I define the machine_mode as follows:
>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>
>>>> The compilation is failed for the stepped vector test:
>>>> (const_vector:VNx1DI repeat [
>>>>         (const_int 8 [0x8])
>>>>         (const_int 7 [0x7])
>>>>     ])
>>>>
>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>
>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>
>>>>   
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>
>>>> ---
>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>> --- a/gcc/simplify-rtx.cc
>>>> +++ b/gcc/simplify-rtx.cc
>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>    rtx x = builder.build ();
>>>>  
>>>>    test_vector_subregs_modes (x);
>>>> -  if (!nunits.is_constant ())
>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>  
>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>> Something like poly_uint16 (1, 2) would have the same problem as
>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>> principle.)
>>>  
>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>  
>>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>     {
>>>       test_vector_ops_series (mode, scalar_reg);
>>>       test_vector_subregs (mode);
>>>     }
>>>  
>>> Thanks,
>>> Richard
>>>  
>>  
>
  
juzhe.zhong@rivai.ai Aug. 22, 2022, 8:45 a.m. UTC | #11
Thanks for your reply. Your suggestion "-1 - (int) i" is better.

Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?

Or you change it yourself?
 
Thank you so much.




juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-22 16:31
To: 钟居哲
CC: rguenther; gcc-patches; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
钟居哲 <juzhe.zhong@rivai.ai> writes:
> After deep analysis and several tries. I have made a analysis and conclusion as follows.
> I really appreciate if you could spend some time take a look at the following analysis that I made:
>
> According to the codes in test_vector_subregs_fore_back:
>
> ~~~
>  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>   unsigned int min_nunits = constant_lower_bound (nunits);
>   scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>   unsigned int count = gcd (min_nunits, 4);
>
>   rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
> ~~~~
>
> Analyze the codes and tried several patterns:
>
> For poly_uint16 (2,2):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
>
> For poly_uint16 (4,4):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>         ]
>     ])
>
> So, I think I can conclude the pattern rule as follows:
>
> poly_uint16 (n,n):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         ..................
>         (const_int n [hex(n)])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>             .........
>              (const_int -n [hex(-n)])
>         ]
>     ])
> Am I understanding right?
>
> Let's first take a look at the codes you write:
>
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
>
> So for poly_uint (1,1):
> Ideally according to the codes, you want to generate the pattern like this:
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 0 [0])
>         ]
>     ])
 
Ah, right, the two subsequences are supposed to be different.
 
> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
> In this case, i = -i = 0.
>
> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>
> x = (const_vector:VNx1DI repeat [
>         (const_int 0 [0])
>     ])
>
> This is a const_vector duplicate vector.
> This is not the correct pattern you want so it fails in the following check.
>
> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>
> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>
> ================
> Original codes:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
> ================
>
> ================
> 1st solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 1 [0x1])
>         ]
>     ])
> ================
>
> ================
> 2nd solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
 
I don't think we need to single out count == 1.  It should be OK to use
-1 - (int) i unconditionally.
 
Thanks,
Richard
 
> x= 
> (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
> ================
>
> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 20:52
> To: juzhe.zhong\@rivai.ai
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Ah, right, sorry for the bogus suggestion.
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>> The fail ICE:
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>   expected: (nil)
>>
>>   actual: (const_int 0 [0])
>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>> 0x1332504 test_vector_subregs_modes
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>> 0x1332988 test_vector_subregs_fore_back
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>> 0x1332ae7 test_vector_subregs
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>> 0x1332c57 test_vector_ops
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>> 0x21318fa selftest::run_tests()
>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>> 0x1362a76 toplev::run_self_tests()
>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>
>> I analyzed the codes:
>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>> So the assertion fails.
>  
> Hmm, ok, so the subreg operation is unexpected succeeding.
>  
>> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
>  
> The stepped case is 3 elements per pattern rather than 2.  In a stepped
> pattern: a, b, b+n are represented explicitly, then the rest are
> implicitly b+n*2, b+n*3, etc.
>  
> The case being handled by this code is instead the 2-element case:
> a, b are represented explicitly, then the rest are implicitly all b.
>  
> Why is (1,1) different though?  The test is constructing:
>  
>   nunits: 1 + 1x
>   shape: nelts_per_pattern == 2, npatterns == 1
>   elements: a, b[, b, b, b, b, ...]
>  
> It then tests subregs starting at 0 + 1x (so starting at the first b).
> But for (2,2) we should have:
>  
>   nunits: 2 + 2x
>   shape: nelts_per_pattern == 2, npatterns == 2
>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>  
> and we should test subregs starting at 0 + 2x (so starting at the
> first b1).  The two cases should be very similar, it's just that the
> (2,2) case doubles the number of patterns.
>  
>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>  
> Following on from what I said above, it doesn't look like this particular
> case is related to stepped vectors.
>  
> (1,1) shouldn't (need to) be a special case though.  Any potentital
> problems that would occur for (1,1) with npatterns==1 would also occur
> for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
> would be problematic for (2,2).
>  
> So yeah, preventing a mode being used for autovectorisation would allow
> the target to have a bit more control over which constants are actually
> generated.  But it shouldn't be necessary to do that for correctness.
>  
> Thanks,
> Richard
>  
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 17:35
>> To: juzhe.zhong\@rivai.ai
>> CC: rguenther; gcc-patches; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Hi, Richard. I tried the codes:
>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>  
>> Ah, right, sorry for the bogus suggestion.
>>  
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>>  
>> Thanks,
>> Richard
>>  
>>> But I tried:
>>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>
>>> Finally, I tried:
>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It passed with no warning.
>>>
>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>> Thanks!
>>>
>>>
>>> juzhe.zhong@rivai.ai
>>>  
>>> From: Richard Sandiford
>>> Date: 2022-08-19 16:03
>>> To: juzhe.zhong
>>> CC: gcc-patches; rguenther; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> juzhe.zhong@rivai.ai writes:
>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>
>>>> Hello. This patch is preparing for following RVV support.
>>>>
>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>
>>>> So I define the machine_mode as follows:
>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>
>>>> The compilation is failed for the stepped vector test:
>>>> (const_vector:VNx1DI repeat [
>>>>         (const_int 8 [0x8])
>>>>         (const_int 7 [0x7])
>>>>     ])
>>>>
>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>
>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>
>>>>   
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>
>>>> ---
>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>> --- a/gcc/simplify-rtx.cc
>>>> +++ b/gcc/simplify-rtx.cc
>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>    rtx x = builder.build ();
>>>>  
>>>>    test_vector_subregs_modes (x);
>>>> -  if (!nunits.is_constant ())
>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>  
>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>> Something like poly_uint16 (1, 2) would have the same problem as
>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>> principle.)
>>>  
>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>  
>>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>     {
>>>       test_vector_ops_series (mode, scalar_reg);
>>>       test_vector_subregs (mode);
>>>     }
>>>  
>>> Thanks,
>>> Richard
>>>  
>>  
>
  
Richard Sandiford Aug. 22, 2022, 8:56 a.m. UTC | #12
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks for your reply. Your suggestion "-1 - (int) i" is better.
>
> Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
> into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?

Yeah, please send a patch, and I'll merge it like you say.

Thanks,
Richard

>
> Or you change it yourself?
>  
> Thank you so much.
>
>
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-22 16:31
> To: 钟居哲
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
>> After deep analysis and several tries. I have made a analysis and conclusion as follows.
>> I really appreciate if you could spend some time take a look at the following analysis that I made:
>>
>> According to the codes in test_vector_subregs_fore_back:
>>
>> ~~~
>>  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>>   unsigned int min_nunits = constant_lower_bound (nunits);
>>   scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>>   unsigned int count = gcd (min_nunits, 4);
>>
>>   rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x = builder.build ();
>> ~~~~
>>
>> Analyze the codes and tried several patterns:
>>
>> For poly_uint16 (2,2):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>>
>> For poly_uint16 (4,4):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>         ]
>>     ])
>>
>> So, I think I can conclude the pattern rule as follows:
>>
>> poly_uint16 (n,n):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         ..................
>>         (const_int n [hex(n)])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>             .........
>>              (const_int -n [hex(-n)])
>>         ]
>>     ])
>> Am I understanding right?
>>
>> Let's first take a look at the codes you write:
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x = builder.build ();
>>
>> So for poly_uint (1,1):
>> Ideally according to the codes, you want to generate the pattern like this:
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 0 [0])
>>         ]
>>     ])
>  
> Ah, right, the two subsequences are supposed to be different.
>  
>> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
>> In this case, i = -i = 0.
>>
>> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
>> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>>
>> x = (const_vector:VNx1DI repeat [
>>         (const_int 0 [0])
>>     ])
>>
>> This is a const_vector duplicate vector.
>> This is not the correct pattern you want so it fails in the following check.
>>
>> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>>
>> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
>> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>>
>> ================
>> Original codes:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> ================
>>
>> ================
>> 1st solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 1 [0x1])
>>         ]
>>     ])
>> ================
>>
>> ================
>> 2nd solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
>  
> I don't think we need to single out count == 1.  It should be OK to use
> -1 - (int) i unconditionally.
>  
> Thanks,
> Richard
>  
>> x= 
>> (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>> ================
>>
>> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 20:52
>> To: juzhe.zhong\@rivai.ai
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Ah, right, sorry for the bogus suggestion.
>>>> In that case, what specifically goes wrong?  Which test in
>>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>>> The fail ICE:
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>>   expected: (nil)
>>>
>>>   actual: (const_int 0 [0])
>>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>>> 0x1332504 test_vector_subregs_modes
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>>> 0x1332988 test_vector_subregs_fore_back
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>>> 0x1332ae7 test_vector_subregs
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>>> 0x1332c57 test_vector_ops
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>>> 0x21318fa selftest::run_tests()
>>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>>> 0x1362a76 toplev::run_self_tests()
>>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>>
>>> I analyzed the codes:
>>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>>> So the assertion fails.
>>  
>> Hmm, ok, so the subreg operation is unexpected succeeding.
>>  
>>> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
>>  
>> The stepped case is 3 elements per pattern rather than 2.  In a stepped
>> pattern: a, b, b+n are represented explicitly, then the rest are
>> implicitly b+n*2, b+n*3, etc.
>>  
>> The case being handled by this code is instead the 2-element case:
>> a, b are represented explicitly, then the rest are implicitly all b.
>>  
>> Why is (1,1) different though?  The test is constructing:
>>  
>>   nunits: 1 + 1x
>>   shape: nelts_per_pattern == 2, npatterns == 1
>>   elements: a, b[, b, b, b, b, ...]
>>  
>> It then tests subregs starting at 0 + 1x (so starting at the first b).
>> But for (2,2) we should have:
>>  
>>   nunits: 2 + 2x
>>   shape: nelts_per_pattern == 2, npatterns == 2
>>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>>  
>> and we should test subregs starting at 0 + 2x (so starting at the
>> first b1).  The two cases should be very similar, it's just that the
>> (2,2) case doubles the number of patterns.
>>  
>>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>>> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>>  
>> Following on from what I said above, it doesn't look like this particular
>> case is related to stepped vectors.
>>  
>> (1,1) shouldn't (need to) be a special case though.  Any potentital
>> problems that would occur for (1,1) with npatterns==1 would also occur
>> for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
>> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
>> would be problematic for (2,2).
>>  
>> So yeah, preventing a mode being used for autovectorisation would allow
>> the target to have a bit more control over which constants are actually
>> generated.  But it shouldn't be necessary to do that for correctness.
>>  
>> Thanks,
>> Richard
>>  
>>> juzhe.zhong@rivai.ai
>>>  
>>> From: Richard Sandiford
>>> Date: 2022-08-19 17:35
>>> To: juzhe.zhong\@rivai.ai
>>> CC: rguenther; gcc-patches; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Hi, Richard. I tried the codes:
>>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>>  
>>> Ah, right, sorry for the bogus suggestion.
>>>  
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>  
>>> Thanks,
>>> Richard
>>>  
>>>> But I tried:
>>>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>>
>>>> Finally, I tried:
>>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It passed with no warning.
>>>>
>>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>>> Thanks!
>>>>
>>>>
>>>> juzhe.zhong@rivai.ai
>>>>  
>>>> From: Richard Sandiford
>>>> Date: 2022-08-19 16:03
>>>> To: juzhe.zhong
>>>> CC: gcc-patches; rguenther; kito.cheng
>>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>>> juzhe.zhong@rivai.ai writes:
>>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>>
>>>>> Hello. This patch is preparing for following RVV support.
>>>>>
>>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>>
>>>>> So I define the machine_mode as follows:
>>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>>
>>>>> The compilation is failed for the stepped vector test:
>>>>> (const_vector:VNx1DI repeat [
>>>>>         (const_int 8 [0x8])
>>>>>         (const_int 7 [0x7])
>>>>>     ])
>>>>>
>>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>>
>>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>>
>>>>>   
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>>
>>>>> ---
>>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>>> --- a/gcc/simplify-rtx.cc
>>>>> +++ b/gcc/simplify-rtx.cc
>>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>>    rtx x = builder.build ();
>>>>>  
>>>>>    test_vector_subregs_modes (x);
>>>>> -  if (!nunits.is_constant ())
>>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>  
>>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>>> Something like poly_uint16 (1, 2) would have the same problem as
>>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>>> principle.)
>>>>  
>>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>>  
>>>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>>     {
>>>>       test_vector_ops_series (mode, scalar_reg);
>>>>       test_vector_subregs (mode);
>>>>     }
>>>>  
>>>> Thanks,
>>>> Richard
>>>>  
>>>  
>>  
>
  
juzhe.zhong@rivai.ai Aug. 22, 2022, 9:12 a.m. UTC | #13
I sent another patch called "Fix issue of poly_uint16 (1,1) in self test" to fix it.
I didn't send V2 following this patch because this patch name is misleading.

Thank you so much. Richard.



juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-22 16:56
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks for your reply. Your suggestion "-1 - (int) i" is better.
>
> Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
> into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?
 
Yeah, please send a patch, and I'll merge it like you say.
 
Thanks,
Richard
 
>
> Or you change it yourself?
>  
> Thank you so much.
>
>
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-22 16:31
> To: 钟居哲
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
>> After deep analysis and several tries. I have made a analysis and conclusion as follows.
>> I really appreciate if you could spend some time take a look at the following analysis that I made:
>>
>> According to the codes in test_vector_subregs_fore_back:
>>
>> ~~~
>>  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>>   unsigned int min_nunits = constant_lower_bound (nunits);
>>   scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>>   unsigned int count = gcd (min_nunits, 4);
>>
>>   rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x = builder.build ();
>> ~~~~
>>
>> Analyze the codes and tried several patterns:
>>
>> For poly_uint16 (2,2):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>>
>> For poly_uint16 (4,4):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>         ]
>>     ])
>>
>> So, I think I can conclude the pattern rule as follows:
>>
>> poly_uint16 (n,n):
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         ..................
>>         (const_int n [hex(n)])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>             .........
>>              (const_int -n [hex(-n)])
>>         ]
>>     ])
>> Am I understanding right?
>>
>> Let's first take a look at the codes you write:
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x = builder.build ();
>>
>> So for poly_uint (1,1):
>> Ideally according to the codes, you want to generate the pattern like this:
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 0 [0])
>>         ]
>>     ])
>  
> Ah, right, the two subsequences are supposed to be different.
>  
>> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
>> In this case, i = -i = 0.
>>
>> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
>> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>>
>> x = (const_vector:VNx1DI repeat [
>>         (const_int 0 [0])
>>     ])
>>
>> This is a const_vector duplicate vector.
>> This is not the correct pattern you want so it fails in the following check.
>>
>> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>>
>> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
>> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>>
>> ================
>> Original codes:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> ================
>>
>> ================
>> 1st solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>
>> x = (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 1 [0x1])
>>         ]
>>     ])
>> ================
>>
>> ================
>> 2nd solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i = 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
>  
> I don't think we need to single out count == 1.  It should be OK to use
> -1 - (int) i unconditionally.
>  
> Thanks,
> Richard
>  
>> x= 
>> (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>> ================
>>
>> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>>
>>
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 20:52
>> To: juzhe.zhong\@rivai.ai
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Ah, right, sorry for the bogus suggestion.
>>>> In that case, what specifically goes wrong?  Which test in
>>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>>> The fail ICE:
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>>   expected: (nil)
>>>
>>>   actual: (const_int 0 [0])
>>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>>> 0x1332504 test_vector_subregs_modes
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>>> 0x1332988 test_vector_subregs_fore_back
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>>> 0x1332ae7 test_vector_subregs
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>>> 0x1332c57 test_vector_ops
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>>> 0x21318fa selftest::run_tests()
>>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>>> 0x1362a76 toplev::run_self_tests()
>>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>>
>>> I analyzed the codes:
>>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>>> So the assertion fails.
>>  
>> Hmm, ok, so the subreg operation is unexpected succeeding.
>>  
>>> This is the test for stepped vector using 2 element per pattern.  For poly_uint16 (1,1), it's true it is possible only has 1 element. 
>>  
>> The stepped case is 3 elements per pattern rather than 2.  In a stepped
>> pattern: a, b, b+n are represented explicitly, then the rest are
>> implicitly b+n*2, b+n*3, etc.
>>  
>> The case being handled by this code is instead the 2-element case:
>> a, b are represented explicitly, then the rest are implicitly all b.
>>  
>> Why is (1,1) different though?  The test is constructing:
>>  
>>   nunits: 1 + 1x
>>   shape: nelts_per_pattern == 2, npatterns == 1
>>   elements: a, b[, b, b, b, b, ...]
>>  
>> It then tests subregs starting at 0 + 1x (so starting at the first b).
>> But for (2,2) we should have:
>>  
>>   nunits: 2 + 2x
>>   shape: nelts_per_pattern == 2, npatterns == 2
>>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>>  
>> and we should test subregs starting at 0 + 2x (so starting at the
>> first b1).  The two cases should be very similar, it's just that the
>> (2,2) case doubles the number of patterns.
>>  
>>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>>> kind of machine mode only used for intrinsics?  I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>>  
>> Following on from what I said above, it doesn't look like this particular
>> case is related to stepped vectors.
>>  
>> (1,1) shouldn't (need to) be a special case though.  Any potentital
>> problems that would occur for (1,1) with npatterns==1 would also occur
>> for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
>> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
>> would be problematic for (2,2).
>>  
>> So yeah, preventing a mode being used for autovectorisation would allow
>> the target to have a bit more control over which constants are actually
>> generated.  But it shouldn't be necessary to do that for correctness.
>>  
>> Thanks,
>> Richard
>>  
>>> juzhe.zhong@rivai.ai
>>>  
>>> From: Richard Sandiford
>>> Date: 2022-08-19 17:35
>>> To: juzhe.zhong\@rivai.ai
>>> CC: rguenther; gcc-patches; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Hi, Richard. I tried the codes:
>>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>>  
>>> Ah, right, sorry for the bogus suggestion.
>>>  
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>  
>>> Thanks,
>>> Richard
>>>  
>>>> But I tried:
>>>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>>
>>>> Finally, I tried:
>>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It passed with no warning.
>>>>
>>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>>> Thanks!
>>>>
>>>>
>>>> juzhe.zhong@rivai.ai
>>>>  
>>>> From: Richard Sandiford
>>>> Date: 2022-08-19 16:03
>>>> To: juzhe.zhong
>>>> CC: gcc-patches; rguenther; kito.cheng
>>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>>> juzhe.zhong@rivai.ai writes:
>>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>>
>>>>> Hello. This patch is preparing for following RVV support.
>>>>>
>>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>>
>>>>> So I define the machine_mode as follows:
>>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>>
>>>>> The compilation is failed for the stepped vector test:
>>>>> (const_vector:VNx1DI repeat [
>>>>>         (const_int 8 [0x8])
>>>>>         (const_int 7 [0x7])
>>>>>     ])
>>>>>
>>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>>
>>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>>
>>>>>   
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>>
>>>>> ---
>>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>>> --- a/gcc/simplify-rtx.cc
>>>>> +++ b/gcc/simplify-rtx.cc
>>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>>    rtx x = builder.build ();
>>>>>  
>>>>>    test_vector_subregs_modes (x);
>>>>> -  if (!nunits.is_constant ())
>>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>  
>>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>>> Something like poly_uint16 (1, 2) would have the same problem as
>>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>>> principle.)
>>>>  
>>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>>  
>>>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>>     {
>>>>       test_vector_ops_series (mode, scalar_reg);
>>>>       test_vector_subregs (mode);
>>>>     }
>>>>  
>>>> Thanks,
>>>> Richard
>>>>  
>>>  
>>  
>
  

Patch

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 7d09bf7103d..61e0dfa00d0 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -8438,7 +8438,7 @@  test_vector_subregs_fore_back (machine_mode inner_mode)
   rtx x = builder.build ();
 
   test_vector_subregs_modes (x);
-  if (!nunits.is_constant ())
+  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
     test_vector_subregs_modes (x, nunits - min_nunits, count);
 }