middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
Message ID | 20220818104608.259204-1-juzhe.zhong@rivai.ai |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DCF28385843A for <patchwork@sourceware.org>; Thu, 18 Aug 2022 10:46:41 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgeu2.qq.com (smtpbgeu2.qq.com [18.194.254.142]) by sourceware.org (Postfix) with ESMTPS id A4B373858D28 for <gcc-patches@gcc.gnu.org>; Thu, 18 Aug 2022 10:46:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A4B373858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp82t1660819573t0n7blfl Received: from server1.localdomain ( [42.247.22.65]) by bizesmtp.qq.com (ESMTP) with id ; Thu, 18 Aug 2022 18:46:11 +0800 (CST) X-QQ-SSF: 01400000002000C0I000B00A0000000 X-QQ-FEAT: hoArX50alxFw51gJ2QF40dBdk3RJy1IHjicOZ1ImaSbPe8pRZSgWOOhz4opHE JyeXSnhCg5D5QDbiNU67DVeQ06HcDJK++eGraQgE1l8ZPWosz/E//iSgwGKtAyIgX+txnOp ElaCBJKZHwyf6FSiPtsGLApenpf+p1aGt2Thw5qDyhOyL1eM7C5GcqrrFUzm65BBjO8SLPH 6X6llE9Qx7ucEdayLI6VfJSK2QQSUmMBsKhDcj4vzNr5wiJ+UGgNLACaL496vYXRE2nWOXQ htdLvRQTM2UIF3XQZAle1brRQ0VZfOeNXgb0cO7YCRxpD08o0u9gcUhEecDH7XT8SX5BGIZ cBPZV88VltkoXXDEo9FbEzsmwsHM9IKLweGAleHFJA8mXxaTtKvjZMJa4ECIW4LXB7q8Z30 CmjVeAFL37Q= X-QQ-GoodBg: 2 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Subject: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) Date: Thu, 18 Aug 2022 18:46:08 +0800 Message-Id: <20220818104608.259204-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvr:qybglogicsvr7 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Cc: richard.sandiford@arm.com, rguenther@suse.de, kito.cheng@gmail.com, zhongjuzhe <juzhe.zhong@rivai.ai> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
|
|
Commit Message
钟居哲
Aug. 18, 2022, 10:46 a.m. UTC
From: zhongjuzhe <juzhe.zhong@rivai.ai>
Hello. This patch is preparing for following RVV support.
Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
So I define the machine_mode as follows:
VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
ADJUST_NUNITS (MODE, riscv_vector_chunks);
The riscv_vector_chunks = poly_uint16 (1, 1)
The compilation is failed for the stepped vector test:
(const_vector:VNx1DI repeat [
(const_int 8 [0x8])
(const_int 7 [0x7])
])
I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
gcc/ChangeLog:
* simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
---
gcc/simplify-rtx.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
juzhe.zhong@rivai.ai writes: > From: zhongjuzhe <juzhe.zhong@rivai.ai> > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) > rtx x = builder.build (); > > test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
Thank you so much. Address your comment. I think "maybe_gt (nunits, 1)" is a more solid solution than I do. I will send a patch to fix this. juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 16:03 To: juzhe.zhong CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) juzhe.zhong@rivai.ai writes: > From: zhongjuzhe <juzhe.zhong@rivai.ai> > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) > rtx x = builder.build (); > > test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
Hi, Richard. I tried the codes: if (!nunits.is_constant () && maybe_gt (nunits, 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. But I tried: if (!nunits.is_constant () && known_gt (nunits, 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. Finally, I tried: if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It passed with no warning. Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? Thanks! juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 16:03 To: juzhe.zhong CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) juzhe.zhong@rivai.ai writes: > From: zhongjuzhe <juzhe.zhong@rivai.ai> > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) > rtx x = builder.build (); > > test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: > Hi, Richard. I tried the codes: > if (!nunits.is_constant () && maybe_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > > It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. Ah, right, sorry for the bogus suggestion. In that case, what specifically goes wrong? Which test in test_vector_subregs_modes fails, and what does the ICE look like? Thanks, Richard > But I tried: > if (!nunits.is_constant () && known_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. > > Finally, I tried: > if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It passed with no warning. > > Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? > Thanks! > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 16:03 > To: juzhe.zhong > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > juzhe.zhong@rivai.ai writes: >> From: zhongjuzhe <juzhe.zhong@rivai.ai> >> >> Hello. This patch is preparing for following RVV support. >> >> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >> >> So I define the machine_mode as follows: >> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >> ADJUST_NUNITS (MODE, riscv_vector_chunks); >> The riscv_vector_chunks = poly_uint16 (1, 1) >> >> The compilation is failed for the stepped vector test: >> (const_vector:VNx1DI repeat [ >> (const_int 8 [0x8]) >> (const_int 7 [0x7]) >> ]) >> >> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >> >> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >> >> >> >> gcc/ChangeLog: >> >> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >> >> --- >> gcc/simplify-rtx.cc | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index 7d09bf7103d..61e0dfa00d0 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >> rtx x = builder.build (); >> >> test_vector_subregs_modes (x); >> - if (!nunits.is_constant ()) >> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >> test_vector_subregs_modes (x, nunits - min_nunits, count); > > I think instead we should use maybe_gt (nunits, 1), on the basis that > the fore_back tests require vectors that have a minimum of 2 elements. > Something like poly_uint16 (1, 2) would have the same problem as > poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in > principle.) > > This corresponds to the minimum of 3 elements for the stepped tests: > > if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT > && maybe_gt (GET_MODE_NUNITS (mode), 2)) > { > test_vector_ops_series (mode, scalar_reg); > test_vector_subregs (mode); > } > > Thanks, > Richard >
> Ah, right, sorry for the bogus suggestion. > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. The fail ICE: ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) expected: (nil) actual: (const_int 0 [0]) cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) ../../../riscv-gcc/gcc/selftest-rtl.cc:57 0x1332504 test_vector_subregs_modes ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 0x1332988 test_vector_subregs_fore_back ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 0x1332ae7 test_vector_subregs ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 0x1332c57 test_vector_ops ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 0x1332c7b selftest::simplify_rtx_cc_tests() ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 0x21318fa selftest::run_tests() ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 0x1362a76 toplev::run_self_tests() ../../../riscv-gcc/gcc/toplev.cc:2205 I analyzed the codes: In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. So the assertion fails. This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). I only enable auto-vectorization with mode larger than (2,2) and test it fully. From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 17:35 To: juzhe.zhong\@rivai.ai CC: rguenther; gcc-patches; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: > Hi, Richard. I tried the codes: > if (!nunits.is_constant () && maybe_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > > It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. Ah, right, sorry for the bogus suggestion. In that case, what specifically goes wrong? Which test in test_vector_subregs_modes fails, and what does the ICE look like? Thanks, Richard > But I tried: > if (!nunits.is_constant () && known_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. > > Finally, I tried: > if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It passed with no warning. > > Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? > Thanks! > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 16:03 > To: juzhe.zhong > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > juzhe.zhong@rivai.ai writes: >> From: zhongjuzhe <juzhe.zhong@rivai.ai> >> >> Hello. This patch is preparing for following RVV support. >> >> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >> >> So I define the machine_mode as follows: >> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >> ADJUST_NUNITS (MODE, riscv_vector_chunks); >> The riscv_vector_chunks = poly_uint16 (1, 1) >> >> The compilation is failed for the stepped vector test: >> (const_vector:VNx1DI repeat [ >> (const_int 8 [0x8]) >> (const_int 7 [0x7]) >> ]) >> >> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >> >> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >> >> >> >> gcc/ChangeLog: >> >> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >> >> --- >> gcc/simplify-rtx.cc | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index 7d09bf7103d..61e0dfa00d0 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >> rtx x = builder.build (); >> >> test_vector_subregs_modes (x); >> - if (!nunits.is_constant ()) >> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >> test_vector_subregs_modes (x, nunits - min_nunits, count); > > I think instead we should use maybe_gt (nunits, 1), on the basis that > the fore_back tests require vectors that have a minimum of 2 elements. > Something like poly_uint16 (1, 2) would have the same problem as > poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in > principle.) > > This corresponds to the minimum of 3 elements for the stepped tests: > > if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT > && maybe_gt (GET_MODE_NUNITS (mode), 2)) > { > test_vector_ops_series (mode, scalar_reg); > test_vector_subregs (mode); > } > > Thanks, > Richard >
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it fully. > From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n,n) with npatterns==n. E.g. if stepped vectors are problematic for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) would be problematic for (2,2). So yeah, preventing a mode being used for autovectorisation would allow the target to have a bit more control over which constants are actually generated. But it shouldn't be necessary to do that for correctness. Thanks, Richard > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 17:35 > To: juzhe.zhong\@rivai.ai > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Hi, Richard. I tried the codes: >> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. > > Ah, right, sorry for the bogus suggestion. > > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? > > Thanks, > Richard > >> But I tried: >> if (!nunits.is_constant () && known_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >> >> Finally, I tried: >> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It passed with no warning. >> >> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >> Thanks! >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 16:03 >> To: juzhe.zhong >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> juzhe.zhong@rivai.ai writes: >>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>> >>> Hello. This patch is preparing for following RVV support. >>> >>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>> >>> So I define the machine_mode as follows: >>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>> The riscv_vector_chunks = poly_uint16 (1, 1) >>> >>> The compilation is failed for the stepped vector test: >>> (const_vector:VNx1DI repeat [ >>> (const_int 8 [0x8]) >>> (const_int 7 [0x7]) >>> ]) >>> >>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>> >>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>> >>> >>> >>> gcc/ChangeLog: >>> >>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>> >>> --- >>> gcc/simplify-rtx.cc | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>> index 7d09bf7103d..61e0dfa00d0 100644 >>> --- a/gcc/simplify-rtx.cc >>> +++ b/gcc/simplify-rtx.cc >>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>> rtx x = builder.build (); >>> >>> test_vector_subregs_modes (x); >>> - if (!nunits.is_constant ()) >>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> I think instead we should use maybe_gt (nunits, 1), on the basis that >> the fore_back tests require vectors that have a minimum of 2 elements. >> Something like poly_uint16 (1, 2) would have the same problem as >> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >> principle.) >> >> This corresponds to the minimum of 3 elements for the stepped tests: >> >> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >> { >> test_vector_ops_series (mode, scalar_reg); >> test_vector_subregs (mode); >> } >> >> Thanks, >> Richard >> >
As you mentioned. For poly_uint16 (1, 1), the pattern should be: shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] I tried to print out the rtx that tests create to test: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); test_vector_subregs_modes (x); print_rtl_single(stdout,x); if (!nunits.is_constant ()) test_vector_subregs_modes (x, nunits - min_nunits, count); } the x value: (const_vector:VNx1DI repeat [ (const_int 0 [0]) ]) It seems that it doesn't match the pattern you said. Am I understanding correctly? And would you mind taking a look at the codes which generate x: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); The nunits = (1,1), min_nunits = 1, count = 1. I print out these variable value may helpful for you to check. Thanks! juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it fully. > From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n,n) with npatterns==n. E.g. if stepped vectors are problematic for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) would be problematic for (2,2). So yeah, preventing a mode being used for autovectorisation would allow the target to have a bit more control over which constants are actually generated. But it shouldn't be necessary to do that for correctness. Thanks, Richard > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 17:35 > To: juzhe.zhong\@rivai.ai > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Hi, Richard. I tried the codes: >> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. > > Ah, right, sorry for the bogus suggestion. > > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? > > Thanks, > Richard > >> But I tried: >> if (!nunits.is_constant () && known_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >> >> Finally, I tried: >> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It passed with no warning. >> >> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >> Thanks! >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 16:03 >> To: juzhe.zhong >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> juzhe.zhong@rivai.ai writes: >>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>> >>> Hello. This patch is preparing for following RVV support. >>> >>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>> >>> So I define the machine_mode as follows: >>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>> The riscv_vector_chunks = poly_uint16 (1, 1) >>> >>> The compilation is failed for the stepped vector test: >>> (const_vector:VNx1DI repeat [ >>> (const_int 8 [0x8]) >>> (const_int 7 [0x7]) >>> ]) >>> >>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>> >>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>> >>> >>> >>> gcc/ChangeLog: >>> >>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>> >>> --- >>> gcc/simplify-rtx.cc | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>> index 7d09bf7103d..61e0dfa00d0 100644 >>> --- a/gcc/simplify-rtx.cc >>> +++ b/gcc/simplify-rtx.cc >>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>> rtx x = builder.build (); >>> >>> test_vector_subregs_modes (x); >>> - if (!nunits.is_constant ()) >>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> I think instead we should use maybe_gt (nunits, 1), on the basis that >> the fore_back tests require vectors that have a minimum of 2 elements. >> Something like poly_uint16 (1, 2) would have the same problem as >> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >> principle.) >> >> This corresponds to the minimum of 3 elements for the stepped tests: >> >> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >> { >> test_vector_ops_series (mode, scalar_reg); >> test_vector_subregs (mode); >> } >> >> Thanks, >> Richard >> >
I rewrite test_vector_subregs_fore_back as follows: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); unsigned int nelts_per_pattern = count == 1 ? 2 : count; for (unsigned int i = 0; i < nelts_per_pattern; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); test_vector_subregs_modes (x); if (!nunits.is_constant ()) test_vector_subregs_modes (x, nunits - min_nunits, count); } I add the code: unsigned int nelts_per_pattern = count == 1 ? 2 : count; then replace the "count" into "nelts_per_pattern " in the first loop. It can pass now. And "x" value I print out seems to be correct: (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 1 [0x1]) ] ]) Is this correct solution ? Thanks. juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it fully. > From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n,n) with npatterns==n. E.g. if stepped vectors are problematic for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) would be problematic for (2,2). So yeah, preventing a mode being used for autovectorisation would allow the target to have a bit more control over which constants are actually generated. But it shouldn't be necessary to do that for correctness. Thanks, Richard > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 17:35 > To: juzhe.zhong\@rivai.ai > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Hi, Richard. I tried the codes: >> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. > > Ah, right, sorry for the bogus suggestion. > > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? > > Thanks, > Richard > >> But I tried: >> if (!nunits.is_constant () && known_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >> >> Finally, I tried: >> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It passed with no warning. >> >> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >> Thanks! >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 16:03 >> To: juzhe.zhong >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> juzhe.zhong@rivai.ai writes: >>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>> >>> Hello. This patch is preparing for following RVV support. >>> >>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>> >>> So I define the machine_mode as follows: >>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>> The riscv_vector_chunks = poly_uint16 (1, 1) >>> >>> The compilation is failed for the stepped vector test: >>> (const_vector:VNx1DI repeat [ >>> (const_int 8 [0x8]) >>> (const_int 7 [0x7]) >>> ]) >>> >>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>> >>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>> >>> >>> >>> gcc/ChangeLog: >>> >>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>> >>> --- >>> gcc/simplify-rtx.cc | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>> index 7d09bf7103d..61e0dfa00d0 100644 >>> --- a/gcc/simplify-rtx.cc >>> +++ b/gcc/simplify-rtx.cc >>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>> rtx x = builder.build (); >>> >>> test_vector_subregs_modes (x); >>> - if (!nunits.is_constant ()) >>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> I think instead we should use maybe_gt (nunits, 1), on the basis that >> the fore_back tests require vectors that have a minimum of 2 elements. >> Something like poly_uint16 (1, 2) would have the same problem as >> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >> principle.) >> >> This corresponds to the minimum of 3 elements for the stepped tests: >> >> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >> { >> test_vector_ops_series (mode, scalar_reg); >> test_vector_subregs (mode); >> } >> >> Thanks, >> Richard >> >
After deep analysis and several tries. I have made a analysis and conclusion as follows. I really appreciate if you could spend some time take a look at the following analysis that I made: According to the codes in test_vector_subregs_fore_back: ~~~ poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); ~~~~ Analyze the codes and tried several patterns: For poly_uint16 (2,2): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) repeat [ (const_int 0 [0]) (const_int -1 [0xffffffffffffffff]) ] ]) For poly_uint16 (4,4): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) repeat [ (const_int 0 [0]) (const_int -1 [0xffffffffffffffff]) (const_int -2 [0xfffffffffffffffe]) (const_int -3 [0xfffffffffffffffd]) ] ]) So, I think I can conclude the pattern rule as follows: poly_uint16 (n,n): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) .................. (const_int n [hex(n)]) repeat [ (const_int 0 [0]) (const_int -1 [0xffffffffffffffff]) (const_int -2 [0xfffffffffffffffe]) (const_int -3 [0xfffffffffffffffd]) ......... (const_int -n [hex(-n)]) ] ]) Am I understanding right? Let's first take a look at the codes you write: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); So for poly_uint (1,1): Ideally according to the codes, you want to generate the pattern like this: x = (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 0 [0]) ] ]) In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. In this case, i = -i = 0. So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: x = (const_vector:VNx1DI repeat [ (const_int 0 [0]) ]) This is a const_vector duplicate vector. This is not the correct pattern you want so it fails in the following check. I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): ================ Original codes: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); ================ ================ 1st solution: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); x = (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 1 [0x1]) ] ]) ================ ================ 2nd solution: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); x= (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int -1 [0xffffffffffffffff]) ] ]) ================ Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it fully. > From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n,n) with npatterns==n. E.g. if stepped vectors are problematic for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) would be problematic for (2,2). So yeah, preventing a mode being used for autovectorisation would allow the target to have a bit more control over which constants are actually generated. But it shouldn't be necessary to do that for correctness. Thanks, Richard > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 17:35 > To: juzhe.zhong\@rivai.ai > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >> Hi, Richard. I tried the codes: >> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. > > Ah, right, sorry for the bogus suggestion. > > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? > > Thanks, > Richard > >> But I tried: >> if (!nunits.is_constant () && known_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >> >> Finally, I tried: >> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It passed with no warning. >> >> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >> Thanks! >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 16:03 >> To: juzhe.zhong >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> juzhe.zhong@rivai.ai writes: >>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>> >>> Hello. This patch is preparing for following RVV support. >>> >>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>> >>> So I define the machine_mode as follows: >>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>> The riscv_vector_chunks = poly_uint16 (1, 1) >>> >>> The compilation is failed for the stepped vector test: >>> (const_vector:VNx1DI repeat [ >>> (const_int 8 [0x8]) >>> (const_int 7 [0x7]) >>> ]) >>> >>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>> >>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>> >>> >>> >>> gcc/ChangeLog: >>> >>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>> >>> --- >>> gcc/simplify-rtx.cc | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>> index 7d09bf7103d..61e0dfa00d0 100644 >>> --- a/gcc/simplify-rtx.cc >>> +++ b/gcc/simplify-rtx.cc >>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>> rtx x = builder.build (); >>> >>> test_vector_subregs_modes (x); >>> - if (!nunits.is_constant ()) >>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> I think instead we should use maybe_gt (nunits, 1), on the basis that >> the fore_back tests require vectors that have a minimum of 2 elements. >> Something like poly_uint16 (1, 2) would have the same problem as >> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >> principle.) >> >> This corresponds to the minimum of 3 elements for the stepped tests: >> >> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >> { >> test_vector_ops_series (mode, scalar_reg); >> test_vector_subregs (mode); >> } >> >> Thanks, >> Richard >> >
钟居哲 <juzhe.zhong@rivai.ai> writes: > After deep analysis and several tries. I have made a analysis and conclusion as follows. > I really appreciate if you could spend some time take a look at the following analysis that I made: > > According to the codes in test_vector_subregs_fore_back: > > ~~~ > poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); > unsigned int min_nunits = constant_lower_bound (nunits); > scalar_mode int_mode = GET_MODE_INNER (inner_mode); > unsigned int count = gcd (min_nunits, 4); > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > ~~~~ > > Analyze the codes and tried several patterns: > > For poly_uint16 (2,2): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > ] > ]) > > For poly_uint16 (4,4): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ] > ]) > > So, I think I can conclude the pattern rule as follows: > > poly_uint16 (n,n): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > .................. > (const_int n [hex(n)]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ......... > (const_int -n [hex(-n)]) > ] > ]) > Am I understanding right? > > Let's first take a look at the codes you write: > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > > So for poly_uint (1,1): > Ideally according to the codes, you want to generate the pattern like this: > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 0 [0]) > ] > ]) Ah, right, the two subsequences are supposed to be different. > In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. > In this case, i = -i = 0. > > So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". > Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: > > x = (const_vector:VNx1DI repeat [ > (const_int 0 [0]) > ]) > > This is a const_vector duplicate vector. > This is not the correct pattern you want so it fails in the following check. > > I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. > > To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value > and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): > > ================ > Original codes: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > ================ > > ================ > 1st solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 1 [0x1]) > ] > ]) > ================ > > ================ > 2nd solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); I don't think we need to single out count == 1. It should be OK to use -1 - (int) i unconditionally. Thanks, Richard > x= > (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int -1 [0xffffffffffffffff]) > ] > ]) > ================ > > Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 20:52 > To: juzhe.zhong\@rivai.ai > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>> Ah, right, sorry for the bogus suggestion. >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. >> The fail ICE: >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) >> expected: (nil) >> >> actual: (const_int 0 [0]) >> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) >> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >> 0x1332504 test_vector_subregs_modes >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >> 0x1332988 test_vector_subregs_fore_back >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >> 0x1332ae7 test_vector_subregs >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >> 0x1332c57 test_vector_ops >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >> 0x1332c7b selftest::simplify_rtx_cc_tests() >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >> 0x21318fa selftest::run_tests() >> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >> 0x1362a76 toplev::run_self_tests() >> ../../../riscv-gcc/gcc/toplev.cc:2205 >> >> I analyzed the codes: >> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >> So the assertion fails. > > Hmm, ok, so the subreg operation is unexpected succeeding. > >> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. > > The stepped case is 3 elements per pattern rather than 2. In a stepped > pattern: a, b, b+n are represented explicitly, then the rest are > implicitly b+n*2, b+n*3, etc. > > The case being handled by this code is instead the 2-element case: > a, b are represented explicitly, then the rest are implicitly all b. > > Why is (1,1) different though? The test is constructing: > > nunits: 1 + 1x > shape: nelts_per_pattern == 2, npatterns == 1 > elements: a, b[, b, b, b, b, ...] > > It then tests subregs starting at 0 + 1x (so starting at the first b). > But for (2,2) we should have: > > nunits: 2 + 2x > shape: nelts_per_pattern == 2, npatterns == 2 > elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] > > and we should test subregs starting at 0 + 2x (so starting at the > first b1). The two cases should be very similar, it's just that the > (2,2) case doubles the number of patterns. > >> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this >> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). >> I only enable auto-vectorization with mode larger than (2,2) and test it fully. >> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will >> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ > > Following on from what I said above, it doesn't look like this particular > case is related to stepped vectors. > > (1,1) shouldn't (need to) be a special case though. Any potentital > problems that would occur for (1,1) with npatterns==1 would also occur > for (n,n) with npatterns==n. E.g. if stepped vectors are problematic > for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) > would be problematic for (2,2). > > So yeah, preventing a mode being used for autovectorisation would allow > the target to have a bit more control over which constants are actually > generated. But it shouldn't be necessary to do that for correctness. > > Thanks, > Richard > >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 17:35 >> To: juzhe.zhong\@rivai.ai >> CC: rguenther; gcc-patches; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>> Hi, Richard. I tried the codes: >>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. >> >> Ah, right, sorry for the bogus suggestion. >> >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Thanks, >> Richard >> >>> But I tried: >>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >>> >>> Finally, I tried: >>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It passed with no warning. >>> >>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>> Thanks! >>> >>> >>> juzhe.zhong@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 16:03 >>> To: juzhe.zhong >>> CC: gcc-patches; rguenther; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> juzhe.zhong@rivai.ai writes: >>>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>>> >>>> Hello. This patch is preparing for following RVV support. >>>> >>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>>> >>>> So I define the machine_mode as follows: >>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>> >>>> The compilation is failed for the stepped vector test: >>>> (const_vector:VNx1DI repeat [ >>>> (const_int 8 [0x8]) >>>> (const_int 7 [0x7]) >>>> ]) >>>> >>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>>> >>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>>> >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>>> >>>> --- >>>> gcc/simplify-rtx.cc | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>> --- a/gcc/simplify-rtx.cc >>>> +++ b/gcc/simplify-rtx.cc >>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>>> rtx x = builder.build (); >>>> >>>> test_vector_subregs_modes (x); >>>> - if (!nunits.is_constant ()) >>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>> the fore_back tests require vectors that have a minimum of 2 elements. >>> Something like poly_uint16 (1, 2) would have the same problem as >>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>> principle.) >>> >>> This corresponds to the minimum of 3 elements for the stepped tests: >>> >>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>> { >>> test_vector_ops_series (mode, scalar_reg); >>> test_vector_subregs (mode); >>> } >>> >>> Thanks, >>> Richard >>> >> >
Thanks for your reply. Your suggestion "-1 - (int) i" is better. Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));" into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me? Or you change it yourself? Thank you so much. juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-22 16:31 To: 钟居哲 CC: rguenther; gcc-patches; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) 钟居哲 <juzhe.zhong@rivai.ai> writes: > After deep analysis and several tries. I have made a analysis and conclusion as follows. > I really appreciate if you could spend some time take a look at the following analysis that I made: > > According to the codes in test_vector_subregs_fore_back: > > ~~~ > poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); > unsigned int min_nunits = constant_lower_bound (nunits); > scalar_mode int_mode = GET_MODE_INNER (inner_mode); > unsigned int count = gcd (min_nunits, 4); > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > ~~~~ > > Analyze the codes and tried several patterns: > > For poly_uint16 (2,2): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > ] > ]) > > For poly_uint16 (4,4): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ] > ]) > > So, I think I can conclude the pattern rule as follows: > > poly_uint16 (n,n): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > .................. > (const_int n [hex(n)]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ......... > (const_int -n [hex(-n)]) > ] > ]) > Am I understanding right? > > Let's first take a look at the codes you write: > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > > So for poly_uint (1,1): > Ideally according to the codes, you want to generate the pattern like this: > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 0 [0]) > ] > ]) Ah, right, the two subsequences are supposed to be different. > In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. > In this case, i = -i = 0. > > So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". > Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: > > x = (const_vector:VNx1DI repeat [ > (const_int 0 [0]) > ]) > > This is a const_vector duplicate vector. > This is not the correct pattern you want so it fails in the following check. > > I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. > > To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value > and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): > > ================ > Original codes: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > ================ > > ================ > 1st solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 1 [0x1]) > ] > ]) > ================ > > ================ > 2nd solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); I don't think we need to single out count == 1. It should be OK to use -1 - (int) i unconditionally. Thanks, Richard > x= > (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int -1 [0xffffffffffffffff]) > ] > ]) > ================ > > Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 20:52 > To: juzhe.zhong\@rivai.ai > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>> Ah, right, sorry for the bogus suggestion. >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. >> The fail ICE: >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) >> expected: (nil) >> >> actual: (const_int 0 [0]) >> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) >> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >> 0x1332504 test_vector_subregs_modes >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >> 0x1332988 test_vector_subregs_fore_back >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >> 0x1332ae7 test_vector_subregs >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >> 0x1332c57 test_vector_ops >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >> 0x1332c7b selftest::simplify_rtx_cc_tests() >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >> 0x21318fa selftest::run_tests() >> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >> 0x1362a76 toplev::run_self_tests() >> ../../../riscv-gcc/gcc/toplev.cc:2205 >> >> I analyzed the codes: >> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >> So the assertion fails. > > Hmm, ok, so the subreg operation is unexpected succeeding. > >> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. > > The stepped case is 3 elements per pattern rather than 2. In a stepped > pattern: a, b, b+n are represented explicitly, then the rest are > implicitly b+n*2, b+n*3, etc. > > The case being handled by this code is instead the 2-element case: > a, b are represented explicitly, then the rest are implicitly all b. > > Why is (1,1) different though? The test is constructing: > > nunits: 1 + 1x > shape: nelts_per_pattern == 2, npatterns == 1 > elements: a, b[, b, b, b, b, ...] > > It then tests subregs starting at 0 + 1x (so starting at the first b). > But for (2,2) we should have: > > nunits: 2 + 2x > shape: nelts_per_pattern == 2, npatterns == 2 > elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] > > and we should test subregs starting at 0 + 2x (so starting at the > first b1). The two cases should be very similar, it's just that the > (2,2) case doubles the number of patterns. > >> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this >> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). >> I only enable auto-vectorization with mode larger than (2,2) and test it fully. >> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will >> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ > > Following on from what I said above, it doesn't look like this particular > case is related to stepped vectors. > > (1,1) shouldn't (need to) be a special case though. Any potentital > problems that would occur for (1,1) with npatterns==1 would also occur > for (n,n) with npatterns==n. E.g. if stepped vectors are problematic > for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) > would be problematic for (2,2). > > So yeah, preventing a mode being used for autovectorisation would allow > the target to have a bit more control over which constants are actually > generated. But it shouldn't be necessary to do that for correctness. > > Thanks, > Richard > >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 17:35 >> To: juzhe.zhong\@rivai.ai >> CC: rguenther; gcc-patches; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>> Hi, Richard. I tried the codes: >>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. >> >> Ah, right, sorry for the bogus suggestion. >> >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Thanks, >> Richard >> >>> But I tried: >>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >>> >>> Finally, I tried: >>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It passed with no warning. >>> >>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>> Thanks! >>> >>> >>> juzhe.zhong@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 16:03 >>> To: juzhe.zhong >>> CC: gcc-patches; rguenther; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> juzhe.zhong@rivai.ai writes: >>>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>>> >>>> Hello. This patch is preparing for following RVV support. >>>> >>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>>> >>>> So I define the machine_mode as follows: >>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>> >>>> The compilation is failed for the stepped vector test: >>>> (const_vector:VNx1DI repeat [ >>>> (const_int 8 [0x8]) >>>> (const_int 7 [0x7]) >>>> ]) >>>> >>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>>> >>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>>> >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>>> >>>> --- >>>> gcc/simplify-rtx.cc | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>> --- a/gcc/simplify-rtx.cc >>>> +++ b/gcc/simplify-rtx.cc >>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>>> rtx x = builder.build (); >>>> >>>> test_vector_subregs_modes (x); >>>> - if (!nunits.is_constant ()) >>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>> the fore_back tests require vectors that have a minimum of 2 elements. >>> Something like poly_uint16 (1, 2) would have the same problem as >>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>> principle.) >>> >>> This corresponds to the minimum of 3 elements for the stepped tests: >>> >>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>> { >>> test_vector_ops_series (mode, scalar_reg); >>> test_vector_subregs (mode); >>> } >>> >>> Thanks, >>> Richard >>> >> >
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: > Thanks for your reply. Your suggestion "-1 - (int) i" is better. > > Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));" > into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me? Yeah, please send a patch, and I'll merge it like you say. Thanks, Richard > > Or you change it yourself? > > Thank you so much. > > > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-22 16:31 > To: 钟居哲 > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > 钟居哲 <juzhe.zhong@rivai.ai> writes: >> After deep analysis and several tries. I have made a analysis and conclusion as follows. >> I really appreciate if you could spend some time take a look at the following analysis that I made: >> >> According to the codes in test_vector_subregs_fore_back: >> >> ~~~ >> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); >> unsigned int min_nunits = constant_lower_bound (nunits); >> scalar_mode int_mode = GET_MODE_INNER (inner_mode); >> unsigned int count = gcd (min_nunits, 4); >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> ~~~~ >> >> Analyze the codes and tried several patterns: >> >> For poly_uint16 (2,2): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> >> For poly_uint16 (4,4): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ] >> ]) >> >> So, I think I can conclude the pattern rule as follows: >> >> poly_uint16 (n,n): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> .................. >> (const_int n [hex(n)]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ......... >> (const_int -n [hex(-n)]) >> ] >> ]) >> Am I understanding right? >> >> Let's first take a look at the codes you write: >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> >> So for poly_uint (1,1): >> Ideally according to the codes, you want to generate the pattern like this: >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 0 [0]) >> ] >> ]) > > Ah, right, the two subsequences are supposed to be different. > >> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. >> In this case, i = -i = 0. >> >> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". >> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: >> >> x = (const_vector:VNx1DI repeat [ >> (const_int 0 [0]) >> ]) >> >> This is a const_vector duplicate vector. >> This is not the correct pattern you want so it fails in the following check. >> >> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. >> >> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value >> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): >> >> ================ >> Original codes: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> ================ >> >> ================ >> 1st solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 1 [0x1]) >> ] >> ]) >> ================ >> >> ================ >> 2nd solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); > > I don't think we need to single out count == 1. It should be OK to use > -1 - (int) i unconditionally. > > Thanks, > Richard > >> x= >> (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> ================ >> >> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 20:52 >> To: juzhe.zhong\@rivai.ai >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>>> Ah, right, sorry for the bogus suggestion. >>>> In that case, what specifically goes wrong? Which test in >>>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. >>> The fail ICE: >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) >>> expected: (nil) >>> >>> actual: (const_int 0 [0]) >>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) >>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >>> 0x1332504 test_vector_subregs_modes >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >>> 0x1332988 test_vector_subregs_fore_back >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >>> 0x1332ae7 test_vector_subregs >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >>> 0x1332c57 test_vector_ops >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >>> 0x1332c7b selftest::simplify_rtx_cc_tests() >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >>> 0x21318fa selftest::run_tests() >>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >>> 0x1362a76 toplev::run_self_tests() >>> ../../../riscv-gcc/gcc/toplev.cc:2205 >>> >>> I analyzed the codes: >>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >>> So the assertion fails. >> >> Hmm, ok, so the subreg operation is unexpected succeeding. >> >>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. >> >> The stepped case is 3 elements per pattern rather than 2. In a stepped >> pattern: a, b, b+n are represented explicitly, then the rest are >> implicitly b+n*2, b+n*3, etc. >> >> The case being handled by this code is instead the 2-element case: >> a, b are represented explicitly, then the rest are implicitly all b. >> >> Why is (1,1) different though? The test is constructing: >> >> nunits: 1 + 1x >> shape: nelts_per_pattern == 2, npatterns == 1 >> elements: a, b[, b, b, b, b, ...] >> >> It then tests subregs starting at 0 + 1x (so starting at the first b). >> But for (2,2) we should have: >> >> nunits: 2 + 2x >> shape: nelts_per_pattern == 2, npatterns == 2 >> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] >> >> and we should test subregs starting at 0 + 2x (so starting at the >> first b1). The two cases should be very similar, it's just that the >> (2,2) case doubles the number of patterns. >> >>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this >>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). >>> I only enable auto-vectorization with mode larger than (2,2) and test it fully. >>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will >>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ >> >> Following on from what I said above, it doesn't look like this particular >> case is related to stepped vectors. >> >> (1,1) shouldn't (need to) be a special case though. Any potentital >> problems that would occur for (1,1) with npatterns==1 would also occur >> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic >> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) >> would be problematic for (2,2). >> >> So yeah, preventing a mode being used for autovectorisation would allow >> the target to have a bit more control over which constants are actually >> generated. But it shouldn't be necessary to do that for correctness. >> >> Thanks, >> Richard >> >>> juzhe.zhong@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 17:35 >>> To: juzhe.zhong\@rivai.ai >>> CC: rguenther; gcc-patches; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>>> Hi, Richard. I tried the codes: >>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. >>> >>> Ah, right, sorry for the bogus suggestion. >>> >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Thanks, >>> Richard >>> >>>> But I tried: >>>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >>>> >>>> Finally, I tried: >>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It passed with no warning. >>>> >>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>>> Thanks! >>>> >>>> >>>> juzhe.zhong@rivai.ai >>>> >>>> From: Richard Sandiford >>>> Date: 2022-08-19 16:03 >>>> To: juzhe.zhong >>>> CC: gcc-patches; rguenther; kito.cheng >>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>>> juzhe.zhong@rivai.ai writes: >>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>>>> >>>>> Hello. This patch is preparing for following RVV support. >>>>> >>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>>>> >>>>> So I define the machine_mode as follows: >>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>>> >>>>> The compilation is failed for the stepped vector test: >>>>> (const_vector:VNx1DI repeat [ >>>>> (const_int 8 [0x8]) >>>>> (const_int 7 [0x7]) >>>>> ]) >>>>> >>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>>>> >>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>>>> >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>>>> >>>>> --- >>>>> gcc/simplify-rtx.cc | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>>> --- a/gcc/simplify-rtx.cc >>>>> +++ b/gcc/simplify-rtx.cc >>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>>>> rtx x = builder.build (); >>>>> >>>>> test_vector_subregs_modes (x); >>>>> - if (!nunits.is_constant ()) >>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>>> the fore_back tests require vectors that have a minimum of 2 elements. >>>> Something like poly_uint16 (1, 2) would have the same problem as >>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>>> principle.) >>>> >>>> This corresponds to the minimum of 3 elements for the stepped tests: >>>> >>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>>> { >>>> test_vector_ops_series (mode, scalar_reg); >>>> test_vector_subregs (mode); >>>> } >>>> >>>> Thanks, >>>> Richard >>>> >>> >> >
I sent another patch called "Fix issue of poly_uint16 (1,1) in self test" to fix it. I didn't send V2 following this patch because this patch name is misleading. Thank you so much. Richard. juzhe.zhong@rivai.ai From: Richard Sandiford Date: 2022-08-22 16:56 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: > Thanks for your reply. Your suggestion "-1 - (int) i" is better. > > Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));" > into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me? Yeah, please send a patch, and I'll merge it like you say. Thanks, Richard > > Or you change it yourself? > > Thank you so much. > > > > > juzhe.zhong@rivai.ai > > From: Richard Sandiford > Date: 2022-08-22 16:31 > To: 钟居哲 > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) > 钟居哲 <juzhe.zhong@rivai.ai> writes: >> After deep analysis and several tries. I have made a analysis and conclusion as follows. >> I really appreciate if you could spend some time take a look at the following analysis that I made: >> >> According to the codes in test_vector_subregs_fore_back: >> >> ~~~ >> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); >> unsigned int min_nunits = constant_lower_bound (nunits); >> scalar_mode int_mode = GET_MODE_INNER (inner_mode); >> unsigned int count = gcd (min_nunits, 4); >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> ~~~~ >> >> Analyze the codes and tried several patterns: >> >> For poly_uint16 (2,2): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> >> For poly_uint16 (4,4): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ] >> ]) >> >> So, I think I can conclude the pattern rule as follows: >> >> poly_uint16 (n,n): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> .................. >> (const_int n [hex(n)]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ......... >> (const_int -n [hex(-n)]) >> ] >> ]) >> Am I understanding right? >> >> Let's first take a look at the codes you write: >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> >> So for poly_uint (1,1): >> Ideally according to the codes, you want to generate the pattern like this: >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 0 [0]) >> ] >> ]) > > Ah, right, the two subsequences are supposed to be different. > >> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. >> In this case, i = -i = 0. >> >> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". >> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: >> >> x = (const_vector:VNx1DI repeat [ >> (const_int 0 [0]) >> ]) >> >> This is a const_vector duplicate vector. >> This is not the correct pattern you want so it fails in the following check. >> >> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. >> >> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value >> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): >> >> ================ >> Original codes: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> ================ >> >> ================ >> 1st solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 1 [0x1]) >> ] >> ]) >> ================ >> >> ================ >> 2nd solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); > > I don't think we need to single out count == 1. It should be OK to use > -1 - (int) i unconditionally. > > Thanks, > Richard > >> x= >> (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> ================ >> >> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. >> >> >> juzhe.zhong@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 20:52 >> To: juzhe.zhong\@rivai.ai >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>>> Ah, right, sorry for the bogus suggestion. >>>> In that case, what specifically goes wrong? Which test in >>>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. >>> The fail ICE: >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) >>> expected: (nil) >>> >>> actual: (const_int 0 [0]) >>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) >>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >>> 0x1332504 test_vector_subregs_modes >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >>> 0x1332988 test_vector_subregs_fore_back >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >>> 0x1332ae7 test_vector_subregs >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >>> 0x1332c57 test_vector_ops >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >>> 0x1332c7b selftest::simplify_rtx_cc_tests() >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >>> 0x21318fa selftest::run_tests() >>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >>> 0x1362a76 toplev::run_self_tests() >>> ../../../riscv-gcc/gcc/toplev.cc:2205 >>> >>> I analyzed the codes: >>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >>> So the assertion fails. >> >> Hmm, ok, so the subreg operation is unexpected succeeding. >> >>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. >> >> The stepped case is 3 elements per pattern rather than 2. In a stepped >> pattern: a, b, b+n are represented explicitly, then the rest are >> implicitly b+n*2, b+n*3, etc. >> >> The case being handled by this code is instead the 2-element case: >> a, b are represented explicitly, then the rest are implicitly all b. >> >> Why is (1,1) different though? The test is constructing: >> >> nunits: 1 + 1x >> shape: nelts_per_pattern == 2, npatterns == 1 >> elements: a, b[, b, b, b, b, ...] >> >> It then tests subregs starting at 0 + 1x (so starting at the first b). >> But for (2,2) we should have: >> >> nunits: 2 + 2x >> shape: nelts_per_pattern == 2, npatterns == 2 >> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] >> >> and we should test subregs starting at 0 + 2x (so starting at the >> first b1). The two cases should be very similar, it's just that the >> (2,2) case doubles the number of patterns. >> >>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this >>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). >>> I only enable auto-vectorization with mode larger than (2,2) and test it fully. >>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will >>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ >> >> Following on from what I said above, it doesn't look like this particular >> case is related to stepped vectors. >> >> (1,1) shouldn't (need to) be a special case though. Any potentital >> problems that would occur for (1,1) with npatterns==1 would also occur >> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic >> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) >> would be problematic for (2,2). >> >> So yeah, preventing a mode being used for autovectorisation would allow >> the target to have a bit more control over which constants are actually >> generated. But it shouldn't be necessary to do that for correctness. >> >> Thanks, >> Richard >> >>> juzhe.zhong@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 17:35 >>> To: juzhe.zhong\@rivai.ai >>> CC: rguenther; gcc-patches; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes: >>>> Hi, Richard. I tried the codes: >>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. >>> >>> Ah, right, sorry for the bogus suggestion. >>> >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Thanks, >>> Richard >>> >>>> But I tried: >>>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. >>>> >>>> Finally, I tried: >>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It passed with no warning. >>>> >>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>>> Thanks! >>>> >>>> >>>> juzhe.zhong@rivai.ai >>>> >>>> From: Richard Sandiford >>>> Date: 2022-08-19 16:03 >>>> To: juzhe.zhong >>>> CC: gcc-patches; rguenther; kito.cheng >>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>>> juzhe.zhong@rivai.ai writes: >>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai> >>>>> >>>>> Hello. This patch is preparing for following RVV support. >>>>> >>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks. >>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension. >>>>> >>>>> So I define the machine_mode as follows: >>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>>> >>>>> The compilation is failed for the stepped vector test: >>>>> (const_vector:VNx1DI repeat [ >>>>> (const_int 8 [0x8]) >>>>> (const_int 7 [0x7]) >>>>> ]) >>>>> >>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common >>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1). >>>>> >>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization >>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define >>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks! >>>>> >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1). >>>>> >>>>> --- >>>>> gcc/simplify-rtx.cc | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>>> --- a/gcc/simplify-rtx.cc >>>>> +++ b/gcc/simplify-rtx.cc >>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>>>> rtx x = builder.build (); >>>>> >>>>> test_vector_subregs_modes (x); >>>>> - if (!nunits.is_constant ()) >>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>>> the fore_back tests require vectors that have a minimum of 2 elements. >>>> Something like poly_uint16 (1, 2) would have the same problem as >>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>>> principle.) >>>> >>>> This corresponds to the minimum of 3 elements for the stepped tests: >>>> >>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>>> { >>>> test_vector_ops_series (mode, scalar_reg); >>>> test_vector_subregs (mode); >>>> } >>>> >>>> Thanks, >>>> Richard >>>> >>> >> >
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 7d09bf7103d..61e0dfa00d0 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) rtx x = builder.build (); test_vector_subregs_modes (x); - if (!nunits.is_constant ()) + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) test_vector_subregs_modes (x, nunits - min_nunits, count); }