[v2] RISC-V: Optimize branches with shifted immediate operands
Checks
Context |
Check |
Description |
rivoscibot/toolchain-ci-rivos-lint |
success
|
Lint passed
|
rivoscibot/toolchain-ci-rivos-apply-patch |
success
|
Patch applied
|
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gc-lp64d-non-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc-lp64d-non-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-test |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Based on the feedback I received, the patch now uses a custom code iterator
instead of relying on match_operator.
Since both operands (2 and 3) have trailing zeros, an additional check
was introduced to verify if they remain SMALL_OPERANDs after shifting by
the smaller number of trailing zeros.
To avoid complex inline check, I introduced two new macros, ensuring that the
check is encapsulated and reusable, as suggested in the original feedback.
It was pointed out that this check should be added because expressions like
(x & 0x81000) == 0x8100 or (x & 0x8100) == 0x81000 might lead to
unrecognized instructions during splitting. However, both expressions
immediately evaluate to false (https://godbolt.org/z/Y11EGMb4f) due to
the bit arrangement in constants 0x8100 and 0x81000.
As a result, this pattern won't be applied in such cases, so it cannot
cause any issues. Therefore, it wasn't necessary to include this as a
negative test in the testsuite.
Despite my efforts, I couldn't find another example for a negative test.
All similar cases evaluated to false immediately due to the bit arrangement.
I mentioned this in a follow-up comment on the previous patch version.
Regardless, the necessary check has been added, so if a negative case
exists, an unrecognized instruction will not be created.
2024-09-30 Jovan Vukic <Jovan.Vukic@rt-rk.com>
PR target/115921
gcc/ChangeLog:
* config/riscv/iterators.md (any_eq): New code iterator.
* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
* config/riscv/riscv.md (*branch<ANYI:mode>_shiftedarith_<optab>_shifted):
New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/branch-1.c: Additional tests.
CONFIDENTIALITY: The contents of this e-mail are confidential and intended only for the above addressee(s). If you are not the intended recipient, or the person responsible for delivering it to the intended recipient, copying or delivering it to anyone else or using it in any unauthorized manner is prohibited and may be unlawful. If you receive this e-mail by mistake, please notify the sender and the systems administrator at straymail@rt-rk.com immediately.
---
gcc/config/riscv/iterators.md | 3 +++
gcc/config/riscv/riscv.h | 12 +++++++++
gcc/config/riscv/riscv.md | 32 +++++++++++++++++++++++
gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +++++++++---
4 files changed, 60 insertions(+), 3 deletions(-)
@@ -233,6 +233,7 @@
(define_code_iterator any_ge [ge geu])
(define_code_iterator any_lt [lt ltu])
(define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
; atomics code iterator
(define_code_iterator any_atomic [plus ior xor and])
@@ -283,6 +284,8 @@
(le "le")
(gt "gt")
(lt "lt")
+ (eq "eq")
+ (ne "ne")
(ior "ior")
(xor "xor")
(and "and")
@@ -667,6 +667,18 @@ enum reg_class
/* True if bit BIT is set in VALUE. */
#define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2. */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2) \
+ (ctz_hwi (VAL1) < ctz_hwi (VAL2) \
+ ? ctz_hwi (VAL1) \
+ : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+ the common number of trailing zeros. */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2) \
+ (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)) \
+ && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
/* Stack layout; function entry, exit and calling. */
#define STACK_GROWS_DOWNWARD 1
@@ -3129,6 +3129,38 @@
}
[(set_attr "type" "branch")])
+(define_insn_and_split "*branch<ANYI:mode>_shiftedarith_<optab>_shifted"
+ [(set (pc)
+ (if_then_else (any_eq
+ (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+ (match_operand 2 "shifted_const_arith_operand" "i"))
+ (match_operand 3 "shifted_const_arith_operand" "i"))
+ (label_ref (match_operand 0 "" ""))
+ (pc)))
+ (clobber (match_scratch:X 4 "=&r"))
+ (clobber (match_scratch:X 5 "=&r"))]
+ "!SMALL_OPERAND (INTVAL (operands[2]))
+ && !SMALL_OPERAND (INTVAL (operands[3]))
+ && SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]),
+ INTVAL (operands[3]))"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4) (lshiftrt:X (match_dup 1) (match_dup 7)))
+ (set (match_dup 4) (and:X (match_dup 4) (match_dup 8)))
+ (set (match_dup 5) (match_dup 9))
+ (set (pc) (if_then_else (any_eq (match_dup 4) (match_dup 5))
+ (label_ref (match_dup 0)) (pc)))]
+{
+ HOST_WIDE_INT mask1 = INTVAL (operands[2]);
+ HOST_WIDE_INT mask2 = INTVAL (operands[3]);
+ int trailing_shift = COMMON_TRAILING_ZEROS (mask1, mask2);
+
+ operands[7] = GEN_INT (trailing_shift);
+ operands[8] = GEN_INT (mask1 >> trailing_shift);
+ operands[9] = GEN_INT (mask2 >> trailing_shift);
+}
+[(set_attr "type" "branch")])
+
(define_insn_and_split "*branch<ANYI:mode>_shiftedmask_equals_zero"
[(set (pc)
(if_then_else (match_operator 1 "equality_operator"
@@ -28,10 +28,20 @@ void f4(long long a)
g();
}
+void f5(long long a) {
+ if ((a & 0x2120000) == 0x2000000)
+ g();
+}
+
+void f6(long long a) {
+ if ((a & 0x70000000) == 0x30000000)
+ g();
+}
+
/* { dg-final { scan-assembler-times "slli\t" 2 } } */
-/* { dg-final { scan-assembler-times "srli\t" 3 } } */
-/* { dg-final { scan-assembler-times "andi\t" 1 } } */
-/* { dg-final { scan-assembler-times "\tli\t" 1 } } */
+/* { dg-final { scan-assembler-times "srli\t" 5 } } */
+/* { dg-final { scan-assembler-times "andi\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tli\t" 3 } } */
/* { dg-final { scan-assembler-not "addi\t" } } */
/* { dg-final { scan-assembler-not "and\t" } } */