[1/2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | success | Testing passed
Commit Message
When mask is ((1 << (prec - imm)) - 1), which is used to clear the upper
bits of A, the AND can be simplified to LSHIFTRT.
i.e. simplify
  (and:v8hi
    (ashiftrt:v8hi A 8)
    (const_vector 0xff x8))
to
  (lshiftrt:v8hi A 8)
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.
---
gcc/simplify-rtx.cc | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
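The identity behind the patch can be checked on a single 16-bit lane: an arithmetic and a logical right shift by imm differ only in the upper imm bits, and the mask (1 << (prec - imm)) - 1 clears exactly those bits. The following is a minimal sketch, not part of the patch; the helper names (ashiftrt16, lshiftrt16, low_mask16, identity_holds) are hypothetical and model one v8hi element with prec fixed at 16.

```cpp
#include <cstdint>

// Hypothetical helper: arithmetic right shift of one 16-bit lane,
// written so it does not depend on how >> treats negative values.
constexpr uint16_t ashiftrt16(uint16_t a, unsigned imm) {
  uint16_t logical = static_cast<uint16_t>(a >> imm);
  // If the sign bit is set, the upper imm bits are filled with ones.
  uint16_t fill =
      (a & 0x8000u) ? static_cast<uint16_t>(~(0xFFFFu >> imm)) : 0;
  return logical | fill;
}

// Hypothetical helper: logical right shift of one 16-bit lane.
constexpr uint16_t lshiftrt16(uint16_t a, unsigned imm) {
  return static_cast<uint16_t>(a >> imm);
}

// The mask from the commit message with prec = 16:
// (1 << (prec - imm)) - 1.
constexpr uint16_t low_mask16(unsigned imm) {
  return static_cast<uint16_t>((1u << (16 - imm)) - 1u);
}

// Exhaustively check (ashiftrt A imm) & mask == (lshiftrt A imm)
// for every lane value and every shift count below the precision.
bool identity_holds() {
  for (unsigned imm = 0; imm < 16; ++imm)
    for (uint32_t a = 0; a <= 0xFFFFu; ++a) {
      uint16_t lane = static_cast<uint16_t>(a);
      if ((ashiftrt16(lane, imm) & low_mask16(imm)) != lshiftrt16(lane, imm))
        return false;
    }
  return true;
}
```

The loop stops at imm < 16 for the same reason the patch tests shift_count < inner_prec: a shift count at or above the element precision has no well-defined result to compare.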
Comments
CC for review.
On Tue, May 21, 2024 at 1:12 PM liuhongt <hongtao.liu@intel.com> wrote:
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 53f54d1d392..6c91409200e 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -4021,6 +4021,31 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>             return tem;
>         }
>
> +      /* (and:v4si
> +           (ashiftrt:v4si A 16)
> +           (const_vector: 0xffff x4))
> +         is just (lshiftrt:v4si A 16). */
> +      if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT
> +          && (CONST_INT_P (XEXP (op0, 1))
> +              || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR
> +                  && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1))))
> +          && GET_CODE (op1) == CONST_VECTOR
> +          && CONST_VECTOR_DUPLICATE_P (op1))
> +        {
> +          unsigned HOST_WIDE_INT shift_count
> +            = (CONST_INT_P (XEXP (op0, 1))
> +               ? UINTVAL (XEXP (op0, 1))
> +               : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0)));
> +          unsigned HOST_WIDE_INT inner_prec
> +            = GET_MODE_PRECISION (GET_MODE_INNER (mode));
> +
> +          /* Avoid UD shift count. */
> +          if (shift_count < inner_prec
> +              && (UINTVAL (XVECEXP (op1, 0, 0))
> +                  == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1))
> +            return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1));
> +        }
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>          return tem;
> --
> 2.31.1
>
On 5/23/24 8:25 PM, Hongtao Liu wrote:
> CC for review.
Can you add a testcase for this? I don't mind if it's x86 specific and
does a bit of asm scanning.
Also note that the context for this patch has changed, so it won't
automatically apply. So be extra careful when updating so that it goes
into the right place (all the more reason to have a testcase validating
that the optimization works correctly).
I think the patch itself is fine. So further review is just for the
testcase and should be easy.
jeff
ps. It seems to help RISC-V as well :-)
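A testcase along the lines requested above might look roughly like the sketch below. It is an illustration only and not the test that accompanied any later revision of the patch: the function name foo, the option string, and the scanned mnemonics (psrlw, pand) are assumptions about what an x86 SSE2 build would emit, and would need checking against real compiler output. The source is valid as both C and C++.

```cpp
/* Hypothetical sketch of an x86 testcase; directives are assumptions.  */
/* { dg-do compile } */
/* { dg-options "-O2 -msse2" } */
/* { dg-final { scan-assembler "psrlw" } } */
/* { dg-final { scan-assembler-not "pand" } } */

typedef short v8hi __attribute__ ((vector_size (16)));

/* Arithmetic shift right by 8 followed by a 0xff mask in every lane,
   i.e. (and:v8hi (ashiftrt:v8hi A 8) (const_vector 0xff x8)).  */
v8hi
foo (v8hi a)
{
  const v8hi mask = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
  return (a >> 8) & mask;
}
```

Whether the AND survives to the RTL simplifier in this exact form depends on earlier passes, so the scan patterns should be treated as a starting point rather than a verified expectation.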