[ver,3] rs6000, Add new overloaded vector shift builtin int128 variants
Commit Message
GCC developers:
Version 3, updated the testcase dg-do directive from dg-do link to
dg-do compile. Moved the new documentation again. Retested on Power 10
LE and BE to verify the dg arguments disable the test on Power 10 BE
but enable it on Power 10 LE. Reran the full regression testsuite with
no new regressions.
Version 2, updated rs6000-overload.def to drop the additional
internal names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from
Kewen. Moved the new documentation statement for the PVIPR built-ins per
comments from Kewen. Updated the dg-do run directive and added a comment
about the save-temps option in the testcase per feedback from Segher.
Retested the patch on Power 10 with no regressions.
The following patch adds the int128 variants to the existing overloaded
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb,
vec_srl, vec_sro. These variants were requested by Steve Munroe.
The patch has been tested on a Power 10 system with no regressions.
Please let me know if the patch is acceptable for mainline.
Carl
------------------------------------------------------------------------------
rs6000, Add new overloaded vector shift builtin int128 variants
Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro. For each of the new argument types add a
testcase and update the documentation for the built-in.
gcc/ChangeLog:
* config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
define_insn iterator to VEC_IC.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
__builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
__builtin_altivec_vsrdb_v1ti): New builtin definitions.
* config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
definitions.
* doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
built-ins.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
---
gcc/config/rs6000/altivec.md | 6 +-
gcc/config/rs6000/rs6000-builtins.def | 12 +
gcc/config/rs6000/rs6000-overload.def | 40 ++
gcc/doc/extend.texi | 43 ++
.../vec-shift-double-runnable-int128.c | 419 ++++++++++++++++++
5 files changed, 517 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvsrdbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsldbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsl\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsr\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslo\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvsro\M} 4 } } */
Comments
Hi Carl,
on 2024/8/2 03:35, Carl Love wrote:
> GCC developers:
>
> Version 3, updated the testcase dg-do link to dg-do compile. Moved the new documentation again. Retested on Power 10 LE and BE to verify the dg arguments disable the test on Power10BE but enable the test for Power10LE. Reran the full regression testsuite. There were no new regressions for the testsuite.
>
> Version 2, updated rs6000-overload.def to remove adding additonal internal names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from Kewen. Move new documentation statement for the PIVPR built-ins per comments from Kewen. Updated dg-do-run directive and added comment about the save-temps in testcase per feedback from Segher. Retested the patch on Power 10 with no regressions.
>
> The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. These varients were requested by Steve Munroe.
>
> The patch has been tested on a Power 10 system with no regressions.
OK with the below nits tweaked and tested well on both BE and LE, thanks!
>
> Please let me know if the patch is acceptable for mainline.
>
> Carl
>
> ------------------------------------------------------------------------------
> rs6000, Add new overloaded vector shift builtin int128 variants
>
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro. For each of the new argument types add a
> testcase and update the documentation for the built-in.
>
> gcc/ChangeLog:
> * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
> define_insn iterator to VEC_IC.
> * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
> __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
> __builtin_altivec_vsrdb_v1ti): New builtin definitions.
> * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
> vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
> definitions.
> * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
> built-ins.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
> ---
> gcc/config/rs6000/altivec.md | 6 +-
> gcc/config/rs6000/rs6000-builtins.def | 12 +
> gcc/config/rs6000/rs6000-overload.def | 40 ++
> gcc/doc/extend.texi | 43 ++
> .../vec-shift-double-runnable-int128.c | 419 ++++++++++++++++++
> 5 files changed, 517 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
> (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
>
> (define_insn "vs<SLDB_lr>db_<mode>"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> - (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> + (match_operand:VEC_IC 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_12_operand" "n")]
> VSHIFT_DBL_LR))]
> "TARGET_POWER10"
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..a2b2b729270 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -964,6 +964,9 @@
> const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
> VSLDOI_8HI altivec_vsldoi_v8hi {}
>
> + const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
> + VSLDOI_V1TI altivec_vsldoi_v1ti {}
Nit: s/VSLDOI_V1TI/VSLDOI_1TI/ to align with the other VSLDOI_* (no 'V' in *).
> +
> const vss __builtin_altivec_vslh (vss, vus);
> VSLH vashlv8hi3 {}
>
> @@ -1831,6 +1834,9 @@
> const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
> XXSLDWI_2DI vsx_xxsldwi_v2di {}
>
> + const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
> + XXSLDWI_1TI vsx_xxsldwi_v1ti {}
> +
> const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
> XXSLDWI_4SF vsx_xxsldwi_v4sf {}
>
> @@ -3299,6 +3305,9 @@
> const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
> VSLDB_V8HI vsldb_v8hi {}
>
> + const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
> + VSLDB_V1TI vsldb_v1ti {}
> +
> const vsq __builtin_altivec_vslq (vsq, vuq);
> VSLQ vashlv1ti3 {}
>
> @@ -3317,6 +3326,9 @@
> const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
> VSRDB_V8HI vsrdb_v8hi {}
>
> + const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
> + VSRDB_V1TI vsrdb_v1ti {}
> +
> const vsq __builtin_altivec_vsrq (vsq, vuq);
> VSRQ vlshrv1ti3 {}
>
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index b5fd9d0e38f..b1e038488e2 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3399,6 +3399,10 @@
> VSLDOI_4SF
> vd __builtin_vec_sld (vd, vd, const int);
> VSLDOI_2DF
> + vsq __builtin_vec_sld (vsq, vsq, const int);
> + VSLDOI_V1TI VSLDOI_VSQ
> + vuq __builtin_vec_sld (vuq, vuq, const int);
> + VSLDOI_V1TI VSLDOI_VUQ
Nit: s/VSLDOI_V1TI/VSLDOI_1TI/ as the above change.
>
> [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
> vsc __builtin_vec_sldb (vsc, vsc, const int);
> @@ -3417,6 +3421,10 @@
> VSLDB_V2DI VSLDB_VSLL
> vull __builtin_vec_sldb (vull, vull, const int);
> VSLDB_V2DI VSLDB_VULL
> + vsq __builtin_vec_sldb (vsq, vsq, const int);
> + VSLDB_V1TI VSLDB_VSQ
> + vuq __builtin_vec_sldb (vuq, vuq, const int);
> + VSLDB_V1TI VSLDB_VUQ
>
> [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
> vsc __builtin_vec_sldw (vsc, vsc, const int);
> @@ -3439,6 +3447,10 @@
> XXSLDWI_4SF XXSLDWI_VF
> vd __builtin_vec_sldw (vd, vd, const int);
> XXSLDWI_2DF XXSLDWI_VD
> + vsq __builtin_vec_sldw (vsq, vsq, const int);
> + XXSLDWI_1TI XXSLDWI_VSQ
> + vuq __builtin_vec_sldw (vuq, vuq, const int);
> + XXSLDWI_1TI XXSLDWI_VUQ
>
> [VEC_SLL, vec_sll, __builtin_vec_sll]
> vsc __builtin_vec_sll (vsc, vuc);
> @@ -3459,6 +3471,10 @@
> VSL VSL_VSLL
> vull __builtin_vec_sll (vull, vuc);
> VSL VSL_VULL
> + vsq __builtin_vec_sll (vsq, vuc);
> + VSL VSL_VSQ
> + vuq __builtin_vec_sll (vuq, vuc);
> + VSL VSL_VUQ
> ; The following variants are deprecated.
> vsc __builtin_vec_sll (vsc, vus);
> VSL VSL_VSC_VUS
> @@ -3554,6 +3570,14 @@
> VSLO VSLO_VFS
> vf __builtin_vec_slo (vf, vuc);
> VSLO VSLO_VFU
> + vsq __builtin_vec_slo (vsq, vsc);
> + VSLO VSLDO_VSQS
> + vsq __builtin_vec_slo (vsq, vuc);
> + VSLO VSLDO_VSQU
> + vuq __builtin_vec_slo (vuq, vsc);
> + VSLO VSLDO_VUQS
> + vuq __builtin_vec_slo (vuq, vuc);
> + VSLO VSLDO_VUQU
>
> [VEC_SLV, vec_slv, __builtin_vec_vslv]
> vuc __builtin_vec_vslv (vuc, vuc);
> @@ -3699,6 +3723,10 @@
> VSRDB_V2DI VSRDB_VSLL
> vull __builtin_vec_srdb (vull, vull, const int);
> VSRDB_V2DI VSRDB_VULL
> + vsq __builtin_vec_srdb (vsq, vsq, const int);
> + VSRDB_V1TI VSRDB_VSQ
> + vuq __builtin_vec_srdb (vuq, vuq, const int);
> + VSRDB_V1TI VSRDB_VUQ
>
> [VEC_SRL, vec_srl, __builtin_vec_srl]
> vsc __builtin_vec_srl (vsc, vuc);
> @@ -3719,6 +3747,10 @@
> VSR VSR_VSLL
> vull __builtin_vec_srl (vull, vuc);
> VSR VSR_VULL
> + vsq __builtin_vec_srl (vsq, vuc);
> + VSR VSR_VSQ
> + vuq __builtin_vec_srl (vuq, vuc);
> + VSR VSR_VUQ
> ; The following variants are deprecated.
> vsc __builtin_vec_srl (vsc, vus);
> VSR VSR_VSC_VUS
> @@ -3808,6 +3840,14 @@
> VSRO VSRO_VFS
> vf __builtin_vec_sro (vf, vuc);
> VSRO VSRO_VFU
> + vsq __builtin_vec_sro (vsq, vsc);
> + VSRO VSRDO_VSQS
> + vsq __builtin_vec_sro (vsq, vuc);
> + VSRO VSRDO_VSQU
> + vuq __builtin_vec_sro (vuq, vsc);
> + VSRO VSRDO_VUQS
> + vuq __builtin_vec_sro (vuq, vuc);
> + VSRO VSRDO_VUQU
>
> [VEC_SRV, vec_srv, __builtin_vec_vsrv]
> vuc __builtin_vec_vsrv (vuc, vuc);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 00b93f954e3..4a9cf366f56 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23519,6 +23519,10 @@ const unsigned int);
> vector signed long long, const unsigned int);
> @exdent vector unsigned long long vec_sldb (vector unsigned long long,
> vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> @end smallexample
>
> Shift the combined input vectors left by the amount specified by the low-order
> @@ -23546,6 +23550,10 @@ const unsigned int);
> vector signed long long, const unsigned int);
> @exdent vector unsigned long long vec_srdb (vector unsigned long long,
> vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> @end smallexample
>
> Shift the combined input vectors right by the amount specified by the low-order
> @@ -24041,6 +24049,41 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
> @end smallexample
>
Nit:
The following instances are extensions of the existing overloaded built-ins
@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
that are documented in the PVIPR.
>
> +@smallexample
> +@exdent vector signed __int128 vec_sld (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_sldw (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector signed char);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector unsigned char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector signed char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector unsigned char);
> +@exdent vector signed __int128 vec_sro (vector signed __int128,
> +vector signed char);
> +@exdent vector signed __int128 vec_sro (vector signed __int128,
> +vector unsigned char);
> +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
> +vector signed char);
> +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
> +vector unsigned char);
> +@exdent vector signed __int128 vec_srl (vector signed __int128,
> +vector unsigned char);
> +@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
> +vector unsigned char);
> +@end smallexample
> +
> +The above instances are extensions of the existing overloaded built-ins
> +@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
> +that are documented in the PVIPR.
... this paragraph is moved above to break up the two "smallexample" blocks.
BR,
Kewen
> +
> @node PowerPC Hardware Transactional Memory Built-in Functions
> @subsection PowerPC Hardware Transactional Memory Built-in Functions
> GCC provides two interfaces for accessing the Hardware Transactional
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> new file mode 100644
> index 00000000000..0a329fa0e9c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> @@ -0,0 +1,419 @@
> +/* { dg-do run { target power10_hw } } */
> +/* { dg-do compile { target { ! power10_hw } } } */
> +/* { dg-require-effective-target int128 } */
> +
> +/* Need -save-temps for dg-final scan-assembler-times at end of test. */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +
> +#include <altivec.h>
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include <stdio.h>
> +
> +void print_i128 (unsigned __int128 val)
> +{
> + printf(" 0x%016llx%016llx",
> + (unsigned long long)(val >> 64),
> + (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
> +}
> +#endif
> +
> +extern void abort (void);
> +
> +#if DEBUG
> +#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
> + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
> + printf(" src_va_s128[0] = "); \
> + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
> + printf("\n"); \
> + printf(" src_vb_uc = 0x"); \
> + for (i = 0; i < 16; i++) \
> + printf("%02x", src_vb_uc[i]); \
> + printf("\n"); \
> + printf(" vresult[0] = "); \
> + print_i128 ((unsigned __int128) vresult[0]); \
> + printf("\n"); \
> + printf(" expected_vresult[0] = "); \
> + print_i128 ((unsigned __int128) expected_vresult[0]); \
> + printf("\n");
> +
> +#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
> + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
> + printf(" src_va_s128[0] = "); \
> + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
> + printf("\n"); \
> + printf(" src_vb_sc = 0x"); \
> + for (i = 0; i < 16; i++) \
> + printf("%02x", src_vb_sc[i]); \
> + printf("\n"); \
> + printf(" vresult[0] = "); \
> + print_i128 ((unsigned __int128) vresult[0]); \
> + printf("\n"); \
> + printf(" expected_vresult[0] = "); \
> + print_i128 ((unsigned __int128) expected_vresult[0]); \
> + printf("\n");
> +
> +#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
> + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128, %s) \n", \
> + #NAME, #CONST); \
> + printf(" src_va_s128[0] = "); \
> + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
> + printf("\n"); \
> + printf(" src_vb_s128[0] = "); \
> + print_i128 ((unsigned __int128) src_vb_##TYPE_NAME[0]); \
> + printf("\n"); \
> + printf(" vresult[0] = "); \
> + print_i128 ((unsigned __int128) vresult[0]); \
> + printf("\n"); \
> + printf(" expected_vresult[0] = "); \
> + print_i128 ((unsigned __int128) expected_vresult[0]); \
> + printf("\n");
> +
> +#else
> +#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
> + abort();
> +
> +#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
> + abort();
> +
> +#define ACTION_2ARG(NAME, TYPE_NAME) \
> + abort();
> +
> +#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
> + abort();
> +#endif
> +
> +/* Second argument of the builtin is vector unsigned char. */
> +#define TEST_2ARG_UNSIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
> + { \
> + vector TYPE __int128 vresult; \
> + vector TYPE __int128 expected_vresult; \
> + int i; \
> + \
> + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
> + expected_vresult = (expected_vresult << 64) | \
> + (vector TYPE __int128) { EXP_RESULT_LO }; \
> + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_uc); \
> + \
> + if (!vec_all_eq (vresult, expected_vresult)) { \
> + ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
> + } \
> + }
> +
> +/* Second argument of the builtin is vector signed char. */
> +#define TEST_2ARG_SIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
> + { \
> + vector TYPE __int128 vresult; \
> + vector TYPE __int128 expected_vresult; \
> + int i; \
> + \
> + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
> + expected_vresult = (expected_vresult << 64) | \
> + (vector TYPE __int128) { EXP_RESULT_LO }; \
> + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_sc); \
> + \
> + if (!vec_all_eq (vresult, expected_vresult)) { \
> + ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
> + } \
> + }
> +
> +#define TEST_3ARG(NAME, TYPE, TYPE_NAME, CONST, EXP_RESULT_HI, EXP_RESULT_LO) \
> + { \
> + vector TYPE __int128 vresult; \
> + vector TYPE __int128 expected_vresult; \
> + \
> + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
> + expected_vresult = (expected_vresult << 64) | \
> + (vector TYPE __int128) { EXP_RESULT_LO }; \
> + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_##TYPE_NAME, CONST); \
> + \
> + if (!vec_all_eq (vresult, expected_vresult)) { \
> + ACTION_3ARG(NAME, TYPE_NAME, CONST) \
> + } \
> + }
> +
> +int
> +main (int argc, char *argv [])
> +{
> + vector signed __int128 vresult_s128;
> + vector signed __int128 expected_vresult_s128;
> + vector signed __int128 src_va_s128;
> + vector signed __int128 src_vb_s128;
> + vector unsigned __int128 vresult_u128;
> + vector unsigned __int128 expected_vresult_u128;
> + vector unsigned __int128 src_va_u128;
> + vector unsigned __int128 src_vb_u128;
> + vector signed char src_vb_sc;
> + vector unsigned char src_vb_uc;
> +
> + /* 128-bit vector shift right tests, vec_srdb. */
> + src_va_s128 = (vector signed __int128) {0x12345678};
> + src_vb_s128 = (vector signed __int128) {0xFEDCBA90};
> + TEST_3ARG(srdb, signed, s128, 4, 0x8000000000000000, 0xFEDCBA9)
> +
> + src_va_u128 = (vector unsigned __int128) { 0xFEDCBA98 };
> + src_vb_u128 = (vector unsigned __int128) { 0x76543210};
> + TEST_3ARG(srdb, unsigned, u128, 4, 0x8000000000000000, 0x07654321)
> +
> + /* 128-bit vector shift left tests, vec_sldb. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
> + src_vb_s128 = (src_vb_s128 << 64)
> + | (vector signed __int128) {0xFEDCBA9876543210};
> + TEST_3ARG(sldb, signed, s128, 4, 0x23456789ABCDEF01, 0x23456789ABCDEF0F)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_vb_u128 = src_vb_u128 << 64
> + | (vector unsigned __int128) {0x123456789ABCDEF0};
> + TEST_3ARG(sldb, unsigned, u128, 4, 0xEDCBA9876543210F, 0xEDCBA98765432101)
> +
> + /* Shift left by octet tests, vec_sld. Shift is by immediate value
> + times 8. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
> + src_vb_s128 = (src_vb_s128 << 64)
> + | (vector signed __int128) {0xFEDCBA9876543210};
> + TEST_3ARG(sld, signed, s128, 4, 0x9abcdef012345678, 0x9abcdef0fedcba98)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_vb_u128 = src_vb_u128 << 64
> + | (vector unsigned __int128) {0x123456789ABCDEF0};
> + TEST_3ARG(sld, unsigned, u128, 4, 0x76543210fedcba98, 0x7654321012345678)
> +
> + /* Vector left shift bytes within the vector, vec_sll. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01};
> + TEST_2ARG_UNSIGNED(sll, signed, s128, 0x2468acf13579bde0,
> + 0x2468acf13579bde0)
> +
> + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02};
> + TEST_2ARG_UNSIGNED(sll, unsigned, u128, 0x48d159e26af37bc0,
> + 0x48d159e26af37bc0)
> +
> + /* Vector right shift bytes within the vector, vec_srl. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01,
> + 0x01, 0x01, 0x01, 0x01};
> + TEST_2ARG_UNSIGNED(srl, signed, s128, 0x091a2b3c4d5e6f78,
> + 0x091a2b3c4d5e6f78)
> +
> + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02,
> + 0x02, 0x02, 0x02, 0x02};
> + TEST_2ARG_UNSIGNED(srl, unsigned, u128, 0x48d159e26af37bc,
> + 0x48d159e26af37bc)
> +
> + /* Shift left by octet tests, vec_slo. Shift is by the byte count
> + in bits 121:124 of the second operand. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + /* Note vb_sc is Endian specific. */
> + /* The left shift amount is 1 byte, i.e. 1 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0};
> +#else
> + src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x1 << 3};
> +#endif
> +
> + TEST_2ARG_SIGNED(slo, signed, s128, 0x3456789ABCDEF012,
> + 0x3456789ABCDEF000)
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> +
> + /* Note vb_uc is Endian specific. */
> + /* The left shift amount is 2 bytes, i.e. 2 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0};
> +#else
> + src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x2 << 3};
> +#endif
> + TEST_2ARG_UNSIGNED(slo, signed, s128, 0x56789ABCDEF01234,
> + 0x56789ABCDEF00000)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + /* Note vb_sc is Endian specific. */
> + /* The left shift amount is 3 bytes, i.e. 3 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x00, 0x00, 0x0};
> +#else
> + src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x00, 0x00, 0x03<<3};
> +#endif
> + TEST_2ARG_SIGNED(slo, unsigned, u128, 0x9876543210FEDCBA,
> + 0x9876543210000000)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + /* Note vb_uc is Endian specific. */
> + /* The left shift amount is 4 bytes, i.e. 4 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0};
> +#else
> + src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x00, 0x00, 0x04<<3};
> +#endif
> + TEST_2ARG_UNSIGNED(slo, unsigned, u128, 0x76543210FEDCBA98,
> + 0x7654321000000000)
> +
> + /* Shift right by octet tests, vec_sro. Shift is by the byte count
> + in bits 121:124 of the second operand. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + /* Note vb_sc is Endian specific. */
> + /* The right shift amount is 1 byte, i.e. 1 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0};
> +#else
> + src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x1 << 3};
> +#endif
> + TEST_2ARG_SIGNED(sro, signed, s128, 0x00123456789ABCDE, 0xF0123456789ABCDE)
> +
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + /* Note vb_uc is Endian specific. */
> + /* The right shift amount is 2 bytes, i.e. 2 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0};
> +#else
> + src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x2 << 3};
> +#endif
> + TEST_2ARG_UNSIGNED(sro, signed, s128, 0x0000123456789ABC,
> + 0xDEF0123456789ABC)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + /* Note vb_sc is Endian specific. */
> + /* The right shift amount is 3 bytes, i.e. 3 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x00, 0x00, 0x0};
> +#else
> + src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x00, 0x00, 0x03<<3};
> +#endif
> + TEST_2ARG_SIGNED(sro, unsigned, u128, 0x000000FEDCBA9876,
> + 0x543210FEDCBA9876)
> +
> + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_va_u128 = src_va_u128 << 64
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + /* Note vb_uc is Endian specific. */
> + /* The right shift amount is 4 bytes, i.e. 4 * 8 bits. */
> +#if __LITTLE_ENDIAN__
> + src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x00, 0x0};
> +#else
> + src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0,
> + 0x00, 0x0, 0x0, 0x04<<3};
> +#endif
> + TEST_2ARG_UNSIGNED(sro, unsigned, u128, 0x00000000FEDCBA98,
> + 0x76543210FEDCBA98)
> +
> + /* 128-bit vector shift left tests, vec_sldw. */
> + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
> + src_va_s128 = (src_va_s128 << 64)
> + | (vector signed __int128) {0x123456789ABCDEF0};
> + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
> + src_vb_s128 = (src_vb_s128 << 64)
> + | (vector signed __int128) {0xFEDCBA9876543210};
> + TEST_3ARG(sldw, signed, s128, 1, 0x9ABCDEF012345678, 0x9ABCDEF0FEDCBA98)
> +
> + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_va_u128 = (src_va_u128 << 64)
> + | (vector unsigned __int128) {0x123456789ABCDEF0};
> + src_vb_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
> + src_vb_u128 = (src_vb_u128 << 64)
> + | (vector unsigned __int128) {0xFEDCBA9876543210};
> + TEST_3ARG(sldw, unsigned, u128, 2, 0x123456789ABCDEF0, 0xFEDCBA9876543210)
> +
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvsrdbi\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsldbi\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsl\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvsr\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mvslo\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mvsro\M} 4 } } */
Kewen:
On 8/4/24 11:13 PM, Kewen.Lin wrote:
> Hi Carl,
>
> on 2024/8/2 03:35, Carl Love wrote:
>> GCC developers:
>>
>> Version 3, updated the testcase dg-do link to dg-do compile. Moved the new documentation again. Retested on Power 10 LE and BE to verify the dg arguments disable the test on Power10BE but enable the test for Power10LE. Reran the full regression testsuite. There were no new regressions for the testsuite.
>>
>> Version 2, updated rs6000-overload.def to remove adding additonal internal names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from Kewen. Move new documentation statement for the PIVPR built-ins per comments from Kewen. Updated dg-do-run directive and added comment about the save-temps in testcase per feedback from Segher. Retested the patch on Power 10 with no regressions.
>>
>> The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. These varients were requested by Steve Munroe.
>>
>> The patch has been tested on a Power 10 system with no regressions.
> OK with the below nits tweaked and tested well on both BE and LE, thanks!
Fixed the various nits, see specifics below.
Rebased on current mainline, rebuilt and retested on LE and BE with no
regression failures.
Will commit in a day or so assuming I don't receive any additional comments.
>
>> Please let me know if the patch is acceptable for mainline.
>>
>> Carl
>>
>> ------------------------------------------------------------------------------
>> rs6000, Add new overloaded vector shift builtin int128 variants
>>
>> Add the signed __int128 and unsigned __int128 argument types for the
>> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
>> vec_srdb, vec_srl, vec_sro. For each of the new argument types add a
>> testcase and update the documentation for the built-in.
>>
>> gcc/ChangeLog:
>> * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>> define_insn iterator to VEC_IC.
>> * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>> __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>> __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>> * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>> vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>> definitions.
>> * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
>> vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>> built-ins.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
>> ---
>> gcc/config/rs6000/altivec.md | 6 +-
>> gcc/config/rs6000/rs6000-builtins.def | 12 +
>> gcc/config/rs6000/rs6000-overload.def | 40 ++
>> gcc/doc/extend.texi | 43 ++
>> .../vec-shift-double-runnable-int128.c | 419 ++++++++++++++++++
>> 5 files changed, 517 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 5af9bf920a2..2a18ee44526 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>> (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
>>
>> (define_insn "vs<SLDB_lr>db_<mode>"
>> - [(set (match_operand:VI2 0 "register_operand" "=v")
>> - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
>> - (match_operand:VI2 2 "register_operand" "v")
>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
>> + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
>> + (match_operand:VEC_IC 2 "register_operand" "v")
>> (match_operand:QI 3 "const_0_to_12_operand" "n")]
>> VSHIFT_DBL_LR))]
>> "TARGET_POWER10"
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
>> index 77eb0f7e406..a2b2b729270 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -964,6 +964,9 @@
>> const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
>> VSLDOI_8HI altivec_vsldoi_v8hi {}
>>
>> + const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
>> + VSLDOI_V1TI altivec_vsldoi_v1ti {}
> Nit: s/VSLDOI_V1TI/VSLDOI_1TI/ to align with the other VSLDOI_* (no 'V' in *).
Changed to:
const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
VSLDOI_1TI altivec_vsldoi_v1ti {}
>
>> +
>> const vss __builtin_altivec_vslh (vss, vus);
>> VSLH vashlv8hi3 {}
>>
>> @@ -1831,6 +1834,9 @@
>> const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
>> XXSLDWI_2DI vsx_xxsldwi_v2di {}
>>
>> + const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
>> + XXSLDWI_1TI vsx_xxsldwi_v1ti {}
>> +
>> const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
>> XXSLDWI_4SF vsx_xxsldwi_v4sf {}
>>
>> @@ -3299,6 +3305,9 @@
>> const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
>> VSLDB_V8HI vsldb_v8hi {}
>>
>> + const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
>> + VSLDB_V1TI vsldb_v1ti {}
>> +
>> const vsq __builtin_altivec_vslq (vsq, vuq);
>> VSLQ vashlv1ti3 {}
>>
>> @@ -3317,6 +3326,9 @@
>> const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
>> VSRDB_V8HI vsrdb_v8hi {}
>>
>> + const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
>> + VSRDB_V1TI vsrdb_v1ti {}
>> +
>> const vsq __builtin_altivec_vsrq (vsq, vuq);
>> VSRQ vlshrv1ti3 {}
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
>> index b5fd9d0e38f..b1e038488e2 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3399,6 +3399,10 @@
>> VSLDOI_4SF
>> vd __builtin_vec_sld (vd, vd, const int);
>> VSLDOI_2DF
>> + vsq __builtin_vec_sld (vsq, vsq, const int);
>> + VSLDOI_V1TI VSLDOI_VSQ
>> + vuq __builtin_vec_sld (vuq, vuq, const int);
>> + VSLDOI_V1TI VSLDOI_VUQ
> Nit: s/VSLDOI_V1TI/VSLDOI_1TI/ as the above change.
Changed the above two entries to:
vsq __builtin_vec_sld (vsq, vsq, const int);
VSLDOI_1TI VSLDOI_VSQ
vuq __builtin_vec_sld (vuq, vuq, const int);
VSLDOI_1TI VSLDOI_VUQ
>
>> [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
>> vsc __builtin_vec_sldb (vsc, vsc, const int);
>> @@ -3417,6 +3421,10 @@
>> VSLDB_V2DI VSLDB_VSLL
>> vull __builtin_vec_sldb (vull, vull, const int);
>> VSLDB_V2DI VSLDB_VULL
>> + vsq __builtin_vec_sldb (vsq, vsq, const int);
>> + VSLDB_V1TI VSLDB_VSQ
>> + vuq __builtin_vec_sldb (vuq, vuq, const int);
>> + VSLDB_V1TI VSLDB_VUQ
>>
>> [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
>> vsc __builtin_vec_sldw (vsc, vsc, const int);
>> @@ -3439,6 +3447,10 @@
>> XXSLDWI_4SF XXSLDWI_VF
>> vd __builtin_vec_sldw (vd, vd, const int);
>> XXSLDWI_2DF XXSLDWI_VD
>> + vsq __builtin_vec_sldw (vsq, vsq, const int);
>> + XXSLDWI_1TI XXSLDWI_VSQ
>> + vuq __builtin_vec_sldw (vuq, vuq, const int);
>> + XXSLDWI_1TI XXSLDWI_VUQ
>>
>> [VEC_SLL, vec_sll, __builtin_vec_sll]
>> vsc __builtin_vec_sll (vsc, vuc);
>> @@ -3459,6 +3471,10 @@
>> VSL VSL_VSLL
>> vull __builtin_vec_sll (vull, vuc);
>> VSL VSL_VULL
>> + vsq __builtin_vec_sll (vsq, vuc);
>> + VSL VSL_VSQ
>> + vuq __builtin_vec_sll (vuq, vuc);
>> + VSL VSL_VUQ
>> ; The following variants are deprecated.
>> vsc __builtin_vec_sll (vsc, vus);
>> VSL VSL_VSC_VUS
>> @@ -3554,6 +3570,14 @@
>> VSLO VSLO_VFS
>> vf __builtin_vec_slo (vf, vuc);
>> VSLO VSLO_VFU
>> + vsq __builtin_vec_slo (vsq, vsc);
>> + VSLO VSLDO_VSQS
>> + vsq __builtin_vec_slo (vsq, vuc);
>> + VSLO VSLDO_VSQU
>> + vuq __builtin_vec_slo (vuq, vsc);
>> + VSLO VSLDO_VUQS
>> + vuq __builtin_vec_slo (vuq, vuc);
>> + VSLO VSLDO_VUQU
>>
>> [VEC_SLV, vec_slv, __builtin_vec_vslv]
>> vuc __builtin_vec_vslv (vuc, vuc);
>> @@ -3699,6 +3723,10 @@
>> VSRDB_V2DI VSRDB_VSLL
>> vull __builtin_vec_srdb (vull, vull, const int);
>> VSRDB_V2DI VSRDB_VULL
>> + vsq __builtin_vec_srdb (vsq, vsq, const int);
>> + VSRDB_V1TI VSRDB_VSQ
>> + vuq __builtin_vec_srdb (vuq, vuq, const int);
>> + VSRDB_V1TI VSRDB_VUQ
>>
>> [VEC_SRL, vec_srl, __builtin_vec_srl]
>> vsc __builtin_vec_srl (vsc, vuc);
>> @@ -3719,6 +3747,10 @@
>> VSR VSR_VSLL
>> vull __builtin_vec_srl (vull, vuc);
>> VSR VSR_VULL
>> + vsq __builtin_vec_srl (vsq, vuc);
>> + VSR VSR_VSQ
>> + vuq __builtin_vec_srl (vuq, vuc);
>> + VSR VSR_VUQ
>> ; The following variants are deprecated.
>> vsc __builtin_vec_srl (vsc, vus);
>> VSR VSR_VSC_VUS
>> @@ -3808,6 +3840,14 @@
>> VSRO VSRO_VFS
>> vf __builtin_vec_sro (vf, vuc);
>> VSRO VSRO_VFU
>> + vsq __builtin_vec_sro (vsq, vsc);
>> + VSRO VSRDO_VSQS
>> + vsq __builtin_vec_sro (vsq, vuc);
>> + VSRO VSRDO_VSQU
>> + vuq __builtin_vec_sro (vuq, vsc);
>> + VSRO VSRDO_VUQS
>> + vuq __builtin_vec_sro (vuq, vuc);
>> + VSRO VSRDO_VUQU
>>
>> [VEC_SRV, vec_srv, __builtin_vec_vsrv]
>> vuc __builtin_vec_vsrv (vuc, vuc);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 00b93f954e3..4a9cf366f56 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -23519,6 +23519,10 @@ const unsigned int);
>> vector signed long long, const unsigned int);
>> @exdent vector unsigned long long vec_sldb (vector unsigned long long,
>> vector unsigned long long, const unsigned int);
>> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>> @end smallexample
>>
>> Shift the combined input vectors left by the amount specified by the low-order
>> @@ -23546,6 +23550,10 @@ const unsigned int);
>> vector signed long long, const unsigned int);
>> @exdent vector unsigned long long vec_srdb (vector unsigned long long,
>> vector unsigned long long, const unsigned int);
>> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>> @end smallexample
>>
>> Shift the combined input vectors right by the amount specified by the low-order
>> @@ -24041,6 +24049,41 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
>> @end smallexample
>>
> Nit:
>
> The following instances are extensions of the existing overloaded built-ins
> @code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
> that are documented in the PVIPR.
Changed to:
...
int vec_any_le (vector signed __int128, vector signed __int128);
int vec_any_le (vector unsigned __int128, vector unsigned __int128);
@end smallexample
The following instances are extensions of the existing overloaded built-ins
@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
that are documented in the PVIPR.
@smallexample
@exdent vector signed __int128 vec_sld (vector signed __int128,
vector signed __int128, const unsigned int);
@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
vector unsigned __int128, const unsigned int);
@exdent vector signed __int128 vec_sldw (vector signed __int128,
vector signed __int128, const unsigned int);
...
>
>> +@smallexample
>> +@exdent vector signed __int128 vec_sld (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>> +@exdent vector signed __int128 vec_sldw (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>> +@exdent vector signed __int128 vec_slo (vector signed __int128,
>> +vector signed char);
>> +@exdent vector signed __int128 vec_slo (vector signed __int128,
>> +vector unsigned char);
>> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
>> +vector signed char);
>> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
>> +vector unsigned char);
>> +@exdent vector signed __int128 vec_sro (vector signed __int128,
>> +vector signed char);
>> +@exdent vector signed __int128 vec_sro (vector signed __int128,
>> +vector unsigned char);
>> +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
>> +vector signed char);
>> +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
>> +vector unsigned char);
>> +@exdent vector signed __int128 vec_srl (vector signed __int128,
>> +vector unsigned char);
>> +@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
>> +vector unsigned char);
>> +@end smallexample
>> +
>> +The above instances are extensions of the existing overloaded built-ins
>> +@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
>> +that are documented in the PVIPR.
> ... this praragraph is moved above to break two "smallexample"s.
Moved, see above.
Carl
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
(define_insn "vs<SLDB_lr>db_<mode>"
- [(set (match_operand:VI2 0 "register_operand" "=v")
- (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
- (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+ (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+ (match_operand:VEC_IC 2 "register_operand" "v")
(match_operand:QI 3 "const_0_to_12_operand" "n")]
VSHIFT_DBL_LR))]
"TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
VSLDOI_8HI altivec_vsldoi_v8hi {}
+ const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+ VSLDOI_1TI altivec_vsldoi_v1ti {}
+
const vss __builtin_altivec_vslh (vss, vus);
VSLH vashlv8hi3 {}
@@ -1831,6 +1834,9 @@
const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
XXSLDWI_2DI vsx_xxsldwi_v2di {}
+ const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+ XXSLDWI_1TI vsx_xxsldwi_v1ti {}
+
const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
XXSLDWI_4SF vsx_xxsldwi_v4sf {}
@@ -3299,6 +3305,9 @@
const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
VSLDB_V8HI vsldb_v8hi {}
+ const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+ VSLDB_V1TI vsldb_v1ti {}
+
const vsq __builtin_altivec_vslq (vsq, vuq);
VSLQ vashlv1ti3 {}
@@ -3317,6 +3326,9 @@
const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
VSRDB_V8HI vsrdb_v8hi {}
+ const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+ VSRDB_V1TI vsrdb_v1ti {}
+
const vsq __builtin_altivec_vsrq (vsq, vuq);
VSRQ vlshrv1ti3 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
@@ -3399,6 +3399,10 @@
VSLDOI_4SF
vd __builtin_vec_sld (vd, vd, const int);
VSLDOI_2DF
+ vsq __builtin_vec_sld (vsq, vsq, const int);
+ VSLDOI_1TI VSLDOI_VSQ
+ vuq __builtin_vec_sld (vuq, vuq, const int);
+ VSLDOI_1TI VSLDOI_VUQ
[VEC_SLDB, vec_sldb, __builtin_vec_sldb]
vsc __builtin_vec_sldb (vsc, vsc, const int);
@@ -3417,6 +3421,10 @@
VSLDB_V2DI VSLDB_VSLL
vull __builtin_vec_sldb (vull, vull, const int);
VSLDB_V2DI VSLDB_VULL
+ vsq __builtin_vec_sldb (vsq, vsq, const int);
+ VSLDB_V1TI VSLDB_VSQ
+ vuq __builtin_vec_sldb (vuq, vuq, const int);
+ VSLDB_V1TI VSLDB_VUQ
[VEC_SLDW, vec_sldw, __builtin_vec_sldw]
vsc __builtin_vec_sldw (vsc, vsc, const int);
@@ -3439,6 +3447,10 @@
XXSLDWI_4SF XXSLDWI_VF
vd __builtin_vec_sldw (vd, vd, const int);
XXSLDWI_2DF XXSLDWI_VD
+ vsq __builtin_vec_sldw (vsq, vsq, const int);
+ XXSLDWI_1TI XXSLDWI_VSQ
+ vuq __builtin_vec_sldw (vuq, vuq, const int);
+ XXSLDWI_1TI XXSLDWI_VUQ
[VEC_SLL, vec_sll, __builtin_vec_sll]
vsc __builtin_vec_sll (vsc, vuc);
@@ -3459,6 +3471,10 @@
VSL VSL_VSLL
vull __builtin_vec_sll (vull, vuc);
VSL VSL_VULL
+ vsq __builtin_vec_sll (vsq, vuc);
+ VSL VSL_VSQ
+ vuq __builtin_vec_sll (vuq, vuc);
+ VSL VSL_VUQ
; The following variants are deprecated.
vsc __builtin_vec_sll (vsc, vus);
VSL VSL_VSC_VUS
@@ -3554,6 +3570,14 @@
VSLO VSLO_VFS
vf __builtin_vec_slo (vf, vuc);
VSLO VSLO_VFU
+ vsq __builtin_vec_slo (vsq, vsc);
+ VSLO VSLDO_VSQS
+ vsq __builtin_vec_slo (vsq, vuc);
+ VSLO VSLDO_VSQU
+ vuq __builtin_vec_slo (vuq, vsc);
+ VSLO VSLDO_VUQS
+ vuq __builtin_vec_slo (vuq, vuc);
+ VSLO VSLDO_VUQU
[VEC_SLV, vec_slv, __builtin_vec_vslv]
vuc __builtin_vec_vslv (vuc, vuc);
@@ -3699,6 +3723,10 @@
VSRDB_V2DI VSRDB_VSLL
vull __builtin_vec_srdb (vull, vull, const int);
VSRDB_V2DI VSRDB_VULL
+ vsq __builtin_vec_srdb (vsq, vsq, const int);
+ VSRDB_V1TI VSRDB_VSQ
+ vuq __builtin_vec_srdb (vuq, vuq, const int);
+ VSRDB_V1TI VSRDB_VUQ
[VEC_SRL, vec_srl, __builtin_vec_srl]
vsc __builtin_vec_srl (vsc, vuc);
@@ -3719,6 +3747,10 @@
VSR VSR_VSLL
vull __builtin_vec_srl (vull, vuc);
VSR VSR_VULL
+ vsq __builtin_vec_srl (vsq, vuc);
+ VSR VSR_VSQ
+ vuq __builtin_vec_srl (vuq, vuc);
+ VSR VSR_VUQ
; The following variants are deprecated.
vsc __builtin_vec_srl (vsc, vus);
VSR VSR_VSC_VUS
@@ -3808,6 +3840,14 @@
VSRO VSRO_VFS
vf __builtin_vec_sro (vf, vuc);
VSRO VSRO_VFU
+ vsq __builtin_vec_sro (vsq, vsc);
+ VSRO VSRDO_VSQS
+ vsq __builtin_vec_sro (vsq, vuc);
+ VSRO VSRDO_VSQU
+ vuq __builtin_vec_sro (vuq, vsc);
+ VSRO VSRDO_VUQS
+ vuq __builtin_vec_sro (vuq, vuc);
+ VSRO VSRDO_VUQU
[VEC_SRV, vec_srv, __builtin_vec_vsrv]
vuc __builtin_vec_vsrv (vuc, vuc);
@@ -23519,6 +23519,10 @@ const unsigned int);
vector signed long long, const unsigned int);
@exdent vector unsigned long long vec_sldb (vector unsigned long long,
vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
@end smallexample
Shift the combined input vectors left by the amount specified by the low-order
@@ -23546,6 +23550,10 @@ const unsigned int);
vector signed long long, const unsigned int);
@exdent vector unsigned long long vec_srdb (vector unsigned long long,
vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_srdb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
@end smallexample
Shift the combined input vectors right by the amount specified by the low-order
@@ -24041,6 +24049,41 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
@end smallexample
+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
+
+The above instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
+
@node PowerPC Hardware Transactional Memory Built-in Functions
@subsection PowerPC Hardware Transactional Memory Built-in Functions
GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
@@ -0,0 +1,419 @@
+/* { dg-do run { target power10_hw } } */
+/* { dg-do compile { target { ! power10_hw } } } */
+/* { dg-require-effective-target int128 } */
+
+/* Need -save-temps for dg-final scan-assembler-times at end of test. */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+
+void print_i128 (unsigned __int128 val)
+{
+ printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+extern void abort (void);
+
+#if DEBUG
+#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
+ printf(" src_va_s128[0] = "); \
+ print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+ printf("\n"); \
+ printf(" src_vb_uc = 0x"); \
+ for (i = 0; i < 16; i++) \
+ printf("%02x", src_vb_uc[i]); \
+ printf("\n"); \
+ printf(" vresult[0] = "); \
+ print_i128 ((unsigned __int128) vresult[0]); \
+ printf("\n"); \
+ printf(" expected_vresult[0] = "); \
+ print_i128 ((unsigned __int128) expected_vresult[0]); \
+ printf("\n");
+
+#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
+ printf(" src_va_s128[0] = "); \
+ print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+ printf("\n"); \
+ printf(" src_vb_sc = 0x"); \
+ for (i = 0; i < 16; i++) \
+ printf("%02x", src_vb_sc[i]); \
+ printf("\n"); \
+ printf(" vresult[0] = "); \
+ print_i128 ((unsigned __int128) vresult[0]); \
+ printf("\n"); \
+ printf(" expected_vresult[0] = "); \
+ print_i128 ((unsigned __int128) expected_vresult[0]); \
+ printf("\n");
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128, %s) \n", \
+ #NAME, #CONST); \
+ printf(" src_va_s128[0] = "); \
+ print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+ printf("\n"); \
+ printf(" src_vb_s128[0] = "); \
+ print_i128 ((unsigned __int128) src_vb_##TYPE_NAME[0]); \
+ printf("\n"); \
+ printf(" vresult[0] = "); \
+ print_i128 ((unsigned __int128) vresult[0]); \
+ printf("\n"); \
+ printf(" expected_vresult[0] = "); \
+ print_i128 ((unsigned __int128) expected_vresult[0]); \
+ printf("\n");
+
+#else
+#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+ abort();
+
+#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+ abort();
+
+#define ACTION_2ARG(NAME, TYPE_NAME) \
+ abort();
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+ abort();
+#endif
+
+/* Second argument of the builtin is vector unsigned char. */
+#define TEST_2ARG_UNSIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
+ { \
+ vector TYPE __int128 vresult; \
+ vector TYPE __int128 expected_vresult; \
+ int i; \
+ \
+ expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+ expected_vresult = (expected_vresult << 64) | \
+ (vector TYPE __int128) { EXP_RESULT_LO }; \
+ vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_uc); \
+ \
+ if (!vec_all_eq (vresult, expected_vresult)) { \
+ ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+ } \
+ }
+
+/* Second argument of the builtin is vector signed char. */
+#define TEST_2ARG_SIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
+ { \
+ vector TYPE __int128 vresult; \
+ vector TYPE __int128 expected_vresult; \
+ int i; \
+ \
+ expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+ expected_vresult = (expected_vresult << 64) | \
+ (vector TYPE __int128) { EXP_RESULT_LO }; \
+ vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_sc); \
+ \
+ if (!vec_all_eq (vresult, expected_vresult)) { \
+ ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+ } \
+ }
+
+#define TEST_3ARG(NAME, TYPE, TYPE_NAME, CONST, EXP_RESULT_HI, EXP_RESULT_LO) \
+ { \
+ vector TYPE __int128 vresult; \
+ vector TYPE __int128 expected_vresult; \
+ \
+ expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+ expected_vresult = (expected_vresult << 64) | \
+ (vector TYPE __int128) { EXP_RESULT_LO }; \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_##TYPE_NAME, CONST); \
+ \
+ if (!vec_all_eq (vresult, expected_vresult)) { \
+ ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+ } \
+ }
+
+int
+main (int argc, char *argv [])
+{
+ vector signed __int128 vresult_s128;
+ vector signed __int128 expected_vresult_s128;
+ vector signed __int128 src_va_s128;
+ vector signed __int128 src_vb_s128;
+ vector unsigned __int128 vresult_u128;
+ vector unsigned __int128 expected_vresult_u128;
+ vector unsigned __int128 src_va_u128;
+ vector unsigned __int128 src_vb_u128;
+ vector signed char src_vb_sc;
+ vector unsigned char src_vb_uc;
+
+ /* 128-bit vector shift right tests, vec_srdb. */
+ src_va_s128 = (vector signed __int128) {0x12345678};
+ src_vb_s128 = (vector signed __int128) {0xFEDCBA90};
+ TEST_3ARG(srdb, signed, s128, 4, 0x8000000000000000, 0xFEDCBA9)
+
+ src_va_u128 = (vector unsigned __int128) { 0xFEDCBA98 };
+ src_vb_u128 = (vector unsigned __int128) { 0x76543210};
+ TEST_3ARG(srdb, unsigned, u128, 4, 0x8000000000000000, 0x07654321)
+
+ /* 128-bit vector shift left tests, vec_sldb. */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+ src_vb_s128 = (src_vb_s128 << 64)
+ | (vector signed __int128) {0xFEDCBA9876543210};
+ TEST_3ARG(sldb, signed, s128, 4, 0x23456789ABCDEF01, 0x23456789ABCDEF0F)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_vb_u128 = src_vb_u128 << 64
+ | (vector unsigned __int128) {0x123456789ABCDEF0};
+  TEST_3ARG(sldb, unsigned, u128, 4, 0xEDCBA9876543210F, 0xEDCBA98765432101)
+
+  /* Shift left by octet tests, vec_sld.  Shift is by the immediate
+     value times 8 bits.  */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+ src_vb_s128 = (src_vb_s128 << 64)
+ | (vector signed __int128) {0xFEDCBA9876543210};
+ TEST_3ARG(sld, signed, s128, 4, 0x9abcdef012345678, 0x9abcdef0fedcba98)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_vb_u128 = src_vb_u128 << 64
+ | (vector unsigned __int128) {0x123456789ABCDEF0};
+ TEST_3ARG(sld, unsigned, u128, 4, 0x76543210fedcba98, 0x7654321012345678)
+
+ /* Vector left shift bytes within the vector, vec_sll. */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01};
+ TEST_2ARG_UNSIGNED(sll, signed, s128, 0x2468acf13579bde0,
+ 0x2468acf13579bde0)
+
+ src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02};
+ TEST_2ARG_UNSIGNED(sll, unsigned, u128, 0x48d159e26af37bc0,
+ 0x48d159e26af37bc0)
+
+ /* Vector right shift bytes within the vector, vec_srl. */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01};
+ TEST_2ARG_UNSIGNED(srl, signed, s128, 0x091a2b3c4d5e6f78,
+ 0x091a2b3c4d5e6f78)
+
+ src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02,
+ 0x02, 0x02, 0x02, 0x02};
+ TEST_2ARG_UNSIGNED(srl, unsigned, u128, 0x48d159e26af37bc,
+ 0x48d159e26af37bc)
+
+  /* Shift left by octet tests, vec_slo.  Shift amount is the octet
+     count in bits 121:124 of the shift vector.  */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ /* Note vb_sc is Endian specific. */
+ /* The left shift amount is 1 byte, i.e. 1 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0};
+#else
+ src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x1 << 3};
+#endif
+
+ TEST_2ARG_SIGNED(slo, signed, s128, 0x3456789ABCDEF012,
+ 0x3456789ABCDEF000)
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+
+  /* Note vb_uc is Endian specific. */
+ /* The left shift amount is 2 bytes, i.e. 2 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0};
+#else
+ src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x2 << 3};
+#endif
+ TEST_2ARG_UNSIGNED(slo, signed, s128, 0x56789ABCDEF01234,
+ 0x56789ABCDEF00000)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+ /* Note vb_sc is Endian specific. */
+ /* The left shift amount is 3 bytes, i.e. 3 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x00, 0x00, 0x0};
+#else
+ src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x00, 0x00, 0x03<<3};
+#endif
+ TEST_2ARG_SIGNED(slo, unsigned, u128, 0x9876543210FEDCBA,
+ 0x9876543210000000)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* Note vb_uc is Endian specific. */
+ /* The left shift amount is 4 bytes, i.e. 4 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0};
+#else
+ src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x00, 0x00, 0x04<<3};
+#endif
+ TEST_2ARG_UNSIGNED(slo, unsigned, u128, 0x76543210FEDCBA98,
+ 0x7654321000000000)
+
+  /* Shift right by octet tests, vec_sro.  Shift amount is the octet
+     count in bits 121:124 of the shift vector.  */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ /* Note vb_sc is Endian specific. */
+  /* The right shift amount is 1 byte, i.e. 1 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0};
+#else
+ src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x1 << 3};
+#endif
+  TEST_2ARG_SIGNED(sro, signed, s128, 0x00123456789ABCDE, 0xF0123456789ABCDE)
+
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_uc is Endian specific. */
+  /* The right shift amount is 2 bytes, i.e. 2 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0};
+#else
+ src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x2 << 3};
+#endif
+ TEST_2ARG_UNSIGNED(sro, signed, s128, 0x0000123456789ABC,
+ 0xDEF0123456789ABC)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* Note vb_sc is Endian specific. */
+  /* The right shift amount is 3 bytes, i.e. 3 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x00, 0x00, 0x0};
+#else
+ src_vb_sc = (vector signed char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x00, 0x00, 0x03<<3};
+#endif
+ TEST_2ARG_SIGNED(sro, unsigned, u128, 0x000000FEDCBA9876,
+ 0x543210FEDCBA9876)
+
+ src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_va_u128 = src_va_u128 << 64
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* Note vb_uc is Endian specific. */
+  /* The right shift amount is 4 bytes, i.e. 4 * 8 bits. */
+#if __LITTLE_ENDIAN__
+ src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x00, 0x0};
+#else
+ src_vb_uc = (vector unsigned char) {0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0,
+ 0x00, 0x0, 0x0, 0x04<<3};
+#endif
+ TEST_2ARG_UNSIGNED(sro, unsigned, u128, 0x00000000FEDCBA98,
+ 0x76543210FEDCBA98)
+
+ /* 128-bit vector shift left tests, vec_sldw. */
+ src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+ src_va_s128 = (src_va_s128 << 64)
+ | (vector signed __int128) {0x123456789ABCDEF0};
+ src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+ src_vb_s128 = (src_vb_s128 << 64)
+ | (vector signed __int128) {0xFEDCBA9876543210};
+ TEST_3ARG(sldw, signed, s128, 1, 0x9ABCDEF012345678, 0x9ABCDEF0FEDCBA98)
+
+ src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_va_u128 = (src_va_u128 << 64)
+ | (vector unsigned __int128) {0x123456789ABCDEF0};
+ src_vb_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+ src_vb_u128 = (src_vb_u128 << 64)
+ | (vector unsigned __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldw, unsigned, u128, 2, 0x123456789ABCDEF0, 0xFEDCBA9876543210)
+