From patchwork Fri Nov 5 04:07:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 47084
Date: Fri, 5 Nov 2021 00:07:05 -0400
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Subject: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
List-Id: Gcc-patches mailing list
From: Michael Meissner

Add LXVKQ support.

This patch adds support to generate the LXVKQ instruction to load specific
IEEE-128 floating point constants.

Compared to the last time I submitted this patch, I modified it so that it
uses the bit pattern of the vector to see if it can generate the LXVKQ
instruction. This means on a little endian Power system, the following code
will generate a LXVKQ 34,16 instruction:

    vector long long foo (void)
    {
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
      return (vector long long) { 0x0000000000000000, 0x8000000000000000 };
    #else
      return (vector long long) { 0x8000000000000000, 0x0000000000000000 };
    #endif
    }

because that vector pattern is the same bit pattern as -0.0F128.

2021-11-05 Michael Meissner

gcc/
	* config/rs6000/constraints.md (eQ): New constraint.
	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	generating the LXVKQ instruction.
	(easy_vector_constant_ieee128): New predicate.
	(easy_vector_constant): Add support for generating the LXVKQ
	instruction.
	* config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
	declaration.
	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
	generating LXVKQ.
	(constant_generates_lxvkq): New function.
	* config/rs6000/rs6000.opt (-mieee128-constant): New debug option.
	* config/rs6000/vsx.md (vsx_mov_64bit): Add support for
	generating LXVKQ.
	(vsx_mov_32bit): Likewise.
	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
	eQ constraint.

gcc/testsuite/
	* gcc.target/powerpc/float128-constant.c: New test.
---
 gcc/config/rs6000/constraints.md              |   6 +
 gcc/config/rs6000/predicates.md               |  34 ++++
 gcc/config/rs6000/rs6000-protos.h             |   1 +
 gcc/config/rs6000/rs6000.c                    |  62 +++++++
 gcc/config/rs6000/rs6000.opt                  |   4 +
 gcc/config/rs6000/vsx.md                      |  14 ++
 gcc/doc/md.texi                               |   4 +
 .../gcc.target/powerpc/float128-constant.c    | 160 ++++++++++++++++++
 8 files changed, 285 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-constant.c

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..e72132b4c28 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@ (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A TF/KF scalar constant or a vector constant that can load certain IEEE
+;; 128-bit constants into vector registers using LXVKQ.
+(define_constraint "eQ"
+  "An IEEE 128-bit constant that can be loaded into VSX registers."
+  (match_operand 0 "easy_vector_constant_ieee128"))
+
 ;; Floating-point constraints. These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..e0d1c718e9f 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy. */
+  vec_const_128bit_type vsx_const;
+  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
+    {
+      if (constant_generates_lxvkq (&vsx_const) != 0)
+        return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases. This
      has the advantage that double precision constants that can be
@@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
   return 0;
 })
 
+;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
+;; via the LXVKQ instruction.
+
+(define_predicate "easy_vector_constant_ieee128"
+  (match_code "const_vector,const_double")
+{
+  vec_const_128bit_type vsx_const;
+
+  /* Can we generate the LXVKQ instruction? */
+  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
+      || !TARGET_VSX)
+    return false;
+
+  return (vec_const_128bit_to_bytes (op, mode, &vsx_const)
+          && constant_generates_lxvkq (&vsx_const) != 0);
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +678,15 @@ (define_predicate "easy_vector_constant"
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
         return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy. */
+      vec_const_128bit_type vsx_const;
+      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
+        {
+          if (constant_generates_lxvkq (&vsx_const) != 0)
+            return true;
+        }
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
         return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 490d6e33736..494a95cc6ee 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -250,6 +250,7 @@ typedef struct {
 
 extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
                                        vec_const_128bit_type *);
+extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f285022294a..06d02085b06 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6991,6 +6991,17 @@ output_vec_const_move (rtx *operands)
             gcc_unreachable ();
         }
 
+      vec_const_128bit_type vsx_const;
+      if (TARGET_POWER10 && vec_const_128bit_to_bytes (vec, mode, &vsx_const))
+        {
+          unsigned imm = constant_generates_lxvkq (&vsx_const);
+          if (imm)
+            {
+              operands[2] = GEN_INT (imm);
+              return "lxvkq %x0,%2";
+            }
+        }
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
         {
@@ -28872,6 +28883,57 @@ vec_const_128bit_to_bytes (rtx op,
   return true;
 }
 
+/* Determine if an IEEE 128-bit constant can be loaded with LXVKQ. Return zero
+   if the LXVKQ instruction cannot be used. Otherwise return the immediate
+   value to be used with the LXVKQ instruction. */
+
+unsigned
+constant_generates_lxvkq (vec_const_128bit_type *vsx_const)
+{
+  /* Is the instruction supported with power10 code generation, IEEE 128-bit
+     floating point hardware and VSX registers are available. */
+  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
+      || !TARGET_VSX)
+    return 0;
+
+  /* Verify that all of the bottom 3 words in the constants loaded by the
+     LXVKQ instruction are zero. */
+  if (vsx_const->words[1] != 0
+      || vsx_const->words[2] != 0
+      || vsx_const->words[3] != 0)
+    return 0;
+
+  /* See if we have a match. */
+  switch (vsx_const->words[0])
+    {
+    case 0x3FFF0000U: return 1;   /* IEEE 128-bit +1.0. */
+    case 0x40000000U: return 2;   /* IEEE 128-bit +2.0. */
+    case 0x40008000U: return 3;   /* IEEE 128-bit +3.0. */
+    case 0x40010000U: return 4;   /* IEEE 128-bit +4.0. */
+    case 0x40014000U: return 5;   /* IEEE 128-bit +5.0. */
+    case 0x40018000U: return 6;   /* IEEE 128-bit +6.0. */
+    case 0x4001C000U: return 7;   /* IEEE 128-bit +7.0. */
+    case 0x7FFF0000U: return 8;   /* IEEE 128-bit +Infinity. */
+    case 0x7FFF8000U: return 9;   /* IEEE 128-bit quiet NaN. */
+    case 0x80000000U: return 16;  /* IEEE 128-bit -0.0. */
+    case 0xBFFF0000U: return 17;  /* IEEE 128-bit -1.0. */
+    case 0xC0000000U: return 18;  /* IEEE 128-bit -2.0. */
+    case 0xC0008000U: return 19;  /* IEEE 128-bit -3.0. */
+    case 0xC0010000U: return 20;  /* IEEE 128-bit -4.0. */
+    case 0xC0014000U: return 21;  /* IEEE 128-bit -5.0. */
+    case 0xC0018000U: return 22;  /* IEEE 128-bit -6.0. */
+    case 0xC001C000U: return 23;  /* IEEE 128-bit -7.0. */
+    case 0xFFFF0000U: return 24;  /* IEEE 128-bit -Infinity. */
+
+      /* anything else cannot be loaded. */
+    default:
+      break;
+    }
+
+  return 0;
+}
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..b7433ec4e30 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mieee128-constant
+Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the LXVKQ instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0bf04feb6c4..0a376ee4c28 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@ (define_insn_and_split "*xxspltib__split"
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       ,          wa,        v,
+                wa,
                 ?wa,       v,         ,          wZ,        v")
 
         (match_operand:VSX_M 1 "input_operand"
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eQ,
                 ?jwM,      W,         ,          v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)
@@ -1213,35 +1216,43 @@ (define_insn "vsx_mov_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                ",          ,          ,          *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 ,          *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              LXVKQ
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       ,
+                wa,
                 wa,        v,         ?wa,       v,         ,
                 wZ,        v")
 
         (match_operand:VSX_M 1 "input_operand"
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eQ,
                 wE,        jwM,       ?jwM,      W,         ,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,     *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                ",          ,          ,          *,         *,         *,
+                p10,
                 p9v,       *,         ,          *,         *,
                 *,         *")])
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..4af8fd76992 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eQ
+An IEEE 128-bit constant that can be loaded into a VSX register with a
+single instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-constant.c b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
new file mode 100644
index 00000000000..e3286a786a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-constant.c
@@ -0,0 +1,160 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the LXVKQ instruction is generated to load special IEEE 128-bit
+   constants. */
+
+_Float128
+return_0 (void)
+{
+  return 0.0f128;                       /* XXSPLTIB 34,0. */
+}
+
+_Float128
+return_1 (void)
+{
+  return 1.0f128;                       /* LXVKQ 34,1. */
+}
+
+_Float128
+return_2 (void)
+{
+  return 2.0f128;                       /* LXVKQ 34,2. */
+}
+
+_Float128
+return_3 (void)
+{
+  return 3.0f128;                       /* LXVKQ 34,3. */
+}
+
+_Float128
+return_4 (void)
+{
+  return 4.0f128;                       /* LXVKQ 34,4. */
+}
+
+_Float128
+return_5 (void)
+{
+  return 5.0f128;                       /* LXVKQ 34,5. */
+}
+
+_Float128
+return_6 (void)
+{
+  return 6.0f128;                       /* LXVKQ 34,6. */
+}
+
+_Float128
+return_7 (void)
+{
+  return 7.0f128;                       /* LXVKQ 34,7. */
+}
+
+_Float128
+return_m0 (void)
+{
+  return -0.0f128;                      /* LXVKQ 34,16. */
+}
+
+_Float128
+return_m1 (void)
+{
+  return -1.0f128;                      /* LXVKQ 34,17. */
+}
+
+_Float128
+return_m2 (void)
+{
+  return -2.0f128;                      /* LXVKQ 34,18. */
+}
+
+_Float128
+return_m3 (void)
+{
+  return -3.0f128;                      /* LXVKQ 34,19. */
+}
+
+_Float128
+return_m4 (void)
+{
+  return -4.0f128;                      /* LXVKQ 34,20. */
+}
+
+_Float128
+return_m5 (void)
+{
+  return -5.0f128;                      /* LXVKQ 34,21. */
+}
+
+_Float128
+return_m6 (void)
+{
+  return -6.0f128;                      /* LXVKQ 34,22. */
+}
+
+_Float128
+return_m7 (void)
+{
+  return -7.0f128;                      /* LXVKQ 34,23. */
+}
+
+_Float128
+return_inf (void)
+{
+  return __builtin_inff128 ();          /* LXVKQ 34,8. */
+}
+
+_Float128
+return_minf (void)
+{
+  return - __builtin_inff128 ();        /* LXVKQ 34,24. */
+}
+
+_Float128
+return_nan (void)
+{
+  return __builtin_nanf128 ("");        /* LXVKQ 34,9. */
+}
+
+/* Note, the following NaNs should not generate a LXVKQ instruction. */
+_Float128
+return_mnan (void)
+{
+  return - __builtin_nanf128 ("");      /* PLXV 34,... */
+}
+
+_Float128
+return_nan2 (void)
+{
+  return __builtin_nanf128 ("1");       /* PLXV 34,... */
+}
+
+_Float128
+return_nans (void)
+{
+  return __builtin_nansf128 ("");       /* PLXV 34,... */
+}
+
+vector long long
+return_longlong_neg_0 (void)
+{
+  /* This vector is the same pattern as -0.0F128. */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define FIRST   0x8000000000000000
+#define SECOND  0x0000000000000000
+
+#else
+#define FIRST   0x0000000000000000
+#define SECOND  0x8000000000000000
+#endif
+
+  return (vector long long) { FIRST, SECOND };  /* LXVKQ 34,16. */
+}
+
+/* { dg-final { scan-assembler-times {\mlxvkq\M} 19 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 1 } } */
+