[AArch64] Enable generation of FRINTNZ instructions

Message ID 8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com
State New
Series [AArch64] Enable generation of FRINTNZ instructions

Commit Message

Andre Vieira (lists) Nov. 11, 2021, 5:51 p.m. UTC
  Hi,

This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding 
optabs and mappings. It also creates a backend pattern to implement them 
for aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if 
performed by first casting it to a signed integer of 32 or 64 bits and 
then back to the same floating point type/mode.
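
For reference, the source-level idiom being described is the one below (the
same shape as the new frintnz.c tests; the function name is purely
illustrative):

  float
  trunc_via_int (float x)
  {
    int y = x;          /* truncating float -> signed 32-bit int conversion */
    return (float) y;   /* and back to the original floating-point mode */
  }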

The match.pd pattern chooses to use these, when supported, regardless of 
trapping math, since these new patterns mimic the original behavior of 
truncating through an integer.

I didn't think any of the existing IFN's represented these. I know it's 
a bit late in stage 1, but I thought this might be OK given it's only 
used by a single target and should have very little impact on anything else.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTZ): New iterator.
         * doc/md.texi: New entry for ftrunc pattern name.
         * internal-fn.def (FTRUNC32): New IFN.
         (FTRUNC64): Likewise.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (OPTAB_D): New entries for ftrunc.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
  

Comments

Richard Biener Nov. 12, 2021, 10:56 a.m. UTC | #1
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> optabs and mappings. It also creates a backend pattern to implement them for
> aarch64 and a match.pd pattern to idiom recognize these.
> These IFN's (and optabs) represent a truncation towards zero, as if performed
> by first casting it to a signed integer of 32 or 64 bits and then back to the
> same floating point type/mode.
> 
> The match.pd pattern choses to use these, when supported, regardless of
> trapping math, since these new patterns mimic the original behavior of
> truncating through an integer.
> 
> I didn't think any of the existing IFN's represented these. I know it's a bit
> late in stage 1, but I thought this might be OK given it's only used by a
> single target and should have very little impact on anything else.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?

Richard.


> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTZ): New iterator.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * internal-fn.def (FTRUNC32): New IFN.
>         (FTRUNC64): Likewise.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (OPTAB_D): New entries for ftrunc.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
> 
>
  
Andre Vieira (lists) Nov. 12, 2021, 11:48 a.m. UTC | #2
On 12/11/2021 10:56, Richard Biener wrote:
> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
>> optabs and mappings. It also creates a backend pattern to implement them for
>> aarch64 and a match.pd pattern to idiom recognize these.
>> These IFN's (and optabs) represent a truncation towards zero, as if performed
>> by first casting it to a signed integer of 32 or 64 bits and then back to the
>> same floating point type/mode.
>>
>> The match.pd pattern choses to use these, when supported, regardless of
>> trapping math, since these new patterns mimic the original behavior of
>> truncating through an integer.
>>
>> I didn't think any of the existing IFN's represented these. I know it's a bit
>> late in stage 1, but I thought this might be OK given it's only used by a
>> single target and should have very little impact on anything else.
>>
>> Bootstrapped on aarch64-none-linux.
>>
>> OK for trunk?
> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> optab (with two modes), so not
>
> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
>
> but
>
> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
>
> or so?  I know that gets somewhat awkward for the internal function,
> but IMHO we shouldn't tie our hands because of that?
I tried doing this originally, but indeed I couldn't find a way to 
correctly tie the internal function to it.

direct_optab_supported_p with multiple types expects those to be of the 
same mode. I see convert_optab_supported_p does handle different modes, 
but I don't know how that is used...

Any ideas?
  
Richard Biener Nov. 16, 2021, 12:10 p.m. UTC | #3
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:

> 
> On 12/11/2021 10:56, Richard Biener wrote:
> > On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> >> optabs and mappings. It also creates a backend pattern to implement them
> >> for
> >> aarch64 and a match.pd pattern to idiom recognize these.
> >> These IFN's (and optabs) represent a truncation towards zero, as if
> >> performed
> >> by first casting it to a signed integer of 32 or 64 bits and then back to
> >> the
> >> same floating point type/mode.
> >>
> >> The match.pd pattern choses to use these, when supported, regardless of
> >> trapping math, since these new patterns mimic the original behavior of
> >> truncating through an integer.
> >>
> >> I didn't think any of the existing IFN's represented these. I know it's a
> >> bit
> >> late in stage 1, but I thought this might be OK given it's only used by a
> >> single target and should have very little impact on anything else.
> >>
> >> Bootstrapped on aarch64-none-linux.
> >>
> >> OK for trunk?
> > On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> > optab (with two modes), so not
> >
> > +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> > +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >
> > but
> >
> > OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >
> > or so?  I know that gets somewhat awkward for the internal function,
> > but IMHO we shouldn't tie our hands because of that?
> I tried doing this originally, but indeed I couldn't find a way to correctly
> tie the internal function to it.
> 
> direct_optab_supported_p with multiple types expect those to be of the same
> mode. I see convert_optab_supported_p does but I don't know how that is
> used...
> 
> Any ideas?

No "nice" ones.  The "usual" way is to provide fake arguments that
specify the type/mode.  We could use an integer argument directly
specifying the mode (then the IL would look host dependent - ugh),
or specify a constant zero in the intended mode (less visibly
obvious - but at least with -gimple dumping you'd see the type...).
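
Purely for illustration (these are not real dumps, and they borrow the
IFN_FTRUNC_INT name that a later revision of the patch uses), the two
options would make the IL look roughly like:

  _2 = .FTRUNC_INT (x_1(D), 34);  /* raw machine-mode number: host dependent */
  _2 = .FTRUNC_INT (x_1(D), 0);   /* zero whose type names the intermediate mode */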

In any case if people think going with two optabs is OK then
please consider using ftruncsi and ftruncdi instead of 32/64.

Richard.
  
Andre Vieira (lists) Nov. 17, 2021, 1:30 p.m. UTC | #4
On 16/11/2021 12:10, Richard Biener wrote:
> On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
>
>> On 12/11/2021 10:56, Richard Biener wrote:
>>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
>>>
>>>> Hi,
>>>>
>>>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
>>>> optabs and mappings. It also creates a backend pattern to implement them
>>>> for
>>>> aarch64 and a match.pd pattern to idiom recognize these.
>>>> These IFN's (and optabs) represent a truncation towards zero, as if
>>>> performed
>>>> by first casting it to a signed integer of 32 or 64 bits and then back to
>>>> the
>>>> same floating point type/mode.
>>>>
>>>> The match.pd pattern choses to use these, when supported, regardless of
>>>> trapping math, since these new patterns mimic the original behavior of
>>>> truncating through an integer.
>>>>
>>>> I didn't think any of the existing IFN's represented these. I know it's a
>>>> bit
>>>> late in stage 1, but I thought this might be OK given it's only used by a
>>>> single target and should have very little impact on anything else.
>>>>
>>>> Bootstrapped on aarch64-none-linux.
>>>>
>>>> OK for trunk?
>>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
>>> optab (with two modes), so not
>>>
>>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
>>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
>>>
>>> but
>>>
>>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
>>>
>>> or so?  I know that gets somewhat awkward for the internal function,
>>> but IMHO we shouldn't tie our hands because of that?
>> I tried doing this originally, but indeed I couldn't find a way to correctly
>> tie the internal function to it.
>>
>> direct_optab_supported_p with multiple types expect those to be of the same
>> mode. I see convert_optab_supported_p does but I don't know how that is
>> used...
>>
>> Any ideas?
> No "nice" ones.  The "usual" way is to provide fake arguments that
> specify the type/mode.  We could use an integer argument directly
> secifying the mode (then the IL would look host dependent - ugh),
> or specify a constant zero in the intended mode (less visibly
> obvious - but at least with -gimple dumping you'd see the type...).
Hi,

So I reworked this to have a single optab and IFN. This required a bit 
of fiddling with a custom expander and supported_p function for the IFN. 
I decided to pass a MAX_INT constant for the 'int' type to the IFN, so 
it can carry the size of the integer type we use for the intermediate 
cast.  I tried 0 first, but GCC was being too smart and just demoted it 
to an 'int' for the long long test-cases.
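
Concretely (an illustrative dump, not copied from an actual GCC run), the
float/int case now ends up as something like:

  _2 = .FTRUNC_INT (x_1(D), 2147483647);

where the INT_MAX argument is only there to carry the type of the
intermediate signed integer.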

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTZ): New iterator.
         * doc/md.texi: New entry for ftrunc pattern name.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,13 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a63423484dda5b1251f47de24e926ba..d8306b50807609573c2ff612e2a83dcf1c55d1de 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3712,6 +3736,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
+#define direct_ftrunc_int_optab_supported_p convert_optab_supported_p
 
 /* Return the optab used by internal function FN.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..c37aa023b57838eba80c7a212ff1038eb6eed861 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	 && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+					    TREE_TYPE (@1), OPTIMIZE_FOR_BOTH))
+     (with {
+      tree int_type = TREE_TYPE (@1);
+      unsigned HOST_WIDE_INT max_int_c
+	= (1ULL << (element_precision (int_type) - 1)) - 1;
+      }
+      (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); }))
+     (if (!flag_trapping_math
+	  && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_TRUNC @0)))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..57d259d33409265df3af1646d123e4ab216c34c8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	...
+**	frint32z	s0, s0
+**	...
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	...
+**	frint64z	s0, s0
+**	...
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	...
+**	frint32z	d0, d0
+**	...
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	...
+**	frint64z	d0, d0
+**	...
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports ARMv8.5 scalar and Adv.Simd FRINT32[ZX]
+# and FRINT64[ZX] instructions, 0 otherwise. The test is valid for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      arm_v8_5a_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok { } {
+    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
+                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
  
Richard Sandiford Nov. 17, 2021, 3:38 p.m. UTC | #5
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
>     (set_attr "speculation_barrier" "true")]
>  )
>  
> +(define_expand "ftrunc<mode><frintnz_mode>2"
> +  [(set (match_operand:VSFDF 0 "register_operand" "=w")
> +        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
> +		      FRINTNZ))]
> +  "TARGET_FRINT && TARGET_FLOAT
> +   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
> +)

Probably just me, but this condition seems quite hard to read.
I think it'd be better to add conditions to the VSFDF definition instead,
a bit like we do for the HF entries in VHSDF_HSDF and VHSDF_DF.  I.e.:

(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
			     (V4SF "TARGET_SIMD")
			     (V2DF "TARGET_SIMD")
			     (SF "TARGET_FLOAT")
			     (DF "TARGET_FLOAT")])

Then the condition can be "TARGET_FRINT".

Same for the existing aarch64_<frintnzs_op><mode>.

> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
> +DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)

ftrunc_int should be described in the comment at the top of the file.
E.g.:

  - ftrunc_int: a unary conversion optab that takes and returns values
    of the same mode, but internally converts via another mode.  This
    second mode is specified using a dummy final function argument.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	s0, s0
> +**	...

Are these functions ever more than just:

f1:
	frint32z	s0, s0
	ret

?  If not, I think we should match that sequence and “defend” the
good codegen.  The problem with ... on both sides is that it's
then not clear why we can rely on register 0 being used.

> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	s0, s0
> +**	...
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	d0, d0
> +**	...
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	d0, d0
> +**	...
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -ffast-math" } */
> +/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
>  
>  float
>  f1 (float x)
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
>  	}]
>  }
>  
> +# Return 1 if the target supports ARMv8.5 scalar and Adv.Simd FRINT32[ZX]

Armv8.5-A

> +# and FRINT64[ZX] instructions, 0 otherwise. The test is valid for AArch64.
> +# Record the command line options needed.
> +
> +proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
> +
> +    if { ![istarget aarch64*-*-*] } {
> +        return 0;
> +    }
> +
> +    if { [check_no_compiler_messages_nocache \
> +	      arm_v8_5a_frintnzx_ok assembly {
> +	#if !defined (__ARM_FEATURE_FRINT)
> +	#error "__ARM_FEATURE_FRINT not defined"
> +	#endif
> +    } [current_compiler_flags]] } {
> +	return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +proc check_effective_target_arm_v8_5a_frintnzx_ok { } {

The new condition should be documented in sourcebuild.texi, near
the existing arm_v8_* tests.

OK for the non-match.pd parts with those changes.  I don't feel
qualified to review the match.pd bits. :-)

Thanks,
Richard

> +    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
> +                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
> +}
> +
>  # Return 1 if the target supports executing the Armv8.1-M Mainline Low
>  # Overhead Loop, 0 otherwise.  The test is valid for ARM.
>
  
Richard Biener Nov. 18, 2021, 11:05 a.m. UTC | #6
On Wed, 17 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 16/11/2021 12:10, Richard Biener wrote:
> > On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
> >
> >> On 12/11/2021 10:56, Richard Biener wrote:
> >>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> >>>> optabs and mappings. It also creates a backend pattern to implement them
> >>>> for
> >>>> aarch64 and a match.pd pattern to idiom recognize these.
> >>>> These IFN's (and optabs) represent a truncation towards zero, as if
> >>>> performed
> >>>> by first casting it to a signed integer of 32 or 64 bits and then back to
> >>>> the
> >>>> same floating point type/mode.
> >>>>
> >>>> The match.pd pattern choses to use these, when supported, regardless of
> >>>> trapping math, since these new patterns mimic the original behavior of
> >>>> truncating through an integer.
> >>>>
> >>>> I didn't think any of the existing IFN's represented these. I know it's a
> >>>> bit
> >>>> late in stage 1, but I thought this might be OK given it's only used by a
> >>>> single target and should have very little impact on anything else.
> >>>>
> >>>> Bootstrapped on aarch64-none-linux.
> >>>>
> >>>> OK for trunk?
> >>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> >>> optab (with two modes), so not
> >>>
> >>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> >>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >>>
> >>> but
> >>>
> >>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >>>
> >>> or so?  I know that gets somewhat awkward for the internal function,
> >>> but IMHO we shouldn't tie our hands because of that?
> >> I tried doing this originally, but indeed I couldn't find a way to
> >> correctly
> >> tie the internal function to it.
> >>
> >> direct_optab_supported_p with multiple types expect those to be of the same
> >> mode. I see convert_optab_supported_p does but I don't know how that is
> >> used...
> >>
> >> Any ideas?
> > No "nice" ones.  The "usual" way is to provide fake arguments that
> > specify the type/mode.  We could use an integer argument directly
> > secifying the mode (then the IL would look host dependent - ugh),
> > or specify a constant zero in the intended mode (less visibly
> > obvious - but at least with -gimple dumping you'd see the type...).
> Hi,
> 
> So I reworked this to have a single optab and IFN. This required a bit of
> fiddling with custom expander and supported_p functions for the IFN. I decided
> to pass a MAX_INT for the 'int' type to the IFN to be able to pass on the size
> of the int we use as an intermediate cast.  I tried 0 first, but gcc was being
> too smart and just demoted it to an 'int' for the long long test-cases.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-       && types_match (type, TREE_TYPE (@0))
-       && direct_internal_fn_supported_p (IFN_TRUNC, type,
-                                         OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+                                           TREE_TYPE (@1), OPTIMIZE_FOR_BOTH))
+     (with {
+      tree int_type = TREE_TYPE (@1);
+      unsigned HOST_WIDE_INT max_int_c
+       = (1ULL << (element_precision (int_type) - 1)) - 1;

That's only half-way supporting vector types I fear - you use
element_precision but then build a vector integer constant
in an unsupported way.  I suppose vector support isn't present
for arm?  The cleanest way would probably be to do

       tree int_type = element_type (@1);

with providing element_type in tree.[ch] like we provide
element_precision.

+      }
+      (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); }))

Then you could use wide_int_to_tree (int_type, wi::max_value 
(TYPE_PRECISION (int_type), SIGNED))
to build the special integer constant (which seems to be always
scalar).

+     (if (!flag_trapping_math
+         && direct_internal_fn_supported_p (IFN_TRUNC, type,
+                                            OPTIMIZE_FOR_BOTH))
+      (IFN_TRUNC @0)))))
 #endif

does IFN_FTRUNC_INT preserve the same exceptions as doing
explicit intermediate float->int conversions?  I think I'd
prefer to have !flag_trapping_math on both cases.
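
(For context, a hedged illustration rather than part of the patch: the
concern is inputs where the intermediate conversion is out of range, e.g.

  float
  f (float x)        /* e.g. x = 1.0e20f */
  {
    int y = x;       /* out of range for int: raises the invalid-operation
                        (FE_INVALID) exception */
    return (float) y;
  }

so the transform is only safe under -ftrapping-math if IFN_FTRUNC_INT
raises the same exception.)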

> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTZ): New iterator.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>
  
Andre Vieira (lists) Nov. 22, 2021, 11:38 a.m. UTC | #7
On 18/11/2021 11:05, Richard Biener wrote:
>
> @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      trapping behaviour, so require !flag_trapping_math. */
>   #if GIMPLE
>   (simplify
> -   (float (fix_trunc @0))
> -   (if (!flag_trapping_math
> -       && types_match (type, TREE_TYPE (@0))
> -       && direct_internal_fn_supported_p (IFN_TRUNC, type,
> -                                         OPTIMIZE_FOR_BOTH))
> -      (IFN_TRUNC @0)))
> +   (float (fix_trunc@1 @0))
> +   (if (types_match (type, TREE_TYPE (@0)))
> +    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
> +        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
> +                                           TREE_TYPE (@1),
> OPTIMIZE_FOR_BOTH))
> +     (with {
> +      tree int_type = TREE_TYPE (@1);
> +      unsigned HOST_WIDE_INT max_int_c
> +       = (1ULL << (element_precision (int_type) - 1)) - 1;
>
> That's only half-way supporting vector types I fear - you use
> element_precision but then build a vector integer constant
> in an unsupported way.  I suppose vector support isn't present
> for arm?  The cleanest way would probably be to do
>
>         tree int_type = element_type (@1);
>
> with providing element_type in tree.[ch] like we provide
> element_precision.
This is a good shout and made me think about something I hadn't 
before... I thought I could handle the vector forms later, but the 
problem is that if I add support for just the scalar, it will stop the 
vectorizer. It seems vectorizable_call expects all arguments to have the 
same type, which doesn't work with the work-around of passing the 
integer type as an operand.

Should I go back to two separate IFN's? We could still have the single optab.

Regards,
Andre
  
Richard Biener Nov. 22, 2021, 11:41 a.m. UTC | #8
On Mon, 22 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 18/11/2021 11:05, Richard Biener wrote:
> >
> > @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >      trapping behaviour, so require !flag_trapping_math. */
> >   #if GIMPLE
> >   (simplify
> > -   (float (fix_trunc @0))
> > -   (if (!flag_trapping_math
> > -       && types_match (type, TREE_TYPE (@0))
> > -       && direct_internal_fn_supported_p (IFN_TRUNC, type,
> > -                                         OPTIMIZE_FOR_BOTH))
> > -      (IFN_TRUNC @0)))
> > +   (float (fix_trunc@1 @0))
> > +   (if (types_match (type, TREE_TYPE (@0)))
> > +    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
> > +        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
> > +                                           TREE_TYPE (@1),
> > OPTIMIZE_FOR_BOTH))
> > +     (with {
> > +      tree int_type = TREE_TYPE (@1);
> > +      unsigned HOST_WIDE_INT max_int_c
> > +       = (1ULL << (element_precision (int_type) - 1)) - 1;
> >
> > That's only half-way supporting vector types I fear - you use
> > element_precision but then build a vector integer constant
> > in an unsupported way.  I suppose vector support isn't present
> > for arm?  The cleanest way would probably be to do
> >
> >         tree int_type = element_type (@1);
> >
> > with providing element_type in tree.[ch] like we provide
> > element_precision.
> This is a good shout and made me think about something I hadn't before... I
> thought I could handle the vector forms later, but the problem is if I add
> support for the scalar, it will stop the vectorizer. It seems
> vectorizable_call expects all arguments to have the same type, which doesn't
> work with passing the integer type as an operand work around.

We already special case some IFNs there (masked load/store and gather)
to ignore some args, so that would just add to this set.

Richard.
  
Andre Vieira (lists) Nov. 25, 2021, 1:53 p.m. UTC | #9
On 22/11/2021 11:41, Richard Biener wrote:
>
>> On 18/11/2021 11:05, Richard Biener wrote:
>>> This is a good shout and made me think about something I hadn't before... I
>>> thought I could handle the vector forms later, but the problem is if I add
>>> support for the scalar, it will stop the vectorizer. It seems
>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>> work with passing the integer type as an operand work around.
> We already special case some IFNs there (masked load/store and gather)
> to ignore some args, so that would just add to this set.
>
> Richard.
Hi,

Reworked it to add support for the new IFN to the vectorizer. I was 
initially trying to make vectorizable_call and 
vectorizable_internal_function handle IFNs with different inputs more 
generically, using the information we have in the <IFN>_direct structs 
regarding what operands to get the modes from. Unfortunately, that 
wasn't straightforward because of how vectorizable_call assumes operands 
have the same type and uses the type of the DEF_STMT_INFO of the 
non-constant operands (either output operand or non-constant inputs) to 
determine the type of constants. I assume there is some reason why we 
use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on 
the argument types. That is why I ended up with this sort of half-way 
mix of both, which still allows room to add more IFNs that don't take 
inputs of the same type, but require adding a bit of special casing 
similar to the IFN_FTRUNC_INT and masking ones.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTNZ): New iterator.
         (frintnz_mode): New int attribute.
         (VSFDF): Make iterator conditional.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * internal-fn.c (ftrunc_int_direct): New define.
         (expand_ftrunc_int_optab_fn): New custom expander.
         (direct_ftrunc_int_optab_supported_p): New supported_p.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.
         * stor-layout.h (element_precision): Moved from here...
         * tree.h (element_precision): ... to here.
         (element_type): New declaration.
         * tree.c (element_type): New function.
         (element_precision): Changed to use element_type.
         * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for
         IFNs with different input types.
         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
         * doc/md.texi: New entry for ftrunc pattern name.
         * doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added aarch64_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
         * gcc.target/aarch64/frintnz_vec.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")
+			      (V4SF "TARGET_SIMD")
+			      (V2DF "TARGET_SIMD")
+			      (DF "TARGET_FLOAT")
+			      (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3488,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.  An exception must be raised if operand 1 does not
+fit in a @var{n} mode signed integer, just as it would have been if the
+truncation had happened through a separate floating point to integer conversion.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 40b1e0d816789b225089c4143fb63e62a6af817a..15d4de24d15cce6793b3bb61d728e61cea00924d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2282,6 +2282,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintnzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a63423484dda5b1251f47de24e926ba..60b404ef44360c8ae0cda1176fb888302ddbc98d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = element_type (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 	  != CODE_FOR_nothing);
 }
 
+static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
+						 tree_pair types,
+						 optimization_type opt_type)
+{
+  return (convert_optab_handler (optab, TYPE_MODE (types.first),
+				TYPE_MODE (element_type (types.second)),
+				opt_type) != CODE_FOR_nothing);
+}
+
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..e58891e3d3ebc805dd55ac6f70bbda617b7302b7 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+   same mode, but internally converts via another mode.  This second mode is
+   specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -269,6 +272,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..80660e6fd40bc6934e1fa0329c0fbcab1658ed44 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+						  SIGNED)); })
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..57d259d33409265df3af1646d123e4ab216c34c8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 9e892e50c8559e497fcae1b77a36401df82fabe2..165a592d4d2c7bf525060dd51ce6094eb4f4f68a 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	frint32z	s0, s0
+**	ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	frint64z	s0, s0
+**	ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	frint32z	d0, d0
+**	ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	frint64z	d0, d0
+**	ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; ++i)					      \
+    {								      \
+      int_type x_i = x[i];					      \
+      y[i] = (float_type) x_i;					      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v0.4s, v0.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v0.4s, v0.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v0.2d, v0.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v0.2d, v0.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..450ca78230faeba40b89fc7987af27b6bf0a0d53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,32 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise. The test is valid for
+# AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      aarch64_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-				tree vectype_out, tree vectype_in)
+				tree vectype_out, tree vectype_in,
+				tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
-	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+	  if (!type0)
+	    type0 = vectype_in;
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+	  if (!type1)
+	    type1 = vectype_in;
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+	/* For FTRUNC this represents the argument that carries the type of the
+	   intermediate signed integer.  */
+	diff_opno = 1;
+      else
+	{
+	  /* For masked operations this represents the argument that carries the
+	     mask.  */
+	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+	  masked = diff_opno >=  0;
+	}
+    }
 
   for (i = 0; i < nargs; i++)
     {
-      if ((int) i == mask_opno)
+      if ((int) i == diff_opno && masked)
 	{
-	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
-				       &op, &slp_op[i], &dt[i], &vectypes[i]))
+	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
+				       diff_opno, &op, &slp_op[i], &dt[i],
+				       &vectypes[i]))
 	    return false;
 	  continue;
 	}
@@ -3275,27 +3295,35 @@ vectorizable_call (vec_info *vinfo,
 	  return false;
 	}
 
-      /* We can only handle calls with arguments of the same type.  */
-      if (rhs_type
-	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+      if ((int) i != diff_opno)
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument types differ.\n");
-	  return false;
-	}
-      if (!rhs_type)
-	rhs_type = TREE_TYPE (op);
+	  /* We can only handle calls with arguments of the same type.  */
+	  if (rhs_type
+	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument types differ.\n");
+	      return false;
+	    }
+	  if (!rhs_type)
+	    rhs_type = TREE_TYPE (op);
 
-      if (!vectype_in)
-	vectype_in = vectypes[i];
-      else if (vectypes[i]
-	       && !types_compatible_p (vectypes[i], vectype_in))
+	  if (!vectype_in)
+	    vectype_in = vectypes[i];
+	  else if (vectypes[i]
+		   && !types_compatible_p (vectypes[i], vectype_in))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument vector types differ.\n");
+	      return false;
+	    }
+	}
+      else
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument vector types differ.\n");
-	  return false;
+	  vectypes[i] = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op),
+						     slp_node);
 	}
     }
   /* If all arguments are external or constant defs, infer the vector type
@@ -3371,8 +3399,8 @@ vectorizable_call (vec_info *vinfo,
 	  || (modifier == NARROW
 	      && simple_integer_narrowing (vectype_out, vectype_in,
 					   &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-					  vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+					  &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3446,12 +3474,12 @@ vectorizable_call (vec_info *vinfo,
 	record_stmt_cost (cost_vec, ncopies / 2,
 			  vec_promote_demote, stmt_info, 0, vect_body);
 
-      if (loop_vinfo && mask_opno >= 0)
+      if (loop_vinfo && masked)
 	{
 	  unsigned int nvectors = (slp_node
 				   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
 				   : ncopies);
-	  tree scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+	  tree scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
 	  vect_record_loop_mask (loop_vinfo, masks, nvectors,
 				 vectype_out, scalar_mask);
 	}
@@ -3499,7 +3527,7 @@ vectorizable_call (vec_info *vinfo,
 		    {
 		      /* We don't define any narrowing conditional functions
 			 at present.  */
-		      gcc_assert (mask_opno < 0);
+		      gcc_assert (!masked);
 		      tree half_res = make_ssa_name (vectype_in);
 		      gcall *call
 			= gimple_build_call_internal_vec (ifn, vargs);
@@ -3519,15 +3547,15 @@ vectorizable_call (vec_info *vinfo,
 		    }
 		  else
 		    {
-		      if (mask_opno >= 0 && masked_loop_p)
+		      if (masked && masked_loop_p)
 			{
 			  unsigned int vec_num = vec_oprnds0.length ();
 			  /* Always true for SLP.  */
 			  gcc_assert (ncopies == 1);
 			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
 							  vectype_out, i);
-			  vargs[mask_opno] = prepare_load_store_mask
-			    (TREE_TYPE (mask), mask, vargs[mask_opno], gsi);
+			  vargs[diff_opno] = prepare_load_store_mask
+			    (TREE_TYPE (mask), mask, vargs[diff_opno], gsi);
 			}
 
 		      gcall *call;
@@ -3559,13 +3587,13 @@ vectorizable_call (vec_info *vinfo,
 	      orig_vargs[i] = vargs[i] = vec_defs[i][j];
 	    }
 
-	  if (mask_opno >= 0 && masked_loop_p)
+	  if (masked && masked_loop_p)
 	    {
 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
 					      vectype_out, j);
-	      vargs[mask_opno]
+	      vargs[diff_opno]
 		= prepare_load_store_mask (TREE_TYPE (mask), mask,
-					   vargs[mask_opno], gsi);
+					   vargs[diff_opno], gsi);
 	    }
 
 	  gimple *new_stmt;
@@ -3584,7 +3612,7 @@ vectorizable_call (vec_info *vinfo,
 	    {
 	      /* We don't define any narrowing conditional functions at
 		 present.  */
-	      gcc_assert (mask_opno < 0);
+	      gcc_assert (!masked);
 	      tree half_res = make_ssa_name (vectype_in);
 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
 	      gimple_call_set_lhs (call, half_res);
@@ -3628,7 +3656,7 @@ vectorizable_call (vec_info *vinfo,
     {
       auto_vec<vec<tree> > vec_defs (nargs);
       /* We don't define any narrowing conditional functions at present.  */
-      gcc_assert (mask_opno < 0);
+      gcc_assert (!masked);
       for (j = 0; j < ncopies; ++j)
 	{
 	  /* Build argument list for the vectorized call.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index f62c00bc8707029db52e2f3fe529948755235d3d..31ce45a84cc267ea2022c8ca6323368fbe15eb8b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6547,4 +6547,12 @@ extern unsigned fndecl_dealloc_argno (tree);
    object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (const_tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (const_tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return (tree) type;
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (type));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
  
Andre Vieira (lists) Nov. 29, 2021, 11:17 a.m. UTC | #10
On 18/11/2021 11:05, Richard Biener wrote:
>
> +     (if (!flag_trapping_math
> +         && direct_internal_fn_supported_p (IFN_TRUNC, type,
> +                                            OPTIMIZE_FOR_BOTH))
> +      (IFN_TRUNC @0)))))
>   #endif
>
> does IFN_FTRUNC_INT preserve the same exceptions as doing
> explicit intermediate float->int conversions?  I think I'd
> prefer to have !flag_trapping_math on both cases.
I realized I never responded to this. The AArch64 instructions mimic the
behaviour you'd see if you were doing explicit conversions, so I'll define
the new IFN and optab to require the same, so that the compiler can use
them even when flag_trapping_math is set. In the last patch I sent I added
some lines to the md.texi description of the optab to that effect.
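
To make that concrete, the scalar idiom being recognised looks like this
(a minimal sketch along the lines of the f1 test in frintnz.c; the
function name is only illustrative):

float
trunc_via_int (float x)
{
  int y = x;          /* float -> 32-bit signed int truncates towards zero and
                         raises the invalid exception if x does not fit.  */
  return (float) y;   /* converting back gives x rounded towards zero, which
                         frint32z computes in a single instruction.  */
}

With the match.pd change this collapses to something like
_2 = .FTRUNC_INT (x_1(D), 2147483647); in gimple, where the second argument
only encodes the intermediate integer type, and the backend expands it to a
single frint32z.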
  
Andre Vieira (lists) Dec. 7, 2021, 11:29 a.m. UTC | #11
ping

On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't 
>>>> before... I
>>>> thought I could handle the vector forms later, but the problem is 
>>>> if I add
>>>> support for the scalar, it will stop the vectorizer. It seems
>>>> vectorizable_call expects all arguments to have the same type, 
>>>> which doesn't
>>>> work with passing the integer type as an operand work around.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was 
> initially trying to make vectorizable_call and 
> vectorizable_internal_function handle IFNs with different inputs more 
> generically, using the information we have in the <IFN>_direct structs 
> regarding what operands to get the modes from. Unfortunately, that 
> wasn't straightforward because of how vectorizable_call assumes 
> operands have the same type and uses the type of the DEF_STMT_INFO of 
> the non-constant operands (either output operand or non-constant 
> inputs) to determine the type of constants. I assume there is some 
> reason why we use the DEF_STMT_INFO and not always use 
> get_vectype_for_scalar_type on the argument types. That is why I ended 
> up with this sort of half-way mix of both, which still allows room to 
> add more IFNs that don't take inputs of the same type, but require 
> adding a bit of special casing similar to the IFN_FTRUNC_INT and 
> masking ones.
>
> Bootstrapped on aarch64-none-linux.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
> pattern.
>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
>         (frintnz_mode): New int attribute.
>         (VSFDF): Make iterator conditional.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * internal-fn.c (ftrunc_int_direct): New define.
>         (expand_ftrunc_int_optab_fn): New custom expander.
>         (direct_ftrunc_int_optab_supported_p): New supported_p.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
>         * stor-layout.h (element_precision): Moved from here...
>         * tree.h (element_precision): ... to here.
>         (element_type): New declaration.
>         * tree.c (element_type): New function.
>         (element_precision): Changed to use element_type.
>         * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>         IFNs with different input types.
>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if 
> frintNz instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>         * gcc.target/aarch64/frintnz_vec.c: New test.
  
Richard Sandiford Dec. 17, 2021, 12:44 p.m. UTC | #12
"Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't before... I
>>>> thought I could handle the vector forms later, but the problem is if I add
>>>> support for the scalar, it will stop the vectorizer. It seems
>>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>>> work with passing the integer type as an operand work around.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was 
> initially trying to make vectorizable_call and 
> vectorizable_internal_function handle IFNs with different inputs more 
> generically, using the information we have in the <IFN>_direct structs 
> regarding what operands to get the modes from. Unfortunately, that 
> wasn't straightforward because of how vectorizable_call assumes operands 
> have the same type and uses the type of the DEF_STMT_INFO of the 
> non-constant operands (either output operand or non-constant inputs) to 
> determine the type of constants. I assume there is some reason why we 
> use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on 
> the argument types. That is why I ended up with this sort of half-way 
> mix of both, which still allows room to add more IFNs that don't take 
> inputs of the same type, but require adding a bit of special casing 
> similar to the IFN_FTRUNC_INT and masking ones.
>
> Bootstrapped on aarch64-none-linux.

Still leaving the match.pd stuff to Richard, but otherwise:

> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
>  				  SF DF])
>  
>  ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +			      (V4SF "TARGET_SIMD")
> +			      (V2DF "TARGET_SIMD")
> +			      (DF "TARGET_FLOAT")
> +			      (SF "TARGET_FLOAT")])
>  
>  ;; Advanced SIMD single Float modes.
>  (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not fit

Maybe “An exception must be raised”?  “thrown” makes it sound like a
signal must be raised or C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  	  != CODE_FOR_nothing);
>  }
>  
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +						 tree_pair types,
> +						 optimization_type opt_type)

Formatting nit: should be a line break after “bool”

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				TYPE_MODE (element_type (types.second)),
> +				opt_type) != CODE_FOR_nothing);
> +}
> +
>  #define direct_unary_optab_supported_p direct_optab_supported_p
>  #define direct_binary_optab_supported_p direct_optab_supported_p
>  #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{									\
> +  for (int i = 0; i < n; ++i)					      \
> +    {								      \
> +      int_type x_i = x[i];					      \
> +      y[i] = (float_type) x_i;					      \
> +    }								      \
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v0.4s, v0.4s

I don't think we can rely on v0 being used here.  v[0-9]+\.4s would
be safer.

> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v0.4s, v0.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>  
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)
>  {
>    internal_fn ifn;
>    if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
>  	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
>  	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  					      OPTIMIZE_FOR_SPEED))
>  	    return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
>  
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;
> +	}
> +    }

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.
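
Something along these lines in the argument loop would do that (an
untested sketch, reusing the names from the patch above):

      if ((int) i == diff_opno && !masked)
	{
	  /* Argument 1 of IFN_FTRUNC_INT only describes the intermediate
	     integer type, so record its scalar type and don't try to
	     vectorise the operand.  */
	  vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
	  continue;
	}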

>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno && masked)
>  	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> +	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +				       diff_opno, &op, &slp_op[i], &dt[i],
> +				       &vectypes[i]))
>  	    return false;
>  	  continue;
>  	}
> […]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>  
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>  
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
>  {
>    if (!TYPE_P (type))
>      type = TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>      type = TREE_TYPE (type);
>  
> -  return TYPE_PRECISION (type);
> +  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
Richard

> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
>  }
>  
>  /* Return true if CODE represents an associative tree code.  Otherwise
  
Andre Vieira (lists) Dec. 29, 2021, 3:55 p.m. UTC | #13
Hi Richard,

Thank you for the review; I've adopted all of the above suggestions
downstream. I'm still surprised how many style issues I still miss after
years of gcc development :(

On 17/12/2021 12:44, Richard Sandiford wrote:
>
>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>         rhs_type = unsigned_type_node;
>>       }
>>   
>> -  int mask_opno = -1;
>> +  /* The argument that is not of the same type as the others.  */
>> +  int diff_opno = -1;
>> +  bool masked = false;
>>     if (internal_fn_p (cfn))
>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>> +    {
>> +      if (cfn == CFN_FTRUNC_INT)
>> +	/* For FTRUNC this represents the argument that carries the type of the
>> +	   intermediate signed integer.  */
>> +	diff_opno = 1;
>> +      else
>> +	{
>> +	  /* For masked operations this represents the argument that carries the
>> +	     mask.  */
>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>> +	  masked = diff_opno >=  0;
>> +	}
>> +    }
> I think it would be cleaner not to process argument 1 at all for
> CFN_FTRUNC_INT.  There's no particular need to vectorise it.

I agree with this; I will change the loop to continue for argument 1 when
dealing with non-masked CFNs.

>>   	}
>> […]
>> diff --git a/gcc/tree.c b/gcc/tree.c
>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
>> --- a/gcc/tree.c
>> +++ b/gcc/tree.c
>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>>     return true;
>>   }
>>   
>> -/* Return the precision of the type, or for a complex or vector type the
>> -   precision of the type of its elements.  */
>> +/* Return the type, or for a complex or vector type the type of its
>> +   elements.  */
>>   
>> -unsigned int
>> -element_precision (const_tree type)
>> +tree
>> +element_type (const_tree type)
>>   {
>>     if (!TYPE_P (type))
>>       type = TREE_TYPE (type);
>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>       type = TREE_TYPE (type);
>>   
>> -  return TYPE_PRECISION (type);
>> +  return (tree) type;
> I think we should stick a const_cast in element_precision and make
> element_type take a plain “type”.  As it stands element_type is an
> implicit const_cast for many cases.
>
> Thanks,
I was just curious about something here: I thought the purpose of having
element_precision (before) and element_type (now) take a const_tree
argument was to make it clear that we aren't changing the input type. I
understand that, as it stands, element_type amounts to an implicit
const_cast (and I should be using const_cast rather than the '(tree)'
cast), but that only happens when 'type' isn't a complex/vector type.
Either way, we keep the promise that we aren't changing the incoming
type; what the caller then does with the result is up to them, no?

I don't mind making the changes, just trying to understand the reasoning 
behind it.

I'll send in a new patch with all the changes after the review on the 
match.pd stuff.

Thanks,
Andre
  
Richard Sandiford Dec. 29, 2021, 4:54 p.m. UTC | #14
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 17/12/2021 12:44, Richard Sandiford wrote:
>>
>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>>         rhs_type = unsigned_type_node;
>>>       }
>>>   
>>> -  int mask_opno = -1;
>>> +  /* The argument that is not of the same type as the others.  */
>>> +  int diff_opno = -1;
>>> +  bool masked = false;
>>>     if (internal_fn_p (cfn))
>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> +    {
>>> +      if (cfn == CFN_FTRUNC_INT)
>>> +	/* For FTRUNC this represents the argument that carries the type of the
>>> +	   intermediate signed integer.  */
>>> +	diff_opno = 1;
>>> +      else
>>> +	{
>>> +	  /* For masked operations this represents the argument that carries the
>>> +	     mask.  */
>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> +	  masked = diff_opno >=  0;
>>> +	}
>>> +    }
>> I think it would be cleaner not to process argument 1 at all for
>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
>
> I agree with this,  will change the loop to continue for argument 1 when 
> dealing with non-masked CFN's.
>
>>>   	}
>>> […]
>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
>>> --- a/gcc/tree.c
>>> +++ b/gcc/tree.c
>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>>>     return true;
>>>   }
>>>   
>>> -/* Return the precision of the type, or for a complex or vector type the
>>> -   precision of the type of its elements.  */
>>> +/* Return the type, or for a complex or vector type the type of its
>>> +   elements.  */
>>>   
>>> -unsigned int
>>> -element_precision (const_tree type)
>>> +tree
>>> +element_type (const_tree type)
>>>   {
>>>     if (!TYPE_P (type))
>>>       type = TREE_TYPE (type);
>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>>       type = TREE_TYPE (type);
>>>   
>>> -  return TYPE_PRECISION (type);
>>> +  return (tree) type;
>> I think we should stick a const_cast in element_precision and make
>> element_type take a plain “type”.  As it stands element_type is an
>> implicit const_cast for many cases.
>>
>> Thanks,
> Was just curious about something here, I thought the purpose of having 
> element_precision (before) and element_type (now) take a const_tree as 
> an argument was to make it clear we aren't changing the input type. I 
> understand that as it stands element_type could be an implicit 
> const_cast (which I should be using rather than the '(tree)' cast), but 
> that's only if 'type' is a type that isn't complex/vector, either way, 
> we are conforming to the promise that we aren't changing the incoming 
> type, what the caller then does with the result is up to them no?
>
> I don't mind making the changes, just trying to understand the reasoning 
> behind it.

The problem with the above is that functions like the following become
well-typed:

void
foo (const_tree t)
{
  TYPE_MODE (element_type (t)) = VOIDmode;
}

even though element_type (t) could well be t.

One of the points of const_tree (and const pointer targets in general)
is to use the type system to enforce the promise that the value isn't
changed.

I guess the above is similar to the traditional problem with functions
like index and strstr, which take a const char * but return a char *.
So for example:

void
foo (const char *x)
{
  *index (x, '.') = 0;
}

is well-typed.  But the equivalent C++ code (using iterators) would be
rejected.  If C allowed overloading them the correct prototypes would be:

    const char *index (const char *, int);
    char *index (char *, int);

And I think the same applies here.  Either we should provide two functions:

    const_tree element_type (const_tree);
    tree element_type (tree);

or just the latter.
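
If we go for the two-function option, the const_tree one can simply
forward to the other (purely illustrative, not something the patch has to
adopt verbatim):

    tree element_type (tree);

    inline const_tree
    element_type (const_tree t)
    {
      return element_type (const_cast<tree> (t));
    }

so only the non-const overload does the real work and const callers still
get a const_tree back.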

Thanks,
Richard
  
Richard Biener Jan. 3, 2022, 12:18 p.m. UTC | #15
On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thank you for the review, I've adopted all above suggestions downstream, I am
> still surprised how many style things I still miss after years of gcc
> development :(
> 
> On 17/12/2021 12:44, Richard Sandiford wrote:
> >
> >> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >>         rhs_type = unsigned_type_node;
> >>       }
> >>   -  int mask_opno = -1;
> >> +  /* The argument that is not of the same type as the others.  */
> >> +  int diff_opno = -1;
> >> +  bool masked = false;
> >>     if (internal_fn_p (cfn))
> >> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +    {
> >> +      if (cfn == CFN_FTRUNC_INT)
> >> +	/* For FTRUNC this represents the argument that carries the type of
> >> the
> >> +	   intermediate signed integer.  */
> >> +	diff_opno = 1;
> >> +      else
> >> +	{
> >> +	  /* For masked operations this represents the argument that carries
> >> the
> >> +	     mask.  */
> >> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +	  masked = diff_opno >=  0;
> >> +	}
> >> +    }
> > I think it would be cleaner not to process argument 1 at all for
> > CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> 
> I agree with this,  will change the loop to continue for argument 1 when
> dealing with non-masked CFN's.
> 
> >>   	}
> >> […]
> >> diff --git a/gcc/tree.c b/gcc/tree.c
> >> index
> >> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
> >> 100644
> >> --- a/gcc/tree.c
> >> +++ b/gcc/tree.c
> >> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> >> cst_size_error *perr /* = NULL */)
> >>     return true;
> >>   }
> >>   
> >> -/* Return the precision of the type, or for a complex or vector type the
> >> -   precision of the type of its elements.  */
> >> +/* Return the type, or for a complex or vector type the type of its
> >> +   elements.  */
> >>   -unsigned int
> >> -element_precision (const_tree type)
> >> +tree
> >> +element_type (const_tree type)
> >>   {
> >>     if (!TYPE_P (type))
> >>       type = TREE_TYPE (type);
> >> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>       type = TREE_TYPE (type);
> >>   -  return TYPE_PRECISION (type);
> >> +  return (tree) type;
> > I think we should stick a const_cast in element_precision and make
> > element_type take a plain “type”.  As it stands element_type is an
> > implicit const_cast for many cases.
> >
> > Thanks,
> Was just curious about something here, I thought the purpose of having
> element_precision (before) and element_type (now) take a const_tree as an
> argument was to make it clear we aren't changing the input type. I understand
> that as it stands element_type could be an implicit const_cast (which I should
> be using rather than the '(tree)' cast), but that's only if 'type' is a type
> that isn't complex/vector, either way, we are conforming to the promise that
> we aren't changing the incoming type, what the caller then does with the
> result is up to them no?
> 
> I don't mind making the changes, just trying to understand the reasoning
> behind it.
> 
> I'll send in a new patch with all the changes after the review on the match.pd
> stuff.

I'm missing an updated patch after my initial review of the match.pd
stuff so not sure what to review.  Can you re-post and updated patch?

Thanks,
Richard.
  
Andre Vieira (lists) Jan. 10, 2022, 2:09 p.m. UTC | #16
Yeah, it seems I forgot to send the latest version, my bad.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTNZ): New iterator.
         (frintnz_mode): New int attribute.
         (VSFDF): Make iterator conditional.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * internal-fn.c (ftrunc_int_direct): New define.
         (expand_ftrunc_int_optab_fn): New custom expander.
         (direct_ftrunc_int_optab_supported_p): New supported_p.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.
         * stor-layout.h (element_precision): Moved from here...
         * tree.h (element_precision): ... to here.
         (element_type): New declaration.
         * tree.c (element_type): New function.
         (element_precision): Changed to use element_type.
         * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for
         IFNs with different input types.
         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
         * doc/md.texi: New entry for ftrunc pattern name.
         * doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
         * gcc.target/aarch64/frintnz_vec.c: New test.

On 03/01/2022 12:18, Richard Biener wrote:
> On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
>
>> Hi Richard,
>>
>> Thank you for the review, I've adopted all above suggestions downstream, I am
>> still surprised how many style things I still miss after years of gcc
>> development :(
>>
>> On 17/12/2021 12:44, Richard Sandiford wrote:
>>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>>>          rhs_type = unsigned_type_node;
>>>>        }
>>>>    -  int mask_opno = -1;
>>>> +  /* The argument that is not of the same type as the others.  */
>>>> +  int diff_opno = -1;
>>>> +  bool masked = false;
>>>>      if (internal_fn_p (cfn))
>>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>>> +    {
>>>> +      if (cfn == CFN_FTRUNC_INT)
>>>> +	/* For FTRUNC this represents the argument that carries the type of
>>>> the
>>>> +	   intermediate signed integer.  */
>>>> +	diff_opno = 1;
>>>> +      else
>>>> +	{
>>>> +	  /* For masked operations this represents the argument that carries
>>>> the
>>>> +	     mask.  */
>>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>>> +	  masked = diff_opno >=  0;
>>>> +	}
>>>> +    }
>>> I think it would be cleaner not to process argument 1 at all for
>>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
>> I agree with this,  will change the loop to continue for argument 1 when
>> dealing with non-masked CFN's.
>>
>>>>    	}
>>>> […]
>>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>>> index
>>>> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
>>>> 100644
>>>> --- a/gcc/tree.c
>>>> +++ b/gcc/tree.c
>>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
>>>> cst_size_error *perr /* = NULL */)
>>>>      return true;
>>>>    }
>>>>    
>>>> -/* Return the precision of the type, or for a complex or vector type the
>>>> -   precision of the type of its elements.  */
>>>> +/* Return the type, or for a complex or vector type the type of its
>>>> +   elements.  */
>>>>    -unsigned int
>>>> -element_precision (const_tree type)
>>>> +tree
>>>> +element_type (const_tree type)
>>>>    {
>>>>      if (!TYPE_P (type))
>>>>        type = TREE_TYPE (type);
>>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>>>      if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>>>        type = TREE_TYPE (type);
>>>>    -  return TYPE_PRECISION (type);
>>>> +  return (tree) type;
>>> I think we should stick a const_cast in element_precision and make
>>> element_type take a plain “type”.  As it stands element_type is an
>>> implicit const_cast for many cases.
>>>
>>> Thanks,
>> Was just curious about something here, I thought the purpose of having
>> element_precision (before) and element_type (now) take a const_tree as an
>> argument was to make it clear we aren't changing the input type. I understand
>> that as it stands element_type could be an implicit const_cast (which I should
>> be using rather than the '(tree)' cast), but that's only if 'type' is a type
>> that isn't complex/vector, either way, we are conforming to the promise that
>> we aren't changing the incoming type, what the caller then does with the
>> result is up to them no?
>>
>> I don't mind making the changes, just trying to understand the reasoning
>> behind it.
>>
>> I'll send in a new patch with all the changes after the review on the match.pd
>> stuff.
> I'm missing an updated patch after my initial review of the match.pd
> stuff so not sure what to review.  Can you re-post and updated patch?
>
> Thanks,
> Richard.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7423,12 +7423,18 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9160ce3e69c2c6b1b75e46f7aabd27e7949a269a..7962b15a87db2f1ede3836efbb827b8fb95266da 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -163,7 +163,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+			     (V4SF "TARGET_SIMD")
+			     (V2DF "TARGET_SIMD")
+			     (DF "TARGET_FLOAT")
+			     (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3078,6 +3082,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3485,6 +3491,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.  An exception must be raised if operand 1 does not
+fit in a @var{n} mode signed integer as it would have if the truncation
+happened through separate floating point to integer conversion.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintnzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = element_type (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 	  != CODE_FOR_nothing);
 }
 
+static bool
+direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
+				     optimization_type opt_type)
+{
+  return (convert_optab_handler (optab, TYPE_MODE (types.first),
+				TYPE_MODE (element_type (types.second)),
+				opt_type) != CODE_FOR_nothing);
+}
+
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 8891071a6a360961643731094379b607f317af17..a0fd75829e942f529c879c669e58c098b62b26ba 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+   same mode, but internally converts via another mode.  This second mode is
+   specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -275,6 +278,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 84c9b918041eef3409bdb0fbe04565b90b25d6e9..a5d892ac1ebfaa7b5d5fa970baa04c8e5b8acb28 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3751,12 +3751,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+						  SIGNED)); })
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5fcf5386a0b3112ef9004055c82e15fe47668970..04a4ee82e15fe7b52e726f2ee0bf704c30ac450d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index b67abebc0096113272bfb1221eabaabd08657a58..e0219c8af4846ea0f947586b1915d9d06cb6c107 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	frint32z	s0, s0
+**	ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	frint64z	s0, s0
+**	ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	frint32z	d0, d0
+**	ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	frint64z	d0, d0
+**	ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; ++i)					      \
+    {								      \
+      int_type x_i = x[i];					      \
+      y[i] = (float_type) x_i;					      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index c1ad97c6bd20d6e970edb24a125451580f014d55..5758e9cee4416b60b6766ecb37cbf3b37ac98522 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11399,6 +11399,32 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise. The test is valid for
+# AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      aarch64_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-				tree vectype_out, tree vectype_in)
+				tree vectype_out, tree vectype_in,
+				tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
-	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+	  if (!type0)
+	    type0 = vectype_in;
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+	  if (!type1)
+	    type1 = vectype_in;
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+	/* For FTRUNC this represents the argument that carries the type of the
+	   intermediate signed integer.  */
+	diff_opno = 1;
+      else
+	{
+	  /* For masked operations this represents the argument that carries the
+	     mask.  */
+	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+	  masked = diff_opno >=  0;
+	}
+    }
 
   for (i = 0; i < nargs; i++)
     {
-      if ((int) i == mask_opno)
+      if ((int) i == diff_opno)
 	{
-	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
-				       &op, &slp_op[i], &dt[i], &vectypes[i]))
-	    return false;
-	  continue;
+	  if (masked)
+	    {
+	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
+					   diff_opno, &op, &slp_op[i], &dt[i],
+					   &vectypes[i]))
+		return false;
+	    }
+	  else
+	    {
+	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
+	      continue;
+	    }
 	}
 
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
@@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
 	  return false;
 	}
 
-      /* We can only handle calls with arguments of the same type.  */
-      if (rhs_type
-	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+      if ((int) i != diff_opno)
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument types differ.\n");
-	  return false;
-	}
-      if (!rhs_type)
-	rhs_type = TREE_TYPE (op);
+	  /* We can only handle calls with arguments of the same type.  */
+	  if (rhs_type
+	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument types differ.\n");
+	      return false;
+	    }
+	  if (!rhs_type)
+	    rhs_type = TREE_TYPE (op);
 
-      if (!vectype_in)
-	vectype_in = vectypes[i];
-      else if (vectypes[i]
-	       && !types_compatible_p (vectypes[i], vectype_in))
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument vector types differ.\n");
-	  return false;
+	  if (!vectype_in)
+	    vectype_in = vectypes[i];
+	  else if (vectypes[i]
+		   && !types_compatible_p (vectypes[i], vectype_in))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument vector types differ.\n");
+	      return false;
+	    }
 	}
     }
   /* If all arguments are external or constant defs, infer the vector type
@@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
 	  || (modifier == NARROW
 	      && simple_integer_narrowing (vectype_out, vectype_in,
 					   &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-					  vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+					  &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
 
       if (loop_vinfo
 	  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-	  && (reduc_idx >= 0 || mask_opno >= 0))
+	  && (reduc_idx >= 0 || masked))
 	{
 	  if (reduc_idx >= 0
 	      && (cond_fn == IFN_LAST
@@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
 		   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
 		   : ncopies);
 	      tree scalar_mask = NULL_TREE;
-	      if (mask_opno >= 0)
-		scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+	      if (masked)
+		scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
 	      vect_record_loop_mask (loop_vinfo, masks, nvectors,
 				     vectype_out, scalar_mask);
 	    }
@@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
 		    {
 		      /* We don't define any narrowing conditional functions
 			 at present.  */
-		      gcc_assert (mask_opno < 0);
+		      gcc_assert (!masked);
 		      tree half_res = make_ssa_name (vectype_in);
 		      gcall *call
 			= gimple_build_call_internal_vec (ifn, vargs);
@@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
 		    }
 		  else
 		    {
-		      if (mask_opno >= 0 && masked_loop_p)
+		      if (masked && masked_loop_p)
 			{
 			  unsigned int vec_num = vec_oprnds0.length ();
 			  /* Always true for SLP.  */
 			  gcc_assert (ncopies == 1);
 			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
 							  vectype_out, i);
-			  vargs[mask_opno] = prepare_vec_mask
+			  vargs[diff_opno] = prepare_vec_mask
 			    (loop_vinfo, TREE_TYPE (mask), mask,
-			     vargs[mask_opno], gsi);
+			     vargs[diff_opno], gsi);
 			}
 
 		      gcall *call;
@@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
 	  if (masked_loop_p && reduc_idx >= 0)
 	    vargs[varg++] = vargs[reduc_idx + 1];
 
-	  if (mask_opno >= 0 && masked_loop_p)
+	  if (masked && masked_loop_p)
 	    {
 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
 					      vectype_out, j);
-	      vargs[mask_opno]
+	      vargs[diff_opno]
 		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
-				    vargs[mask_opno], gsi);
+				    vargs[diff_opno], gsi);
 	    }
 
 	  gimple *new_stmt;
@@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
 	    {
 	      /* We don't define any narrowing conditional functions at
 		 present.  */
-	      gcc_assert (mask_opno < 0);
+	      gcc_assert (!masked);
 	      tree half_res = make_ssa_name (vectype_in);
 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
 	      gimple_call_set_lhs (call, half_res);
@@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
     {
       auto_vec<vec<tree> > vec_defs (nargs);
       /* We don't define any narrowing conditional functions at present.  */
-      gcc_assert (mask_opno < 0);
+      gcc_assert (!masked);
       for (j = 0; j < ncopies; ++j)
 	{
 	  /* Build argument list for the vectorized call.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
    object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.c b/gcc/tree.c
index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6658,7 +6658,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return const_cast<tree> (type);
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (const_cast<tree> (type)));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
  
Richard Biener Jan. 10, 2022, 2:45 p.m. UTC | #17
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote:

> Yeah seems I forgot to send the latest version, my bad.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

The match.pd part looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
>         (frintnz_mode): New int attribute.
>         (VSFDF): Make iterator conditional.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * internal-fn.c (ftrunc_int_direct): New define.
>         (expand_ftrunc_int_optab_fn): New custom expander.
>         (direct_ftrunc_int_optab_supported_p): New supported_p.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
>         * stor-layout.h (element_precision): Moved from here...
>         * tree.h (element_precision): ... to here.
>         (element_type): New declaration.
>         * tree.c (element_type): New function.
>         (element_precision): Changed to use element_type.
>         * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>         IFNs with different input types.
>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>         * gcc.target/aarch64/frintnz_vec.c: New test.
> 
> On 03/01/2022 12:18, Richard Biener wrote:
> > On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richard,
> >>
> >> Thank you for the review, I've adopted all above suggestions downstream, I
> >> am
> >> still surprised how many style things I still miss after years of gcc
> >> development :(
> >>
> >> On 17/12/2021 12:44, Richard Sandiford wrote:
> >>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >>>>          rhs_type = unsigned_type_node;
> >>>>        }
> >>>>    -  int mask_opno = -1;
> >>>> +  /* The argument that is not of the same type as the others.  */
> >>>> +  int diff_opno = -1;
> >>>> +  bool masked = false;
> >>>>      if (internal_fn_p (cfn))
> >>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> +    {
> >>>> +      if (cfn == CFN_FTRUNC_INT)
> >>>> +	/* For FTRUNC this represents the argument that carries the type of
> >>>> the
> >>>> +	   intermediate signed integer.  */
> >>>> +	diff_opno = 1;
> >>>> +      else
> >>>> +	{
> >>>> +	  /* For masked operations this represents the argument that carries
> >>>> the
> >>>> +	     mask.  */
> >>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> +	  masked = diff_opno >=  0;
> >>>> +	}
> >>>> +    }
> >>> I think it would be cleaner not to process argument 1 at all for
> >>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> >> I agree with this,  will change the loop to continue for argument 1 when
> >> dealing with non-masked CFN's.
> >>
> >>>>    	}
> >>>> […]
> >>>> diff --git a/gcc/tree.c b/gcc/tree.c
> >>>> index
> >>>> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
> >>>> 100644
> >>>> --- a/gcc/tree.c
> >>>> +++ b/gcc/tree.c
> >>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> >>>> cst_size_error *perr /* = NULL */)
> >>>>      return true;
> >>>>    }
> >>>>    
> >>>> -/* Return the precision of the type, or for a complex or vector type the
> >>>> -   precision of the type of its elements.  */
> >>>> +/* Return the type, or for a complex or vector type the type of its
> >>>> +   elements.  */
> >>>>    -unsigned int
> >>>> -element_precision (const_tree type)
> >>>> +tree
> >>>> +element_type (const_tree type)
> >>>>    {
> >>>>      if (!TYPE_P (type))
> >>>>        type = TREE_TYPE (type);
> >>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >>>>      if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>>>        type = TREE_TYPE (type);
> >>>>    -  return TYPE_PRECISION (type);
> >>>> +  return (tree) type;
> >>> I think we should stick a const_cast in element_precision and make
> >>> element_type take a plain “type”.  As it stands element_type is an
> >>> implicit const_cast for many cases.
> >>>
> >>> Thanks,
> >> Was just curious about something here, I thought the purpose of having
> >> element_precision (before) and element_type (now) take a const_tree as an
> >> argument was to make it clear we aren't changing the input type. I
> >> understand
> >> that as it stands element_type could be an implicit const_cast (which I
> >> should
> >> be using rather than the '(tree)' cast), but that's only if 'type' is a
> >> type
> >> that isn't complex/vector, either way, we are conforming to the promise
> >> that
> >> we aren't changing the incoming type, what the caller then does with the
> >> result is up to them no?
> >>
> >> I don't mind making the changes, just trying to understand the reasoning
> >> behind it.
> >>
> >> I'll send in a new patch with all the changes after the review on the
> >> match.pd
> >> stuff.
> > I'm missing an updated patch after my initial review of the match.pd
> > stuff so not sure what to review.  Can you re-post and updated patch?
> >
> > Thanks,
> > Richard.
>
  
Richard Sandiford Jan. 14, 2022, 10:37 a.m. UTC | #18
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  An exception must be raised if operand 1 does not
> +fit in a @var{n} mode signed integer as it would have if the truncation
> +happened through separate floating point to integer conversion.
> +
> +

Nit: just one blank line here.

>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
>  instruction.
> +
> +@item aarch64_frintzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
>  @end table
>  
>  @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
>    return convert_optab_handler (optab, imode, vmode);
>  }
>  
> +/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[2];
> +  tree lhs, float_type, int_type;
> +  rtx target, op;
> +
> +  lhs = gimple_call_lhs (stmt);
> +  target = expand_normal (lhs);
> +  op = expand_normal (gimple_call_arg (stmt, 0));
> +
> +  float_type = TREE_TYPE (lhs);
> +  int_type = element_type (gimple_call_arg (stmt, 1));

Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type.  That in
turn means that…

> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> +  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> +  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> +				      TYPE_MODE (int_type)), 2, ops);
> +}
> +
>  /* Expand LOAD_LANES call STMT using optab OPTAB.  */
>  
>  static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  	  != CODE_FOR_nothing);
>  }
>  
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> +				     optimization_type opt_type)
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				TYPE_MODE (element_type (types.second)),
> +				opt_type) != CODE_FOR_nothing);
> +}
> +

…this can use convert_optab_supported_p.
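
I.e. (just a sketch) something like:

  #define direct_ftrunc_int_optab_supported_p convert_optab_supported_p

instead of a hand-written wrapper.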

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Is this just a cut-&-pasto from a run test?  If not, why do we need both
this and the dg-options?  It feels like one on its own should be enough,
with the dg-options being better.

The test looks OK without this line.

> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	frint32z	s0, s0
> +**	ret
> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	frint64z	s0, s0
> +**	ret
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	frint32z	d0, d0
> +**	ret
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	frint64z	d0, d0
> +**	ret
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f5_dont (double x)
> +{
> +  signed short y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Same here.

> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{									\
> +  for (int i = 0; i < n; ++i)					      \
> +    {								      \
> +      int_type x_i = x[i];					      \
> +      y[i] = (float_type) x_i;					      \
> +    }								      \
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>  
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)

Should be described in the comment above the function.

>  {
>    internal_fn ifn;
>    if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
>  	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
>  	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  					      OPTIMIZE_FOR_SPEED))
>  	    return ifn;
> @@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
>  
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;

Nit: excess space after “>=”.

> +	}
> +    }

I think it would be better to add a new flag to direct_internal_fn_info
to say whether type1 is scalar, rather than check based on function code.
type1 would then provide the value of diff_opno above.

Also, I think diff_opno should be separate from mask_opno.
Maybe scalar_opno would be a better name.

This would probably be simpler if we used:

  internal_fn ifn = associated_internal_fn (cfn, lhs_type);

before the loop (with lhs_type being new), then used ifn to get the
direct_internal_fn_info and passed ifn to vectorizable_internal_function.

>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno)
>  	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> -	    return false;
> -	  continue;
> +	  if (masked)
> +	    {
> +	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +					   diff_opno, &op, &slp_op[i], &dt[i],
> +					   &vectypes[i]))
> +		return false;
> +	    }
> +	  else
> +	    {
> +	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
> +	      continue;
> +	    }
>  	}
>  
>        if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
> @@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
>  	  return false;
>  	}
>  
> -      /* We can only handle calls with arguments of the same type.  */
> -      if (rhs_type
> -	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +      if ((int) i != diff_opno)

Is this ever false?  It looks like the continue above handles the other case.

>  	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                             "argument types differ.\n");
> -	  return false;
> -	}
> -      if (!rhs_type)
> -	rhs_type = TREE_TYPE (op);
> +	  /* We can only handle calls with arguments of the same type.  */
> +	  if (rhs_type
> +	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument types differ.\n");
> +	      return false;
> +	    }
> +	  if (!rhs_type)
> +	    rhs_type = TREE_TYPE (op);
>  
> -      if (!vectype_in)
> -	vectype_in = vectypes[i];
> -      else if (vectypes[i]
> -	       && !types_compatible_p (vectypes[i], vectype_in))
> -	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                             "argument vector types differ.\n");
> -	  return false;
> +	  if (!vectype_in)
> +	    vectype_in = vectypes[i];
> +	  else if (vectypes[i]
> +		   && !types_compatible_p (vectypes[i], vectype_in))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument vector types differ.\n");
> +	      return false;
> +	    }
>  	}
>      }
>    /* If all arguments are external or constant defs, infer the vector type
> @@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
>  	  || (modifier == NARROW
>  	      && simple_integer_narrowing (vectype_out, vectype_in,
>  					   &convert_code))))
> -    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> -					  vectype_in);
> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
> +					  &vectypes[0]);
>  
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
> @@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
>  
>        if (loop_vinfo
>  	  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> -	  && (reduc_idx >= 0 || mask_opno >= 0))
> +	  && (reduc_idx >= 0 || masked))
>  	{
>  	  if (reduc_idx >= 0
>  	      && (cond_fn == IFN_LAST
> @@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
>  		   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
>  		   : ncopies);
>  	      tree scalar_mask = NULL_TREE;
> -	      if (mask_opno >= 0)
> -		scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
> +	      if (masked)
> +		scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
>  	      vect_record_loop_mask (loop_vinfo, masks, nvectors,
>  				     vectype_out, scalar_mask);
>  	    }
> @@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
>  		    {
>  		      /* We don't define any narrowing conditional functions
>  			 at present.  */
> -		      gcc_assert (mask_opno < 0);
> +		      gcc_assert (!masked);
>  		      tree half_res = make_ssa_name (vectype_in);
>  		      gcall *call
>  			= gimple_build_call_internal_vec (ifn, vargs);
> @@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
>  		    }
>  		  else
>  		    {
> -		      if (mask_opno >= 0 && masked_loop_p)
> +		      if (masked && masked_loop_p)
>  			{
>  			  unsigned int vec_num = vec_oprnds0.length ();
>  			  /* Always true for SLP.  */
>  			  gcc_assert (ncopies == 1);
>  			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
>  							  vectype_out, i);
> -			  vargs[mask_opno] = prepare_vec_mask
> +			  vargs[diff_opno] = prepare_vec_mask
>  			    (loop_vinfo, TREE_TYPE (mask), mask,
> -			     vargs[mask_opno], gsi);
> +			     vargs[diff_opno], gsi);
>  			}
>  
>  		      gcall *call;
> @@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
>  	  if (masked_loop_p && reduc_idx >= 0)
>  	    vargs[varg++] = vargs[reduc_idx + 1];
>  
> -	  if (mask_opno >= 0 && masked_loop_p)
> +	  if (masked && masked_loop_p)
>  	    {
>  	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
>  					      vectype_out, j);
> -	      vargs[mask_opno]
> +	      vargs[diff_opno]
>  		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> -				    vargs[mask_opno], gsi);
> +				    vargs[diff_opno], gsi);
>  	    }
>  
>  	  gimple *new_stmt;
> @@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
>  	    {
>  	      /* We don't define any narrowing conditional functions at
>  		 present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
>  	      tree half_res = make_ssa_name (vectype_in);
>  	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
>  	      gimple_call_set_lhs (call, half_res);
> @@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
>      {
>        auto_vec<vec<tree> > vec_defs (nargs);
>        /* We don't define any narrowing conditional functions at present.  */
> -      gcc_assert (mask_opno < 0);
> +      gcc_assert (!masked);
>        for (j = 0; j < ncopies; ++j)
>  	{
>  	  /* Build argument list for the vectorized call.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
>     object or pointer.  Otherwise return null.  */
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>  
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
> +extern tree element_type (tree);
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +extern unsigned int element_precision (const_tree);
> +
>  #endif  /* GCC_TREE_H  */
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>  
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>  
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (tree type)
>  {
>    if (!TYPE_P (type))
>      type = TREE_TYPE (type);
> @@ -6658,7 +6658,16 @@ element_precision (const_tree type)
>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>      type = TREE_TYPE (type);
>  
> -  return TYPE_PRECISION (type);
> +  return const_cast<tree> (type);

The const_cast<> is redundant.

Sorry for not thinking about it before, but we should probably have
a test for the SLP case.  E.g.:

  for (int i = 0; i < n; i += 2)
    {
      int_type x_i0 = x[i];
      int_type x_i1 = x[i + 1];
      y[i] = (float_type) x_i1;
      y[i + 1] = (float_type) x_i0;
    }

(with a permute thrown in for good measure).  This will make sure
that the (separate) SLP group matching code handles the call correctly.

Thanks,
Richard
  
Andre Vieira (lists) Nov. 4, 2022, 5:40 p.m. UTC | #19
Sorry for the delay; I was just reminded that I still had this patch 
outstanding from last stage 1. Hopefully, since it has mostly been 
reviewed, it can still go in for this stage 1?

I addressed the comments and gave the SLP part of vectorizable_call some 
TLC to make it work.

I also changed vect_get_slp_defs: I noticed that the call from 
vectorizable_call creates an auto_vec sized by 'nargs', which can be 
smaller than the number of children in the slp_node, so the quick_push 
calls were not obviously safe as is.  I added a reserve (n) to make the 
pushes safe (see the sketch below).  I didn't actually come across any 
failure because of it though; happy to split this into a separate patch 
if needed.
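
Roughly, the mismatch is the following (just a sketch of the relevant
bits, not the exact hunks):

  /* vectorizable_call sizes the outer vector by the number of call
     arguments...  */
  auto_vec<vec<tree> > vec_defs (nargs);
  vect_get_slp_defs (vinfo, slp_node, &vec_defs);

  /* ...while vect_get_slp_defs pushes one entry per SLP child, which
     can exceed that reservation, hence the reserve before pushing.  */
  n = SLP_TREE_CHILDREN (slp_node).length ();
  vec_oprnds->reserve (n);
  for (unsigned i = 0; i < n; ++i)
    ...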

Bootstrapped and regression tested on aarch64-none-linux-gnu and 
x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTNZ): New iterator.
         (frintnz_mode): New int attribute.
         (VSFDF): Make iterator conditional.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * internal-fn.cc (ftrunc_int_direct): New define.
         (expand_ftrunc_int_optab_fn): New custom expander.
         (direct_ftrunc_int_optab_supported_p): New supported_p.
         * internal-fn.h (direct_internal_fn_info): Add new member
         type1_is_scalar_p.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.
         * stor-layout.h (element_precision): Moved from here...
         * tree.h (element_precision): ... to here.
         (element_type): New declaration.
         * tree.cc (element_type): New function.
         (element_precision): Changed to use element_type.
         * tree-vect-stmts.cc (vectorizable_internal_function): Add 
support for
         IFNs with different input types.
         (vect_get_scalar_oprnds): New function.
         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
         * tree-vect-slp.cc (check_scalar_arg_ok): New function.
         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
         * doc/md.texi: New entry for ftrunc pattern name.
         * doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz 
instructions are available.
         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and 
aarch64_frintnzx options.
         * gcc.target/aarch64/frintnz.c: New test.
         * gcc.target/aarch64/frintnz_vec.c: New test.
         * gcc.target/aarch64/frintnz_slp.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..47368e09b106e5b43640bd4f113abd0b9a15b9c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7564,12 +7564,18 @@
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a8ad4e5ff215ade06c3ca13a24ef18d259afcb6c..b1f78d87fbe6118e792b00580c6beb23ce63e27c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -173,7 +173,11 @@
 				  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+			     (V4SF "TARGET_SIMD")
+			     (V2DF "TARGET_SIMD")
+			     (DF "TARGET_FLOAT")
+			     (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3136,6 +3140,8 @@
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRKP [UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3545,6 +3551,8 @@
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d0a71ecbb806de3a6564c6ffe973fec5da5c597b..722a03de79004c9d2f291882b346fecb74f9df1b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6076,6 +6076,14 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0.  Both operands have mode @var{m}, which is a scalar
+or vector floating-point mode.  An exception must be raised if operand 1 does
+not fit in a @var{n} mode signed integer, just as it would be if the
+truncation were performed through a separate floating point to integer
+conversion.
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index a12175b649848c7dd7802ae960f1360cd9261b88..926f1859becfe32b5a157eba8031d9ed2f7fd249 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2317,6 +2317,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintnzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191edf0aea125ff0fc426511b2306169..cce4cd153cb59751f54dfdf82eee3bdd4fc394fd 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -105,32 +105,33 @@ init_internal_fns ()
 
 /* Create static initializers for the information returned by
    direct_internal_fn.  */
-#define not_direct { -2, -2, false }
-#define mask_load_direct { -1, 2, false }
-#define load_lanes_direct { -1, -1, false }
-#define mask_load_lanes_direct { -1, -1, false }
-#define gather_load_direct { 3, 1, false }
-#define len_load_direct { -1, -1, false }
-#define mask_store_direct { 3, 2, false }
-#define store_lanes_direct { 0, 0, false }
-#define mask_store_lanes_direct { 0, 0, false }
-#define vec_cond_mask_direct { 1, 0, false }
-#define vec_cond_direct { 2, 0, false }
-#define scatter_store_direct { 3, 1, false }
-#define len_store_direct { 3, 3, false }
-#define vec_set_direct { 3, 3, false }
-#define unary_direct { 0, 0, true }
-#define unary_convert_direct { -1, 0, true }
-#define binary_direct { 0, 0, true }
-#define ternary_direct { 0, 0, true }
-#define cond_unary_direct { 1, 1, true }
-#define cond_binary_direct { 1, 1, true }
-#define cond_ternary_direct { 1, 1, true }
-#define while_direct { 0, 2, false }
-#define fold_extract_direct { 2, 2, false }
-#define fold_left_direct { 1, 1, false }
-#define mask_fold_left_direct { 1, 1, false }
-#define check_ptrs_direct { 0, 0, false }
+#define not_direct { -2, -2, false, false }
+#define mask_load_direct { -1, 2, false, false }
+#define load_lanes_direct { -1, -1, false, false }
+#define mask_load_lanes_direct { -1, -1, false, false }
+#define gather_load_direct { 3, 1, false, false }
+#define len_load_direct { -1, -1, false, false }
+#define mask_store_direct { 3, 2, false, false }
+#define store_lanes_direct { 0, 0, false, false }
+#define mask_store_lanes_direct { 0, 0, false, false }
+#define vec_cond_mask_direct { 1, 0, false, false }
+#define vec_cond_direct { 2, 0, false, false }
+#define scatter_store_direct { 3, 1, false, false }
+#define len_store_direct { 3, 3, false, false }
+#define vec_set_direct { 3, 3, false, false }
+#define unary_direct { 0, 0, false, true }
+#define unary_convert_direct { -1, 0, false, true }
+#define binary_direct { 0, 0, false, true }
+#define ternary_direct { 0, 0, false, true }
+#define cond_unary_direct { 1, 1, false, true }
+#define cond_binary_direct { 1, 1, false, true }
+#define cond_ternary_direct { 1, 1, false, true }
+#define while_direct { 0, 2, false, false }
+#define fold_extract_direct { 2, 2, false, false }
+#define fold_left_direct { 1, 1, false, false }
+#define mask_fold_left_direct { 1, 1, false, false }
+#define check_ptrs_direct { 0, 0, false, false }
+#define ftrunc_int_direct { 0, 1, true, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -237,6 +238,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3848,6 +3872,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
+#define direct_ftrunc_int_optab_supported_p convert_optab_supported_p
 
 /* Return the optab used by internal function FN.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66dc90e016622c47e832b790db8ea867..976869e5dba2a3d067b6eb64c7bddde04ba6fb78 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -69,6 +69,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+   same mode, but internally converts via another mode.  This second mode is
+   specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -298,6 +301,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 21b1ce43df6a926a59e4b9eaf9ce06d2440845e7..416b5fe42e356acf97c90efaebaa207d29551769 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -144,12 +144,14 @@ struct direct_internal_fn_info
      function isn't directly mapped to an optab.  */
   signed int type0 : 8;
   signed int type1 : 8;
+  /* Indicates whether type1 is a scalar type.  */
+  unsigned int type1_is_scalar_p : 1;
   /* True if the function is pointwise, so that it can be vectorized by
      converting the return type and all argument types to vectors of the
      same number of elements.  E.g. we can vectorize an IFN_SQRT on
      floats as an IFN_SQRT on vectors of N floats.
 
-     This only needs 1 bit, but occupies the full 16 to ensure a nice
+     This only needs 1 bit, but occupies the full 15 to ensure a nice
      layout.  */
   unsigned int vectorizable : 16;
 };
diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188e17056b9c9af790e9725e3e65bff4..c835a3922115c775131160679060fadccbdf1633 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4022,12 +4022,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+						  SIGNED)); })
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index a6db2342bed6baf13ecbd84112c8432c6972e6fe..8c1c681a39b5aad4ee2058739b7676c0c5829ace 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 22c915909385fd4bc1c68a4f58479322e9e90666..6f78491a8fa6dbb6798c637277f71f4b99eea5cb 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..7a8e53e221e09d3da297f064fa3f4970ad0a4539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,92 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O2" }  */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-add-options aarch64_frintnzx } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	frint32z	s0, s0
+**	ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	frint64z	s0, s0
+**	ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	frint32z	d0, d0
+**	ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	frint64z	d0, d0
+**	ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_slp.c b/gcc/testsuite/gcc.target/aarch64/frintnz_slp.c
new file mode 100644
index 0000000000000000000000000000000000000000..208a328ce84df3c3ae7654c3db254e81d027c231
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_slp.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O3" }  */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-add-options aarch64_frintnzx } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; i +=2)					      \
+    {								      \
+      int_type x_i0 = x[i];					      \
+      int_type x_i1 = x[i + 1];					      \
+      y[i] = (float_type) x_i1;					      \
+      y[i + 1] = (float_type) x_i0;				      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..52232cb02649a3c3f65ab2fad13fdbd7ff9a0524
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O3" }  */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-add-options aarch64_frintnzx } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; ++i)					      \
+    {								      \
+      int_type x_i = x[i];					      \
+      y[i] = (float_type) x_i;					      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 750897d085480d791010c593b81e6910df246169..b76e7371d5c0c37d0b79eabd374ea8178af0c5dc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11729,6 +11729,45 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+
+proc add_options_for_aarch64_frintnzx { flags } {
+    if { ! [check_effective_target_aarch64_frintnzx_ok] } {
+        return "$flags"
+    }
+    global et_aarch64_frintnzx_flags
+    return "$flags $et_aarch64_frintnzx_flags"
+}
+
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise.  The test is valid for
+# AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+    global et_aarch64_frintnzx_flags
+    set et_aarch64_frintnzx_flags ""
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    foreach flags {"" "-march=armv8.5-a"} {
+	if { [check_no_compiler_messages_nocache \
+		  aarch64_frintnzx_ok assembly {
+	    #if !defined (__ARM_FEATURE_FRINT)
+	    #error "__ARM_FEATURE_FRINT not defined"
+	    #endif
+	} $flags ] } {
+	    set et_aarch64_frintnzx_flags $flags
+	    return 1;
+	}
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache]
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e54414f6befadcea95419bf9b84904b9cb4245b9..3d01e0506699b48b6e64c9fea7d37571292cbe68 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6081,6 +6081,35 @@ vect_prologue_cost_for_slp (slp_tree node,
     }
 }
 
+/* Check whether this NODE contains statements with an expected scalar argument
+   at INDEX.  */
+
+static bool
+check_scalar_arg_ok (slp_tree node, int index)
+{
+  if (index != 1)
+    return false;
+
+  enum stmt_vec_info_type stmt_type
+    = STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node));
+
+  if (stmt_type == shift_vec_info_type)
+    return true;
+  else if (stmt_type == call_vec_info_type)
+    {
+      combined_fn cfn
+	= gimple_call_combined_fn (SLP_TREE_REPRESENTATIVE (node)->stmt);
+      if (!internal_fn_p (cfn))
+	return false;
+      internal_fn ifn = as_internal_fn (cfn);
+
+      return direct_internal_fn_p (ifn)
+	&& direct_internal_fn (ifn).type1_is_scalar_p;
+    }
+
+  return false;
+}
+
 /* Analyze statements contained in SLP tree NODE after recursively analyzing
    the subtree.  NODE_INSTANCE contains NODE and VINFO contains INSTANCE.
 
@@ -6180,10 +6209,10 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
 	    {
 	      /* For shifts with a scalar argument we don't need
 		 to cost or code-generate anything.
+		 The same is true for internal functions where
+		 type1_is_scalar_p.
 		 ???  Represent this more explicitely.  */
-	      gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
-			   == shift_vec_info_type)
-			  && j == 1);
+	      gcc_assert (check_scalar_arg_ok (node, j));
 	      continue;
 	    }
 	  unsigned group_size = SLP_TREE_LANES (child);
@@ -8064,6 +8093,7 @@ vect_get_slp_defs (vec_info *,
   if (n == -1U)
     n = SLP_TREE_CHILDREN (slp_node).length ();
 
+  vec_oprnds->reserve (n);
   for (unsigned i = 0; i < n; ++i)
     {
       slp_tree child = SLP_TREE_CHILDREN (slp_node)[i];
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4e0d75e0d7586ad57a37850d8a70f6182ecb13d0..e77c43efdffba7d7d8b5c625acd6eb333b1ebd11 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1654,13 +1654,16 @@ vect_finish_stmt_generation (vec_info *vinfo,
 }
 
 /* We want to vectorize a call to combined function CFN with function
-   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
-   as the types of all inputs.  Check whether this is possible using
-   an internal function, returning its code if so or IFN_LAST if not.  */
+   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPES to
+   find the type of each argument, as described by direct_internal_fn_info,
+   falling back to VECTYPE_IN when no type is recorded there.  Check whether
+   vectorizing this call is possible using an internal function, returning
+   its code if so or IFN_LAST if not.  */
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-				tree vectype_out, tree vectype_in)
+				tree vectype_out, tree vectype_in,
+				tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1672,8 +1675,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
-	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+	  if (!type0)
+	    type0 = vectype_in;
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+	  if (!type1)
+	    type1 = vectype_in;
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3259,6 +3266,23 @@ simple_integer_narrowing (tree vectype_out, tree vectype_in,
   return true;
 }
 
+/* Function vect_get_scalar_oprnds.
+
+   This is a helper function for vectorizable_call to fill VEC_CSTS with the
+   ARGNO'th argument of the calls in SLP_NODE.  */
+
+static void
+vect_get_scalar_oprnds (slp_tree slp_node, int argno, vec<tree> *vec_csts)
+{
+  unsigned j;
+  stmt_vec_info stmt_vinfo;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), j, stmt_vinfo)
+    {
+      gcc_assert (gimple_code (stmt_vinfo->stmt) == GIMPLE_CALL);
+      vec_csts->safe_push (gimple_call_arg (stmt_vinfo->stmt, argno));
+    }
+}
+
 /* Function vectorizable_call.
 
    Check if STMT_INFO performs a function call that can be vectorized.
@@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
+  /* The argument that is not of the same type as the others.  */
   int mask_opno = -1;
+  int scalar_opno = -1;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      internal_fn ifn = as_internal_fn (cfn);
+      if (direct_internal_fn_p (ifn)
+	  && direct_internal_fn (ifn).type1_is_scalar_p)
+	scalar_opno = direct_internal_fn (ifn).type1;
+      else
+	/* For masked operations this represents the argument that carries the
+	   mask.  */
+	mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    }
 
   for (i = 0; i < nargs; i++)
     {
@@ -3353,6 +3388,11 @@ vectorizable_call (vec_info *vinfo,
 	    return false;
 	  continue;
 	}
+      else if ((int) i == scalar_opno)
+	{
+	  vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
+	  continue;
+	}
 
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
 			       i, &op, &slp_op[i], &dt[i], &vectypes[i]))
@@ -3467,8 +3507,8 @@ vectorizable_call (vec_info *vinfo,
 	  || (modifier == NARROW
 	      && simple_integer_narrowing (vectype_out, vectype_in,
 					   &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-					  vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+					  &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3608,6 +3648,10 @@ vectorizable_call (vec_info *vinfo,
 
 	      vect_get_slp_defs (vinfo, slp_node, &vec_defs);
 	      vec_oprnds0 = vec_defs[0];
+	      unsigned int children_n = SLP_TREE_CHILDREN (slp_node).length ();
+	      auto_vec<tree> scalar_defs (children_n);
+	      if (scalar_opno > -1)
+		vect_get_scalar_oprnds (slp_node, scalar_opno, &scalar_defs);
 
 	      /* Arguments are ready.  Create the new vector stmt.  */
 	      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_oprnd0)
@@ -3624,8 +3668,15 @@ vectorizable_call (vec_info *vinfo,
 		  size_t k;
 		  for (k = 0; k < nargs; k++)
 		    {
-		      vec<tree> vec_oprndsk = vec_defs[k];
-		      vargs[varg++] = vec_oprndsk[i];
+		      tree operand;
+		      if (scalar_opno == (int) k)
+			operand = scalar_defs[i];
+		      else
+			{
+			  vec<tree> vec_oprndsk = vec_defs[k];
+			  operand = vec_oprndsk[i];
+			}
+		      vargs[varg++] = operand;
 		    }
 		  if (masked_loop_p && reduc_idx >= 0)
 		    vargs[varg++] = vargs[reduc_idx + 1];
diff --git a/gcc/tree.h b/gcc/tree.h
index d6a5fdf6d81bf10044249c015083e6db8b35b519..42b2ad74d260041118f079f05083f7498a60fba4 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6698,4 +6698,12 @@ extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
 extern int get_target_clone_attr_len (tree);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 172098787dd924ec23101e7495cf0e67ca47d787..127c2d1fad3fe488ff4d60119ce0ec8c13a78528 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -6699,11 +6699,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6711,7 +6711,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return type;
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (const_cast<tree> (type)));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
  
Richard Biener Nov. 7, 2022, 11:05 a.m. UTC | #20
On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:

> Sorry for the delay, just been reminded I still had this patch outstanding
> from last stage 1. Hopefully since it has been mostly reviewed it could go in
> for this stage 1?
> 
> I addressed the comments and gave the slp-part of vectorizable_call some TLC
> to make it work.
> 
> I also changed vect_get_slp_defs as I noticed that the call from
> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> than the number of children in the slp_node

how so?  Please fix that in the caller.  It looks like it probably
should use vect_nargs instead?

> , so that quick_push might not be
> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
> actually come across any failure because of it though. Happy to split this
> into a separate patch if needed.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.
> 
> OK for trunk?

I'll leave final approval to Richard but

-     This only needs 1 bit, but occupies the full 16 to ensure a nice
+     This only needs 1 bit, but occupies the full 15 to ensure a nice
      layout.  */
   unsigned int vectorizable : 16;

you don't actually change the width of the bitfield.  I would find
it more natural to have

  signed int type0 : 7;
  signed int type0_vtrans : 1;
  signed int type1 : 7;
  signed int type1_vtrans : 1;

with typeN_vtrans specifying how the types transform when vectorized.
I would imagine another variant we could need is narrow/widen
according to either result or other argument type?  That said,
just your flag would then be

  signed int type0 : 7;
  signed int pad   : 1;
  signed int type1 : 7;
  signed int type1_vect_as_scalar : 1; 

?

> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
>         (frintnz_mode): New int attribute.
>         (VSFDF): Make iterator conditional.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * internal-fn.cc (ftrunc_int_direct): New define.
>         (expand_ftrunc_int_optab_fn): New custom expander.
>         (direct_ftrunc_int_optab_supported_p): New supported_p.
>         * internal-fn.h (direct_internal_fn_info): Add new member
>         type1_is_scalar_p.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
>         * stor-layout.h (element_precision): Moved from here...
>         * tree.h (element_precision): ... to here.
>         (element_type): New declaration.
>         * tree.cc (element_type): New function.
>         (element_precision): Changed to use element_type.
>         * tree-vect-stmts.cc (vectorizable_internal_function): Add 
> support for
>         IFNs with different input types.
>         (vect_get_scalar_oprnds): New function.
>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>         * tree-vect-slp.cc (check_scalar_arg_ok): New function.
>         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
>         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> instructions available.
>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> aarch64_frintz options.
>         * gcc.target/aarch64/frintnz.c: New test.
>         * gcc.target/aarch64/frintnz_vec.c: New test.
>         * gcc.target/aarch64/frintnz_slp.c: New test.
>
  
Andre Vieira (lists) Nov. 7, 2022, 2:19 p.m. UTC | #21
On 07/11/2022 11:05, Richard Biener wrote:
> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>
>> Sorry for the delay, just been reminded I still had this patch outstanding
>> from last stage 1. Hopefully since it has been mostly reviewed it could go in
>> for this stage 1?
>>
>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
>> to make it work.
>>
>> I also changed vect_get_slp_defs as I noticed that the call from
>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
>> than the number of children in the slp_node
> how so?  Please fix that in the caller.  It looks like it probably
> shoud use vect_nargs instead?
Well that was my first intuition, but when I looked at it further the 
variant it's calling:
void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > 
*vec_oprnds, unsigned n)

Is actually creating a vector of vectors of slp defs. So for each child 
of slp_node it calls:
void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)

Which returns a vector of vectorized defs. So vect_nargs would be the 
right size for the inner vec<tree> of vec_defs, but the outer should 
have the same number of elements as the original slp_node has children.

However, at the call site (vectorizable_call), the operand we pass to 
vect_get_slp_defs, 'vec_defs', is initialized before the code path is 
specialized for slp_node. I'll see if I can change the call site to 
avoid that; given the continue at the end of the if (slp_node) block I 
don't think vec_defs is used after it, but it may require some 
massaging to define it separately for each code path.
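
To make the nesting concrete, roughly what I mean is something like 
this (just a sketch of the shape, not the actual implementation; the 
helper name and the loop are mine):

   /* Outer vector: one entry per SLP child, i.e. per call argument.
      Inner vectors: the vectorized defs for that argument.  */
   void
   get_all_arg_defs_sketch (slp_tree slp_node, vec<vec<tree> > *vec_oprnds)
   {
     for (unsigned k = 0; k < SLP_TREE_CHILDREN (slp_node).length (); k++)
       {
	 vec<tree> defs_for_arg_k = vNULL;
	 /* Fills defs_for_arg_k with the vectorized defs of child K.  */
	 vect_get_slp_defs (SLP_TREE_CHILDREN (slp_node)[k], &defs_for_arg_k);
	 vec_oprnds->safe_push (defs_for_arg_k);
       }
   }
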

>
>> , so that quick_push might not be
>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
>> actually come across any failure because of it though. Happy to split this
>> into a separate patch if needed.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>> x86_64-pc-linux-gnu.
>>
>> OK for trunk?
> I'll leave final approval to Richard but
>
> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
>        layout.  */
>     unsigned int vectorizable : 16;
>
> you don't actually change the width of the bitfield.  I would find
> it more natural to have
>
>    signed int type0 : 7;
>    signed int type0_vtrans : 1;
>    signed int type1 : 7;
>    signed int type1_vtrans : 1;
>
> with typeN_vtrans specifying how the types transform when vectorized.
> I would imagine another variant we could need is narrow/widen
> according to either result or other argument type?  That said,
> just your flag would then be
>
>    signed int type0 : 7;
>    signed int pad   : 1;
>    signed int type1 : 7;
>    signed int type1_vect_as_scalar : 1;
>
> ?
That's a cool idea! I'll leave it as a single bit for now, laid out like 
that; if we want to re-use it for multiple transformations we will 
obviously need to rename it and give it more bits.
>
>> gcc/ChangeLog:
>>
>>          * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
>> pattern.
>>          * config/aarch64/iterators.md (FRINTNZ): New iterator.
>>          (frintnz_mode): New int attribute.
>>          (VSFDF): Make iterator conditional.
>>          * internal-fn.def (FTRUNC_INT): New IFN.
>>          * internal-fn.cc (ftrunc_int_direct): New define.
>>          (expand_ftrunc_int_optab_fn): New custom expander.
>>          (direct_ftrunc_int_optab_supported_p): New supported_p.
>>          * internal-fn.h (direct_internal_fn_info): Add new member
>>          type1_is_scalar_p.
>>          * match.pd: Add to the existing TRUNC pattern match.
>>          * optabs.def (ftrunc_int): New entry.
>>          * stor-layout.h (element_precision): Moved from here...
>>          * tree.h (element_precision): ... to here.
>>          (element_type): New declaration.
>>          * tree.cc (element_type): New function.
>>          (element_precision): Changed to use element_type.
>>          * tree-vect-stmts.cc (vectorizable_internal_function): Add
>> support for
>>          IFNs with different input types.
>>          (vect_get_scalar_oprnds): New function.
>>          (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>>          * tree-vect-slp.cc (check_scalar_arg_ok): New function.
>>          (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
>>          (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
>>          * doc/md.texi: New entry for ftrunc pattern name.
>>          * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
>> instructions available.
>>          * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
>> aarch64_frintz options.
>>          * gcc.target/aarch64/frintnz.c: New test.
>>          * gcc.target/aarch64/frintnz_vec.c: New test.
>>          * gcc.target/aarch64/frintnz_slp.c: New test.
>>
  
Richard Biener Nov. 7, 2022, 2:56 p.m. UTC | #22
On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:

> 
> On 07/11/2022 11:05, Richard Biener wrote:
> > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >
> >> Sorry for the delay, just been reminded I still had this patch outstanding
> >> from last stage 1. Hopefully since it has been mostly reviewed it could go
> >> in
> >> for this stage 1?
> >>
> >> I addressed the comments and gave the slp-part of vectorizable_call some
> >> TLC
> >> to make it work.
> >>
> >> I also changed vect_get_slp_defs as I noticed that the call from
> >> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >> than the number of children in the slp_node
> > how so?  Please fix that in the caller.  It looks like it probably
> > shoud use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the variant
> it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
> *vec_oprnds, unsigned n)
> 
> Is actually creating a vector of vectors of slp defs. So for each child of
> slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> 
> Which returns a vector of vectorized defs. So vect_nargs would be the right
> size for the inner vec<tree> of vec_defs, but the outer should have the same
> number of elements as the original slp_node has children.

No, the inner vector is the vector of vectors for each arg, the outer
vector should be the one for each argument.  Hm, that was a confusing
sentence.

That said, the number of SLP children of a call node should eventually
be the number of arguments of the call (plus masks, etc.).  So it
looks about correct besides the vec_nargs issue?

> 
> However, at the call site (vectorizable_call), the operand we pass to
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is
> specialized for slp_node. I'll go see if I can change the call site to not
> have to do that, given the continue at the end of the if (slp_node) BB I don't
> think it needs to use vec_defs after it, but it may require some massaging to
> be able to define it separately for each code-path.
> 
> >
> >> , so that quick_push might not be
> >> safe as is, so I added the reserve (n) to ensure it's safe to push. I
> >> didn't
> >> actually come across any failure because of it though. Happy to split this
> >> into a separate patch if needed.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> >>
> >> OK for trunk?
> > I'll leave final approval to Richard but
> >
> > -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> > +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> >        layout.  */
> >     unsigned int vectorizable : 16;
> >
> > you don't actually change the width of the bitfield.  I would find
> > it more natural to have
> >
> >    signed int type0 : 7;
> >    signed int type0_vtrans : 1;
> >    signed int type1 : 7;
> >    signed int type1_vtrans : 1;
> >
> > with typeN_vtrans specifying how the types transform when vectorized.
> > I would imagine another variant we could need is narrow/widen
> > according to either result or other argument type?  That said,
> > just your flag would then be
> >
> >    signed int type0 : 7;
> >    signed int pad   : 1;
> >    signed int type1 : 7;
> >    signed int type1_vect_as_scalar : 1;
> >
> > ?
> That's a cool idea! I'll leave it as a single bit for now like that, if we
> want to re-use it for multiple transformations we will obviously need to
> rename & give it more bits.
> >
> >> gcc/ChangeLog:
> >>
> >>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> >> pattern.
> >>          * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>          (frintnz_mode): New int attribute.
> >>          (VSFDF): Make iterator conditional.
> >>          * internal-fn.def (FTRUNC_INT): New IFN.
> >>          * internal-fn.cc (ftrunc_int_direct): New define.
> >>          (expand_ftrunc_int_optab_fn): New custom expander.
> >>          (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>          * internal-fn.h (direct_internal_fn_info): Add new member
> >>          type1_is_scalar_p.
> >>          * match.pd: Add to the existing TRUNC pattern match.
> >>          * optabs.def (ftrunc_int): New entry.
> >>          * stor-layout.h (element_precision): Moved from here...
> >>          * tree.h (element_precision): ... to here.
> >>          (element_type): New declaration.
> >>          * tree.cc (element_type): New function.
> >>          (element_precision): Changed to use element_type.
> >>          * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >> support for
> >>          IFNs with different input types.
> >>          (vect_get_scalar_oprnds): New function.
> >>          (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>          * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>          (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>          (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
> >>          * doc/md.texi: New entry for ftrunc pattern name.
> >>          * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> >> instructions available.
> >>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> >> aarch64_frintz options.
> >>          * gcc.target/aarch64/frintnz.c: New test.
> >>          * gcc.target/aarch64/frintnz_vec.c: New test.
> >>          * gcc.target/aarch64/frintnz_slp.c: New test.
> >>
>
  
Andre Vieira (lists) Nov. 9, 2022, 11:33 a.m. UTC | #23
On 07/11/2022 14:56, Richard Biener wrote:
> On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:
>
>> On 07/11/2022 11:05, Richard Biener wrote:
>>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>>
>>>> Sorry for the delay, just been reminded I still had this patch outstanding
>>>> from last stage 1. Hopefully since it has been mostly reviewed it could go
>>>> in
>>>> for this stage 1?
>>>>
>>>> I addressed the comments and gave the slp-part of vectorizable_call some
>>>> TLC
>>>> to make it work.
>>>>
>>>> I also changed vect_get_slp_defs as I noticed that the call from
>>>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
>>>> than the number of children in the slp_node
>>> how so?  Please fix that in the caller.  It looks like it probably
>>> shoud use vect_nargs instead?
>> Well that was my first intuition, but when I looked at it further the variant
>> it's calling:
>> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
>> *vec_oprnds, unsigned n)
>>
>> Is actually creating a vector of vectors of slp defs. So for each child of
>> slp_node it calls:
>> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>>
>> Which returns a vector of vectorized defs. So vect_nargs would be the right
>> size for the inner vec<tree> of vec_defs, but the outer should have the same
>> number of elements as the original slp_node has children.
> No, the inner vector is the vector of vectors for each arg, the outer
> vector should be the one for each argument.  Hm, that was a confusing
> sentence.
>
> That said, the number of SLP children of a call node should eventually
> be the number of arguments of the call (plus masks, etc.).  So it
> looks about correct besides the vec_nargs issue?
Yeah, you are right, I misunderstood what the children were: there is a 
child node for each argument of the call. Though, since you iterate 
over the 'scalar' arguments of the call, I actually think 'nargs' was 
correct to begin with, which would explain why this never went wrong... 
So I think it is actually correct as is; I must have gotten confused by 
some earlier investigation into how to deal with the scalar arguments... 
Sorry for the noise, I'll undo these changes to the patch.
  
Richard Sandiford Nov. 15, 2022, 6:24 p.m. UTC | #24
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 07/11/2022 11:05, Richard Biener wrote:
>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>
>>> Sorry for the delay, just been reminded I still had this patch outstanding
>>> from last stage 1. Hopefully since it has been mostly reviewed it could go in
>>> for this stage 1?
>>>
>>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
>>> to make it work.
>>>
>>> I also changed vect_get_slp_defs as I noticed that the call from
>>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
>>> than the number of children in the slp_node
>> how so?  Please fix that in the caller.  It looks like it probably
>> shoud use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the 
> variant it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > 
> *vec_oprnds, unsigned n)
>
> Is actually creating a vector of vectors of slp defs. So for each child 
> of slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>
> Which returns a vector of vectorized defs. So vect_nargs would be the 
> right size for the inner vec<tree> of vec_defs, but the outer should 
> have the same number of elements as the original slp_node has children.
>
> However, at the call site (vectorizable_call), the operand we pass to 
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
> specialized for slp_node. I'll go see if I can change the call site to 
> not have to do that, given the continue at the end of the if (slp_node) 
> BB I don't think it needs to use vec_defs after it, but it may require 
> some massaging to be able to define it separately for each code-path.
>
>>
>>> , so that quick_push might not be
>>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
>>> actually come across any failure because of it though. Happy to split this
>>> into a separate patch if needed.
>>>
>>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>>> x86_64-pc-linux-gnu.
>>>
>>> OK for trunk?
>> I'll leave final approval to Richard but
>>
>> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
>> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
>>        layout.  */
>>     unsigned int vectorizable : 16;
>>
>> you don't actually change the width of the bitfield.  I would find
>> it more natural to have
>>
>>    signed int type0 : 7;
>>    signed int type0_vtrans : 1;
>>    signed int type1 : 7;
>>    signed int type1_vtrans : 1;
>>
>> with typeN_vtrans specifying how the types transform when vectorized.
>> I would imagine another variant we could need is narrow/widen
>> according to either result or other argument type?  That said,
>> just your flag would then be
>>
>>    signed int type0 : 7;
>>    signed int pad   : 1;
>>    signed int type1 : 7;
>>    signed int type1_vect_as_scalar : 1;
>>
>> ?
> That's a cool idea! I'll leave it as a single bit for now like that, if 
> we want to re-use it for multiple transformations we will obviously need 
> to rename & give it more bits.

I think we should steal bits from vectorizable rather than shrink
type0 and type1 though.  Then add a 14-bit padding field to show
how many bits are left.
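
Concretely, something like this in direct_internal_fn_info (just a 
sketch; the new flag keeps the type1_is_scalar_p name from the patch, 
and type0/type1 keep their existing widths):

    /* type0 and type1 stay as they are.  */
    unsigned int vectorizable : 1;
    unsigned int type1_is_scalar_p : 1;
    /* Unused; spells out how many bits are left in this word.  */
    unsigned int pad : 14;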

> @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
> 
> +  /* The argument that is not of the same type as the others.  */
>    int mask_opno = -1;
> +  int scalar_opno = -1;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      internal_fn ifn = as_internal_fn (cfn);
> +      if (direct_internal_fn_p (ifn)
> +	  && direct_internal_fn (ifn).type1_is_scalar_p)
> +	scalar_opno = direct_internal_fn (ifn).type1;
> +      else
> +	/* For masked operations this represents the argument that carries the
> +	   mask.  */
> +	mask_opno = internal_fn_mask_index (as_internal_fn (cfn));

This doesn't seem logically like an else.  We should do both.
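I.e. something like this (untested sketch, reusing the names from the 
hunk above):

      internal_fn ifn = as_internal_fn (cfn);
      /* Always compute the mask index...  */
      mask_opno = internal_fn_mask_index (ifn);
      /* ...and, independently, the scalar-operand index.  */
      if (direct_internal_fn_p (ifn)
	  && direct_internal_fn (ifn).type1_is_scalar_p)
	scalar_opno = direct_internal_fn (ifn).type1;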

LGTM otherwise for the bits outside match.pd.  If Richard's happy with
the match.pd bits then I think the patch is OK with those changes and
without the vect_get_slp_defs thing (as you mentioned downthread).

Thanks,
Richard


>>
>>> gcc/ChangeLog:
>>>
>>>          * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
>>> pattern.
>>>          * config/aarch64/iterators.md (FRINTNZ): New iterator.
>>>          (frintnz_mode): New int attribute.
>>>          (VSFDF): Make iterator conditional.
>>>          * internal-fn.def (FTRUNC_INT): New IFN.
>>>          * internal-fn.cc (ftrunc_int_direct): New define.
>>>          (expand_ftrunc_int_optab_fn): New custom expander.
>>>          (direct_ftrunc_int_optab_supported_p): New supported_p.
>>>          * internal-fn.h (direct_internal_fn_info): Add new member
>>>          type1_is_scalar_p.
>>>          * match.pd: Add to the existing TRUNC pattern match.
>>>          * optabs.def (ftrunc_int): New entry.
>>>          * stor-layout.h (element_precision): Moved from here...
>>>          * tree.h (element_precision): ... to here.
>>>          (element_type): New declaration.
>>>          * tree.cc (element_type): New function.
>>>          (element_precision): Changed to use element_type.
>>>          * tree-vect-stmts.cc (vectorizable_internal_function): Add
>>> support for
>>>          IFNs with different input types.
>>>          (vect_get_scalar_oprnds): New function.
>>>          (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>>>          * tree-vect-slp.cc (check_scalar_arg_ok): New function.
>>>          (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
>>>          (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
>>>          * doc/md.texi: New entry for ftrunc pattern name.
>>>          * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>          * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
>>> instructions available.
>>>          * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
>>> aarch64_frintz options.
>>>          * gcc.target/aarch64/frintnz.c: New test.
>>>          * gcc.target/aarch64/frintnz_vec.c: New test.
>>>          * gcc.target/aarch64/frintnz_slp.c: New test.
>>>
  
Richard Biener Nov. 16, 2022, 12:25 p.m. UTC | #25
On Tue, 15 Nov 2022, Richard Sandiford wrote:

> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> > On 07/11/2022 11:05, Richard Biener wrote:
> >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >>
> >>> Sorry for the delay, just been reminded I still had this patch outstanding
> >>> from last stage 1. Hopefully since it has been mostly reviewed it could go in
> >>> for this stage 1?
> >>>
> >>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
> >>> to make it work.
> >>>
> >>> I also changed vect_get_slp_defs as I noticed that the call from
> >>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >>> than the number of children in the slp_node
> >> how so?  Please fix that in the caller.  It looks like it probably
> >> shoud use vect_nargs instead?
> > Well that was my first intuition, but when I looked at it further the 
> > variant it's calling:
> > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > 
> > *vec_oprnds, unsigned n)
> >
> > Is actually creating a vector of vectors of slp defs. So for each child 
> > of slp_node it calls:
> > void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> >
> > Which returns a vector of vectorized defs. So vect_nargs would be the 
> > right size for the inner vec<tree> of vec_defs, but the outer should 
> > have the same number of elements as the original slp_node has children.
> >
> > However, at the call site (vectorizable_call), the operand we pass to 
> > vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
> > specialized for slp_node. I'll go see if I can change the call site to 
> > not have to do that, given the continue at the end of the if (slp_node) 
> > BB I don't think it needs to use vec_defs after it, but it may require 
> > some massaging to be able to define it separately for each code-path.
> >
> >>
> >>> , so that quick_push might not be
> >>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
> >>> actually come across any failure because of it though. Happy to split this
> >>> into a separate patch if needed.
> >>>
> >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >>> x86_64-pc-linux-gnu.
> >>>
> >>> OK for trunk?
> >> I'll leave final approval to Richard but
> >>
> >> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> >> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> >>        layout.  */
> >>     unsigned int vectorizable : 16;
> >>
> >> you don't actually change the width of the bitfield.  I would find
> >> it more natural to have
> >>
> >>    signed int type0 : 7;
> >>    signed int type0_vtrans : 1;
> >>    signed int type1 : 7;
> >>    signed int type1_vtrans : 1;
> >>
> >> with typeN_vtrans specifying how the types transform when vectorized.
> >> I would imagine another variant we could need is narrow/widen
> >> according to either result or other argument type?  That said,
> >> just your flag would then be
> >>
> >>    signed int type0 : 7;
> >>    signed int pad   : 1;
> >>    signed int type1 : 7;
> >>    signed int type1_vect_as_scalar : 1;
> >>
> >> ?
> > That's a cool idea! I'll leave it as a single bit for now like that, if 
> > we want to re-use it for multiple transformations we will obviously need 
> > to rename & give it more bits.
> 
> I think we should steal bits from vectorizable rather than shrink
> type0 and type1 though.  Then add a 14-bit padding field to show
> how many bits are left.
> 
> > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
> >        rhs_type = unsigned_type_node;
> >      }
> > 
> > +  /* The argument that is not of the same type as the others.  */
> >    int mask_opno = -1;
> > +  int scalar_opno = -1;
> >    if (internal_fn_p (cfn))
> > -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> > +    {
> > +      internal_fn ifn = as_internal_fn (cfn);
> > +      if (direct_internal_fn_p (ifn)
> > +	  && direct_internal_fn (ifn).type1_is_scalar_p)
> > +	scalar_opno = direct_internal_fn (ifn).type1;
> > +      else
> > +	/* For masked operations this represents the argument that carries the
> > +	   mask.  */
> > +	mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> 
> This doesn't seem logically like an else.  We should do both.
> 
> LGTM otherwise for the bits outside match.pd.  If Richard's happy with
> the match.pd bits then I think the patch is OK with those changes and
> without the vect_get_slp_defs thing (as you mentioned downthread).

Yes, the match.pd part looked OK.

> Thanks,
> Richard
> 
> 
> >>
> >>> gcc/ChangeLog:
> >>>
> >>>          * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> >>> pattern.
> >>>          * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>>          (frintnz_mode): New int attribute.
> >>>          (VSFDF): Make iterator conditional.
> >>>          * internal-fn.def (FTRUNC_INT): New IFN.
> >>>          * internal-fn.cc (ftrunc_int_direct): New define.
> >>>          (expand_ftrunc_int_optab_fn): New custom expander.
> >>>          (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>>          * internal-fn.h (direct_internal_fn_info): Add new member
> >>>          type1_is_scalar_p.
> >>>          * match.pd: Add to the existing TRUNC pattern match.
> >>>          * optabs.def (ftrunc_int): New entry.
> >>>          * stor-layout.h (element_precision): Moved from here...
> >>>          * tree.h (element_precision): ... to here.
> >>>          (element_type): New declaration.
> >>>          * tree.cc (element_type): New function.
> >>>          (element_precision): Changed to use element_type.
> >>>          * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >>> support for
> >>>          IFNs with different input types.
> >>>          (vect_get_scalar_oprnds): New function.
> >>>          (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>>          * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>>          (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>>          (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
> >>>          * doc/md.texi: New entry for ftrunc pattern name.
> >>>          * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>          * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> >>> instructions available.
> >>>          * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> >>> aarch64_frintz options.
> >>>          * gcc.target/aarch64/frintnz.c: New test.
> >>>          * gcc.target/aarch64/frintnz_vec.c: New test.
> >>>          * gcc.target/aarch64/frintnz_slp.c: New test.
> >>>
>
  

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..ad4e04d7c874da095513442e7d7f247791d8921d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,16 @@  (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_insn "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
+  [(set_attr "type" "f_rint<stype>")]
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@  (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@  (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,13 @@  operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..64263cbb83548b140f613cb4bf5ce6565373f96d 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -269,6 +269,8 @@  DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC32, ECF_CONST, ftrunc32, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC64, ECF_CONST, ftrunc64, unary)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..7937eeb7865ce05d32dd5fdc2a90699a0e15230e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,22 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	 && TYPE_MODE (TREE_TYPE (@1)) == SImode
+	 && direct_internal_fn_supported_p (IFN_FTRUNC32, type,
+					    OPTIMIZE_FOR_BOTH))
+     (IFN_FTRUNC32 @0)
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && TYPE_MODE (TREE_TYPE (@1)) == DImode
+	  && direct_internal_fn_supported_p (IFN_FTRUNC64, type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC64 @0)
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..740af19fcf5c53e25663038ff6c2e88cf8d7334f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,8 @@  OPTAB_D (floor_optab, "floor$a2")
 OPTAB_D (ceil_optab, "ceil$a2")
 OPTAB_D (btrunc_optab, "btrunc$a2")
 OPTAB_D (nearbyint_optab, "nearbyint$a2")
+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
 
 OPTAB_D (acos_optab, "acos$a2")
 OPTAB_D (acosh_optab, "acosh$a2")
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,88 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	...
+**	frint32z	s0, s0
+**	...
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	...
+**	frint64z	s0, s0
+**	...
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	...
+**	frint32z	d0, d0
+**	...
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	...
+**	frint64z	d0, d0
+**	...
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..0d64acb987614710d84490fce20e49db2ebf2e48 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,33 @@  proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise. The test is valid
+# for AArch64.
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      arm_v8_5a_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok { } {
+    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
+                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.