
[AArch64] Enable generation of FRINTNZ instructions

Message ID 8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com
State New
Series [AArch64] Enable generation of FRINTNZ instructions

Commit Message

Andre Vieira (lists) Nov. 11, 2021, 5:51 p.m. UTC
Hi,

This patch introduces two IFNs, FTRUNC32 and FTRUNC64, with the 
corresponding optabs and mappings. It also adds a backend pattern to 
implement them for aarch64 and a match.pd pattern to recognize the idiom.
These IFNs (and optabs) represent a truncation towards zero, as if 
performed by first casting the value to a signed integer of 32 or 64 bits 
and then back to the same floating-point type/mode.

The match.pd pattern chooses to use these, when supported, regardless of 
trapping math, since these new patterns mimic the original behavior of 
truncating through an integer.

I didn't think any of the existing IFNs represented these. I know it's 
a bit late in stage 1, but I thought this might be OK given it's only 
used by a single target and should have very little impact on anything else.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTZ): New iterator.
         * doc/md.texi: New entry for ftrunc pattern name.
         * internal-fn.def (FTRUNC32): New IFN.
         (FTRUNC64): Likewise.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (OPTAB_D): New entries for ftrunc.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.

Comments

Richard Biener Nov. 12, 2021, 10:56 a.m. UTC | #1
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> optabs and mappings. It also creates a backend pattern to implement them for
> aarch64 and a match.pd pattern to idiom recognize these.
> These IFN's (and optabs) represent a truncation towards zero, as if performed
> by first casting it to a signed integer of 32 or 64 bits and then back to the
> same floating point type/mode.
> 
> The match.pd pattern choses to use these, when supported, regardless of
> trapping math, since these new patterns mimic the original behavior of
> truncating through an integer.
> 
> I didn't think any of the existing IFN's represented these. I know it's a bit
> late in stage 1, but I thought this might be OK given it's only used by a
> single target and should have very little impact on anything else.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?

Richard.


> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTZ): New iterator.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * internal-fn.def (FTRUNC32): New IFN.
>         (FTRUNC64): Likewise.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (OPTAB_D): New entries for ftrunc.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
> 
>
Andre Vieira (lists) Nov. 12, 2021, 11:48 a.m. UTC | #2
On 12/11/2021 10:56, Richard Biener wrote:
> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
>> optabs and mappings. It also creates a backend pattern to implement them for
>> aarch64 and a match.pd pattern to idiom recognize these.
>> These IFN's (and optabs) represent a truncation towards zero, as if performed
>> by first casting it to a signed integer of 32 or 64 bits and then back to the
>> same floating point type/mode.
>>
>> The match.pd pattern choses to use these, when supported, regardless of
>> trapping math, since these new patterns mimic the original behavior of
>> truncating through an integer.
>>
>> I didn't think any of the existing IFN's represented these. I know it's a bit
>> late in stage 1, but I thought this might be OK given it's only used by a
>> single target and should have very little impact on anything else.
>>
>> Bootstrapped on aarch64-none-linux.
>>
>> OK for trunk?
> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> optab (with two modes), so not
>
> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
>
> but
>
> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
>
> or so?  I know that gets somewhat awkward for the internal function,
> but IMHO we shouldn't tie our hands because of that?
I tried doing this originally, but indeed I couldn't find a way to 
correctly tie the internal function to it.

direct_optab_supported_p with multiple types expects those to be of the 
same mode. I see convert_optab_supported_p does accept two different 
modes, but I don't know how that is used...

Any ideas?
Richard Biener Nov. 16, 2021, 12:10 p.m. UTC | #3
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:

> 
> On 12/11/2021 10:56, Richard Biener wrote:
> > On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> >> optabs and mappings. It also creates a backend pattern to implement them
> >> for
> >> aarch64 and a match.pd pattern to idiom recognize these.
> >> These IFN's (and optabs) represent a truncation towards zero, as if
> >> performed
> >> by first casting it to a signed integer of 32 or 64 bits and then back to
> >> the
> >> same floating point type/mode.
> >>
> >> The match.pd pattern choses to use these, when supported, regardless of
> >> trapping math, since these new patterns mimic the original behavior of
> >> truncating through an integer.
> >>
> >> I didn't think any of the existing IFN's represented these. I know it's a
> >> bit
> >> late in stage 1, but I thought this might be OK given it's only used by a
> >> single target and should have very little impact on anything else.
> >>
> >> Bootstrapped on aarch64-none-linux.
> >>
> >> OK for trunk?
> > On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> > optab (with two modes), so not
> >
> > +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> > +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >
> > but
> >
> > OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >
> > or so?  I know that gets somewhat awkward for the internal function,
> > but IMHO we shouldn't tie our hands because of that?
> I tried doing this originally, but indeed I couldn't find a way to correctly
> tie the internal function to it.
> 
> direct_optab_supported_p with multiple types expect those to be of the same
> mode. I see convert_optab_supported_p does but I don't know how that is
> used...
> 
> Any ideas?

No "nice" ones.  The "usual" way is to provide fake arguments that
specify the type/mode.  We could use an integer argument directly
specifying the mode (then the IL would look host dependent - ugh),
or specify a constant zero in the intended mode (less visibly
obvious - but at least with -gimple dumping you'd see the type...).

In any case if people think going with two optabs is OK then
please consider using ftruncsi and ftruncdi instead of 32/64.

Richard.
Andre Vieira (lists) Nov. 17, 2021, 1:30 p.m. UTC | #4
On 16/11/2021 12:10, Richard Biener wrote:
> On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
>
>> On 12/11/2021 10:56, Richard Biener wrote:
>>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
>>>
>>>> Hi,
>>>>
>>>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
>>>> optabs and mappings. It also creates a backend pattern to implement them
>>>> for
>>>> aarch64 and a match.pd pattern to idiom recognize these.
>>>> These IFN's (and optabs) represent a truncation towards zero, as if
>>>> performed
>>>> by first casting it to a signed integer of 32 or 64 bits and then back to
>>>> the
>>>> same floating point type/mode.
>>>>
>>>> The match.pd pattern choses to use these, when supported, regardless of
>>>> trapping math, since these new patterns mimic the original behavior of
>>>> truncating through an integer.
>>>>
>>>> I didn't think any of the existing IFN's represented these. I know it's a
>>>> bit
>>>> late in stage 1, but I thought this might be OK given it's only used by a
>>>> single target and should have very little impact on anything else.
>>>>
>>>> Bootstrapped on aarch64-none-linux.
>>>>
>>>> OK for trunk?
>>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
>>> optab (with two modes), so not
>>>
>>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
>>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
>>>
>>> but
>>>
>>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
>>>
>>> or so?  I know that gets somewhat awkward for the internal function,
>>> but IMHO we shouldn't tie our hands because of that?
>> I tried doing this originally, but indeed I couldn't find a way to correctly
>> tie the internal function to it.
>>
>> direct_optab_supported_p with multiple types expect those to be of the same
>> mode. I see convert_optab_supported_p does but I don't know how that is
>> used...
>>
>> Any ideas?
> No "nice" ones.  The "usual" way is to provide fake arguments that
> specify the type/mode.  We could use an integer argument directly
> secifying the mode (then the IL would look host dependent - ugh),
> or specify a constant zero in the intended mode (less visibly
> obvious - but at least with -gimple dumping you'd see the type...).
Hi,

So I reworked this to have a single optab and IFN. This required a bit 
of fiddling with custom expander and supported_p functions for the IFN. 
I decided to pass the MAX_INT of the intermediate 'int' type to the IFN, 
so it can carry the size of the integer we cast through.  I tried 0 
first, but GCC was being too smart and just demoted it to an 'int' for 
the long long test cases.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTZ): New iterator.
         * doc/md.texi: New entry for ftrunc pattern name.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,13 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a63423484dda5b1251f47de24e926ba..d8306b50807609573c2ff612e2a83dcf1c55d1de 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3712,6 +3736,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
+#define direct_ftrunc_int_optab_supported_p convert_optab_supported_p
 
 /* Return the optab used by internal function FN.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..c37aa023b57838eba80c7a212ff1038eb6eed861 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	 && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+					    TREE_TYPE (@1), OPTIMIZE_FOR_BOTH))
+     (with {
+      tree int_type = TREE_TYPE (@1);
+      unsigned HOST_WIDE_INT max_int_c
+	= (1ULL << (element_precision (int_type) - 1)) - 1;
+      }
+      (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); }))
+     (if (!flag_trapping_math
+	  && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_TRUNC @0)))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..57d259d33409265df3af1646d123e4ab216c34c8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	...
+**	frint32z	s0, s0
+**	...
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	...
+**	frint64z	s0, s0
+**	...
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	...
+**	frint32z	d0, d0
+**	...
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	...
+**	frint64z	d0, d0
+**	...
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports ARMv8.5 scalar and Adv.Simd FRINT32[ZX]
+# and FRINT64[ZX] instructions, 0 otherwise. The test is valid for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      arm_v8_5a_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok { } {
+    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
+                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
Richard Sandiford Nov. 17, 2021, 3:38 p.m. UTC | #5
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
>     (set_attr "speculation_barrier" "true")]
>  )
>  
> +(define_expand "ftrunc<mode><frintnz_mode>2"
> +  [(set (match_operand:VSFDF 0 "register_operand" "=w")
> +        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
> +		      FRINTNZ))]
> +  "TARGET_FRINT && TARGET_FLOAT
> +   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
> +)

Probably just me, but this condition seems quite hard to read.
I think it'd be better to add conditions to the VSFDF definition instead,
a bit like we do for the HF entries in VHSDF_HSDF and VHSDF_DF.  I.e.:

(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
			     (V4SF "TARGET_SIMD")
			     (V2DF "TARGET_SIMD")
			     (SF "TARGET_FLOAT")
			     (DF "TARGET_FLOAT")])

Then the condition can be "TARGET_FRINT".

Same for the existing aarch64_<frintnzs_op><mode>.

> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
> +DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)

ftrunc_int should be described in the comment at the top of the file.
E.g.:

  - ftrunc_int: a unary conversion optab that takes and returns values
    of the same mode, but internally converts via another mode.  This
    second mode is specified using a dummy final function argument.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	s0, s0
> +**	...

Are these functions ever more than just:

f1:
	frint32z	s0, s0
	ret

?  If not, I think we should match that sequence and “defend” the
good codegen.  The problem with ... on both sides is that it's
then not clear why we can rely on register 0 being used.

> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	s0, s0
> +**	...
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	d0, d0
> +**	...
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	d0, d0
> +**	...
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -ffast-math" } */
> +/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
>  
>  float
>  f1 (float x)
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
>  	}]
>  }
>  
> +# Return 1 if the target supports ARMv8.5 scalar and Adv.Simd FRINT32[ZX]

Armv8.5-A

> +# and FRINT64[ZX] instructions, 0 otherwise. The test is valid for AArch64.
> +# Record the command line options needed.
> +
> +proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
> +
> +    if { ![istarget aarch64*-*-*] } {
> +        return 0;
> +    }
> +
> +    if { [check_no_compiler_messages_nocache \
> +	      arm_v8_5a_frintnzx_ok assembly {
> +	#if !defined (__ARM_FEATURE_FRINT)
> +	#error "__ARM_FEATURE_FRINT not defined"
> +	#endif
> +    } [current_compiler_flags]] } {
> +	return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +proc check_effective_target_arm_v8_5a_frintnzx_ok { } {

The new condition should be documented in sourcebuild.texi, near
the existing arm_v8_* tests.

OK for the non-match.pd parts with those changes.  I don't feel
qualified to review the match.pd bits. :-)

Thanks,
Richard

> +    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
> +                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
> +}
> +
>  # Return 1 if the target supports executing the Armv8.1-M Mainline Low
>  # Overhead Loop, 0 otherwise.  The test is valid for ARM.
>
Richard Biener Nov. 18, 2021, 11:05 a.m. UTC | #6
On Wed, 17 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 16/11/2021 12:10, Richard Biener wrote:
> > On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
> >
> >> On 12/11/2021 10:56, Richard Biener wrote:
> >>> On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
> >>>> optabs and mappings. It also creates a backend pattern to implement them
> >>>> for
> >>>> aarch64 and a match.pd pattern to idiom recognize these.
> >>>> These IFN's (and optabs) represent a truncation towards zero, as if
> >>>> performed
> >>>> by first casting it to a signed integer of 32 or 64 bits and then back to
> >>>> the
> >>>> same floating point type/mode.
> >>>>
> >>>> The match.pd pattern choses to use these, when supported, regardless of
> >>>> trapping math, since these new patterns mimic the original behavior of
> >>>> truncating through an integer.
> >>>>
> >>>> I didn't think any of the existing IFN's represented these. I know it's a
> >>>> bit
> >>>> late in stage 1, but I thought this might be OK given it's only used by a
> >>>> single target and should have very little impact on anything else.
> >>>>
> >>>> Bootstrapped on aarch64-none-linux.
> >>>>
> >>>> OK for trunk?
> >>> On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
> >>> optab (with two modes), so not
> >>>
> >>> +OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
> >>> +OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
> >>>
> >>> but
> >>>
> >>> OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")
> >>>
> >>> or so?  I know that gets somewhat awkward for the internal function,
> >>> but IMHO we shouldn't tie our hands because of that?
> >> I tried doing this originally, but indeed I couldn't find a way to
> >> correctly
> >> tie the internal function to it.
> >>
> >> direct_optab_supported_p with multiple types expect those to be of the same
> >> mode. I see convert_optab_supported_p does but I don't know how that is
> >> used...
> >>
> >> Any ideas?
> > No "nice" ones.  The "usual" way is to provide fake arguments that
> > specify the type/mode.  We could use an integer argument directly
> > secifying the mode (then the IL would look host dependent - ugh),
> > or specify a constant zero in the intended mode (less visibly
> > obvious - but at least with -gimple dumping you'd see the type...).
> Hi,
> 
> So I reworked this to have a single optab and IFN. This required a bit of
> fiddling with custom expander and supported_p functions for the IFN. I decided
> to pass a MAX_INT for the 'int' type to the IFN to be able to pass on the size
> of the int we use as an intermediate cast.  I tried 0 first, but gcc was being
> too smart and just demoted it to an 'int' for the long long test-cases.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-       && types_match (type, TREE_TYPE (@0))
-       && direct_internal_fn_supported_p (IFN_TRUNC, type,
-                                         OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
+                                           TREE_TYPE (@1),
OPTIMIZE_FOR_BOTH))
+     (with {
+      tree int_type = TREE_TYPE (@1);
+      unsigned HOST_WIDE_INT max_int_c
+       = (1ULL << (element_precision (int_type) - 1)) - 1;

That's only half-way supporting vector types I fear - you use
element_precision but then build a vector integer constant
in an unsupported way.  I suppose vector support isn't present
for arm?  The cleanest way would probably be to do

       tree int_type = element_type (@1);

with providing element_type in tree.[ch] like we provide
element_precision.

+      }
+      (IFN_FTRUNC_INT @0 { build_int_cst (int_type, max_int_c); }))

Then you could use wide_int_to_tree (int_type, wi::max_value 
(TYPE_PRECISION (int_type), SIGNED))
to build the special integer constant (which seems to be always
scalar).

+     (if (!flag_trapping_math
+         && direct_internal_fn_supported_p (IFN_TRUNC, type,
+                                            OPTIMIZE_FOR_BOTH))
+      (IFN_TRUNC @0)))))
 #endif

does IFN_FTRUNC_INT preserve the same exceptions as doing
explicit intermediate float->int conversions?  I think I'd
prefer to have !flag_trapping_math on both cases.

> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTZ): New iterator.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>
Andre Vieira (lists) Nov. 22, 2021, 11:38 a.m. UTC | #7
On 18/11/2021 11:05, Richard Biener wrote:
>
> @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      trapping behaviour, so require !flag_trapping_math. */
>   #if GIMPLE
>   (simplify
> -   (float (fix_trunc @0))
> -   (if (!flag_trapping_math
> -       && types_match (type, TREE_TYPE (@0))
> -       && direct_internal_fn_supported_p (IFN_TRUNC, type,
> -                                         OPTIMIZE_FOR_BOTH))
> -      (IFN_TRUNC @0)))
> +   (float (fix_trunc@1 @0))
> +   (if (types_match (type, TREE_TYPE (@0)))
> +    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
> +        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
> +                                           TREE_TYPE (@1),
> OPTIMIZE_FOR_BOTH))
> +     (with {
> +      tree int_type = TREE_TYPE (@1);
> +      unsigned HOST_WIDE_INT max_int_c
> +       = (1ULL << (element_precision (int_type) - 1)) - 1;
>
> That's only half-way supporting vector types I fear - you use
> element_precision but then build a vector integer constant
> in an unsupported way.  I suppose vector support isn't present
> for arm?  The cleanest way would probably be to do
>
>         tree int_type = element_type (@1);
>
> with providing element_type in tree.[ch] like we provide
> element_precision.
This is a good shout and made me think about something I hadn't
considered before... I thought I could handle the vector forms later, but
the problem is that if I add support for the scalar form, it will stop the
vectorizer. It seems vectorizable_call expects all arguments to have the
same type, which doesn't work with the workaround of passing the integer
type as an operand.

Should I go back to two separate IFNs? We could still have the single optab.

Regards,
Andre
Richard Biener Nov. 22, 2021, 11:41 a.m. UTC | #8
On Mon, 22 Nov 2021, Andre Vieira (lists) wrote:

> 
> On 18/11/2021 11:05, Richard Biener wrote:
> >
> > @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >      trapping behaviour, so require !flag_trapping_math. */
> >   #if GIMPLE
> >   (simplify
> > -   (float (fix_trunc @0))
> > -   (if (!flag_trapping_math
> > -       && types_match (type, TREE_TYPE (@0))
> > -       && direct_internal_fn_supported_p (IFN_TRUNC, type,
> > -                                         OPTIMIZE_FOR_BOTH))
> > -      (IFN_TRUNC @0)))
> > +   (float (fix_trunc@1 @0))
> > +   (if (types_match (type, TREE_TYPE (@0)))
> > +    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
> > +        && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type,
> > +                                           TREE_TYPE (@1),
> > OPTIMIZE_FOR_BOTH))
> > +     (with {
> > +      tree int_type = TREE_TYPE (@1);
> > +      unsigned HOST_WIDE_INT max_int_c
> > +       = (1ULL << (element_precision (int_type) - 1)) - 1;
> >
> > That's only half-way supporting vector types I fear - you use
> > element_precision but then build a vector integer constant
> > in an unsupported way.  I suppose vector support isn't present
> > for arm?  The cleanest way would probably be to do
> >
> >         tree int_type = element_type (@1);
> >
> > with providing element_type in tree.[ch] like we provide
> > element_precision.
> This is a good shout and made me think about something I hadn't before... I
> thought I could handle the vector forms later, but the problem is if I add
> support for the scalar, it will stop the vectorizer. It seems
> vectorizable_call expects all arguments to have the same type, which doesn't
> work with the workaround of passing the integer type as an operand.

We already special case some IFNs there (masked load/store and gather)
to ignore some args, so that would just add to this set.

Richard.
Andre Vieira (lists) Nov. 25, 2021, 1:53 p.m. UTC | #9
On 22/11/2021 11:41, Richard Biener wrote:
>
>> On 18/11/2021 11:05, Richard Biener wrote:
>>> This is a good shout and made me think about something I hadn't before... I
>>> thought I could handle the vector forms later, but the problem is if I add
>>> support for the scalar, it will stop the vectorizer. It seems
>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>> work with passing the integer type as an operand work around.
> We already special case some IFNs there (masked load/store and gather)
> to ignore some args, so that would just add to this set.
>
> Richard.
Hi,

I reworked it to add support for the new IFN to the vectorizer. I was
initially trying to make vectorizable_call and
vectorizable_internal_function handle IFNs with different input types more
generically, using the information we have in the <IFN>_direct structs
about which operands to get the modes from. Unfortunately, that wasn't
straightforward because vectorizable_call assumes operands have the same
type and uses the type of the DEF_STMT_INFO of the non-constant operands
(either the output operand or the non-constant inputs) to determine the
type of constants. I assume there is some reason why we use the
DEF_STMT_INFO rather than always using get_vectype_for_scalar_type on the
argument types. That is why I ended up with this sort of half-way mix of
both, which still leaves room to add more IFNs that don't take inputs of
the same type, but requires a bit of special casing similar to the
IFN_FTRUNC_INT and masking ones.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTNZ): New iterator.
         (frintnz_mode): New int attribute.
         (VSFDF): Make iterator conditional.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * internal-fn.c (ftrunc_int_direct): New define.
         (expand_ftrunc_int_optab_fn): New custom expander.
         (direct_ftrunc_int_optab_supported_p): New supported_p.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.
         * stor-layout.h (element_precision): Moved from here...
         * tree.h (element_precision): ... to here.
         (element_type): New declaration.
         * tree.c (element_type): New function.
         (element_precision): Changed to use element_type.
         * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for
         IFNs with different input types.
         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
         * doc/md.texi: New entry for ftrunc pattern name.
         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
         * gcc.target/aarch64/frintnz_vec.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")
+			      (V4SF "TARGET_SIMD")
+			      (V2DF "TARGET_SIMD")
+			      (DF "TARGET_FLOAT")
+			      (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3488,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 towards zero to an @var{n} mode signed integer and store
+the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.  An exception must be raised if operand 1 does not
+fit in an @var{n} mode signed integer, as it would have been if the truncation
+happened through separate floating-point to integer conversion.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 40b1e0d816789b225089c4143fb63e62a6af817a..15d4de24d15cce6793b3bb61d728e61cea00924d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2282,6 +2282,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a63423484dda5b1251f47de24e926ba..60b404ef44360c8ae0cda1176fb888302ddbc98d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = element_type (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 	  != CODE_FOR_nothing);
 }
 
+static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
+						 tree_pair types,
+						 optimization_type opt_type)
+{
+  return (convert_optab_handler (optab, TYPE_MODE (types.first),
+				TYPE_MODE (element_type (types.second)),
+				opt_type) != CODE_FOR_nothing);
+}
+
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..e58891e3d3ebc805dd55ac6f70bbda617b7302b7 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+   same mode, but internally converts via another mode.  This second mode is
+   specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -269,6 +272,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..80660e6fd40bc6934e1fa0329c0fbcab1658ed44 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+						  SIGNED)); })
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..57d259d33409265df3af1646d123e4ab216c34c8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 9e892e50c8559e497fcae1b77a36401df82fabe2..165a592d4d2c7bf525060dd51ce6094eb4f4f68a 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	frint32z	s0, s0
+**	ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	frint64z	s0, s0
+**	ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	frint32z	d0, d0
+**	ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	frint64z	d0, d0
+**	ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; ++i)					      \
+    {								      \
+      int_type x_i = x[i];					      \
+      y[i] = (float_type) x_i;					      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v0.4s, v0.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v0.4s, v0.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v0.2d, v0.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v0.2d, v0.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..450ca78230faeba40b89fc7987af27b6bf0a0d53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,32 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise. The test is valid for
+# AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      aarch64_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-				tree vectype_out, tree vectype_in)
+				tree vectype_out, tree vectype_in,
+				tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
-	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+	  if (!type0)
+	    type0 = vectype_in;
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+	  if (!type1)
+	    type1 = vectype_in;
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+	/* For FTRUNC this represents the argument that carries the type of the
+	   intermediate signed integer.  */
+	diff_opno = 1;
+      else
+	{
+	  /* For masked operations this represents the argument that carries the
+	     mask.  */
+	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+	  masked = diff_opno >=  0;
+	}
+    }
 
   for (i = 0; i < nargs; i++)
     {
-      if ((int) i == mask_opno)
+      if ((int) i == diff_opno && masked)
 	{
-	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
-				       &op, &slp_op[i], &dt[i], &vectypes[i]))
+	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
+				       diff_opno, &op, &slp_op[i], &dt[i],
+				       &vectypes[i]))
 	    return false;
 	  continue;
 	}
@@ -3275,27 +3295,35 @@ vectorizable_call (vec_info *vinfo,
 	  return false;
 	}
 
-      /* We can only handle calls with arguments of the same type.  */
-      if (rhs_type
-	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+      if ((int) i != diff_opno)
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument types differ.\n");
-	  return false;
-	}
-      if (!rhs_type)
-	rhs_type = TREE_TYPE (op);
+	  /* We can only handle calls with arguments of the same type.  */
+	  if (rhs_type
+	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument types differ.\n");
+	      return false;
+	    }
+	  if (!rhs_type)
+	    rhs_type = TREE_TYPE (op);
 
-      if (!vectype_in)
-	vectype_in = vectypes[i];
-      else if (vectypes[i]
-	       && !types_compatible_p (vectypes[i], vectype_in))
+	  if (!vectype_in)
+	    vectype_in = vectypes[i];
+	  else if (vectypes[i]
+		   && !types_compatible_p (vectypes[i], vectype_in))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument vector types differ.\n");
+	      return false;
+	    }
+	}
+      else
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument vector types differ.\n");
-	  return false;
+	  vectypes[i] = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op),
+						     slp_node);
 	}
     }
   /* If all arguments are external or constant defs, infer the vector type
@@ -3371,8 +3399,8 @@ vectorizable_call (vec_info *vinfo,
 	  || (modifier == NARROW
 	      && simple_integer_narrowing (vectype_out, vectype_in,
 					   &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-					  vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+					  &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3446,12 +3474,12 @@ vectorizable_call (vec_info *vinfo,
 	record_stmt_cost (cost_vec, ncopies / 2,
 			  vec_promote_demote, stmt_info, 0, vect_body);
 
-      if (loop_vinfo && mask_opno >= 0)
+      if (loop_vinfo && masked)
 	{
 	  unsigned int nvectors = (slp_node
 				   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
 				   : ncopies);
-	  tree scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+	  tree scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
 	  vect_record_loop_mask (loop_vinfo, masks, nvectors,
 				 vectype_out, scalar_mask);
 	}
@@ -3499,7 +3527,7 @@ vectorizable_call (vec_info *vinfo,
 		    {
 		      /* We don't define any narrowing conditional functions
 			 at present.  */
-		      gcc_assert (mask_opno < 0);
+		      gcc_assert (!masked);
 		      tree half_res = make_ssa_name (vectype_in);
 		      gcall *call
 			= gimple_build_call_internal_vec (ifn, vargs);
@@ -3519,15 +3547,15 @@ vectorizable_call (vec_info *vinfo,
 		    }
 		  else
 		    {
-		      if (mask_opno >= 0 && masked_loop_p)
+		      if (masked && masked_loop_p)
 			{
 			  unsigned int vec_num = vec_oprnds0.length ();
 			  /* Always true for SLP.  */
 			  gcc_assert (ncopies == 1);
 			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
 							  vectype_out, i);
-			  vargs[mask_opno] = prepare_load_store_mask
-			    (TREE_TYPE (mask), mask, vargs[mask_opno], gsi);
+			  vargs[diff_opno] = prepare_load_store_mask
+			    (TREE_TYPE (mask), mask, vargs[diff_opno], gsi);
 			}
 
 		      gcall *call;
@@ -3559,13 +3587,13 @@ vectorizable_call (vec_info *vinfo,
 	      orig_vargs[i] = vargs[i] = vec_defs[i][j];
 	    }
 
-	  if (mask_opno >= 0 && masked_loop_p)
+	  if (masked && masked_loop_p)
 	    {
 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
 					      vectype_out, j);
-	      vargs[mask_opno]
+	      vargs[diff_opno]
 		= prepare_load_store_mask (TREE_TYPE (mask), mask,
-					   vargs[mask_opno], gsi);
+					   vargs[diff_opno], gsi);
 	    }
 
 	  gimple *new_stmt;
@@ -3584,7 +3612,7 @@ vectorizable_call (vec_info *vinfo,
 	    {
 	      /* We don't define any narrowing conditional functions at
 		 present.  */
-	      gcc_assert (mask_opno < 0);
+	      gcc_assert (!masked);
 	      tree half_res = make_ssa_name (vectype_in);
 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
 	      gimple_call_set_lhs (call, half_res);
@@ -3628,7 +3656,7 @@ vectorizable_call (vec_info *vinfo,
     {
       auto_vec<vec<tree> > vec_defs (nargs);
       /* We don't define any narrowing conditional functions at present.  */
-      gcc_assert (mask_opno < 0);
+      gcc_assert (!masked);
       for (j = 0; j < ncopies; ++j)
 	{
 	  /* Build argument list for the vectorized call.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index f62c00bc8707029db52e2f3fe529948755235d3d..31ce45a84cc267ea2022c8ca6323368fbe15eb8b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6547,4 +6547,12 @@ extern unsigned fndecl_dealloc_argno (tree);
    object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (const_tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (const_tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return (tree) type;
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (type));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
Andre Vieira (lists) Nov. 29, 2021, 11:17 a.m. UTC | #10
On 18/11/2021 11:05, Richard Biener wrote:
>
> +     (if (!flag_trapping_math
> +         && direct_internal_fn_supported_p (IFN_TRUNC, type,
> +                                            OPTIMIZE_FOR_BOTH))
> +      (IFN_TRUNC @0)))))
>   #endif
>
> does IFN_FTRUNC_INT preserve the same exceptions as doing
> explicit intermediate float->int conversions?  I think I'd
> prefer to have !flag_trapping_math on both cases.
I realized I never responded to this. The AArch64 instructions mimic the
behaviour you'd see if you were doing explicit conversions, so I'll be
defining the new IFN and optab to require the same, such that these can
be used by the compiler even when flag_trapping_math is set. In the patch
I sent last I added some lines to the md.texi description of the optab to
that effect.
Andre Vieira (lists) Dec. 7, 2021, 11:29 a.m. UTC | #11
ping

On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't 
>>>> before... I
>>>> thought I could handle the vector forms later, but the problem is 
>>>> if I add
>>>> support for the scalar, it will stop the vectorizer. It seems
>>>> vectorizable_call expects all arguments to have the same type, 
>>>> which doesn't
>>>> work with passing the integer type as an operand work around.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was 
> initially trying to make vectorizable_call and 
> vectorizable_internal_function handle IFNs with different inputs more 
> generically, using the information we have in the <IFN>_direct structs 
> regarding what operands to get the modes from. Unfortunately, that 
> wasn't straightforward because of how vectorizable_call assumes 
> operands have the same type and uses the type of the DEF_STMT_INFO of 
> the non-constant operands (either output operand or non-constant 
> inputs) to determine the type of constants. I assume there is some 
> reason why we use the DEF_STMT_INFO and not always use 
> get_vectype_for_scalar_type on the argument types. That is why I ended 
> up with this sort of half-way mix of both, which still allows room to 
> add more IFNs that don't take inputs of the same type, but require 
> adding a bit of special casing similar to the IFN_FTRUNC_INT and 
> masking ones.
>
> Bootstrapped on aarch64-none-linux.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
> pattern.
>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
>         (frintnz_mode): New int attribute.
>         (VSFDF): Make iterator conditional.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * internal-fn.c (ftrunc_int_direct): New define.
>         (expand_ftrunc_int_optab_fn): New custom expander.
>         (direct_ftrunc_int_optab_supported_p): New supported_p.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
>         * stor-layout.h (element_precision): Moved from here...
>         * tree.h (element_precision): ... to here.
>         (element_type): New declaration.
>         * tree.c (element_type): New function.
>         (element_precision): Changed to use element_type.
>         * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>         IFNs with different input types.
>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if 
> frintNz instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>         * gcc.target/aarch64/frintnz_vec.c: New test.
Richard Sandiford Dec. 17, 2021, 12:44 p.m. UTC | #12
"Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't before... I
>>>> thought I could handle the vector forms later, but the problem is if I add
>>>> support for the scalar, it will stop the vectorizer. It seems
>>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>>> work with passing the integer type as an operand work around.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was 
> initially trying to make vectorizable_call and 
> vectorizable_internal_function handle IFNs with different inputs more 
> generically, using the information we have in the <IFN>_direct structs 
> regarding what operands to get the modes from. Unfortunately, that 
> wasn't straightforward because of how vectorizable_call assumes operands 
> have the same type and uses the type of the DEF_STMT_INFO of the 
> non-constant operands (either output operand or non-constant inputs) to 
> determine the type of constants. I assume there is some reason why we 
> use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on 
> the argument types. That is why I ended up with this sort of half-way 
> mix of both, which still allows room to add more IFNs that don't take 
> inputs of the same type, but require adding a bit of special casing 
> similar to the IFN_FTRUNC_INT and masking ones.
>
> Bootstrapped on aarch64-none-linux.

Still leaving the match.pd stuff to Richard, but otherwise:

> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
>  				  SF DF])
>  
>  ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +			      (V4SF "TARGET_SIMD")
> +			      (V2DF "TARGET_SIMD")
> +			      (DF "TARGET_FLOAT")
> +			      (SF "TARGET_FLOAT")])
>  
>  ;; Advanced SIMD single Float modes.
>  (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not fit

Maybe “An exception must be raised”?  “thrown” makes it sound like a
signal must be raised or C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  	  != CODE_FOR_nothing);
>  }
>  
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +						 tree_pair types,
> +						 optimization_type opt_type)

Formatting nit: should be a line break after “bool”

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				TYPE_MODE (element_type (types.second)),
> +				opt_type) != CODE_FOR_nothing);
> +}
> +
>  #define direct_unary_optab_supported_p direct_optab_supported_p
>  #define direct_binary_optab_supported_p direct_optab_supported_p
>  #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{									\
> +  for (int i = 0; i < n; ++i)					      \
> +    {								      \
> +      int_type x_i = x[i];					      \
> +      y[i] = (float_type) x_i;					      \
> +    }								      \
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v0.4s, v0.4s

I don't think we can rely on v0 being used here.  v[0-9]+\.4s would
be safer.

> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v0.4s, v0.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>  
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)
>  {
>    internal_fn ifn;
>    if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
>  	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
>  	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  					      OPTIMIZE_FOR_SPEED))
>  	    return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
>  
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;
> +	}
> +    }

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.

>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno && masked)
>  	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> +	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +				       diff_opno, &op, &slp_op[i], &dt[i],
> +				       &vectypes[i]))
>  	    return false;
>  	  continue;
>  	}
> […]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>  
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>  
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
>  {
>    if (!TYPE_P (type))
>      type = TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>      type = TREE_TYPE (type);
>  
> -  return TYPE_PRECISION (type);
> +  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
Richard

> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
>  }
>  
>  /* Return true if CODE represents an associative tree code.  Otherwise
Andre Vieira \(lists\) Dec. 29, 2021, 3:55 p.m. UTC | #13
Hi Richard,

Thank you for the review. I've adopted all the above suggestions 
downstream; I am still surprised how many style issues I miss after 
years of gcc development :(

On 17/12/2021 12:44, Richard Sandiford wrote:
>
>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>         rhs_type = unsigned_type_node;
>>       }
>>   
>> -  int mask_opno = -1;
>> +  /* The argument that is not of the same type as the others.  */
>> +  int diff_opno = -1;
>> +  bool masked = false;
>>     if (internal_fn_p (cfn))
>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>> +    {
>> +      if (cfn == CFN_FTRUNC_INT)
>> +	/* For FTRUNC this represents the argument that carries the type of the
>> +	   intermediate signed integer.  */
>> +	diff_opno = 1;
>> +      else
>> +	{
>> +	  /* For masked operations this represents the argument that carries the
>> +	     mask.  */
>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>> +	  masked = diff_opno >=  0;
>> +	}
>> +    }
> I think it would be cleaner not to process argument 1 at all for
> CFN_FTRUNC_INT.  There's no particular need to vectorise it.

I agree with this; I will change the loop to continue past argument 1 when 
dealing with non-masked CFNs.

>>   	}
>> […]
>> diff --git a/gcc/tree.c b/gcc/tree.c
>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
>> --- a/gcc/tree.c
>> +++ b/gcc/tree.c
>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>>     return true;
>>   }
>>   
>> -/* Return the precision of the type, or for a complex or vector type the
>> -   precision of the type of its elements.  */
>> +/* Return the type, or for a complex or vector type the type of its
>> +   elements.  */
>>   
>> -unsigned int
>> -element_precision (const_tree type)
>> +tree
>> +element_type (const_tree type)
>>   {
>>     if (!TYPE_P (type))
>>       type = TREE_TYPE (type);
>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>       type = TREE_TYPE (type);
>>   
>> -  return TYPE_PRECISION (type);
>> +  return (tree) type;
> I think we should stick a const_cast in element_precision and make
> element_type take a plain “type”.  As it stands element_type is an
> implicit const_cast for many cases.
>
> Thanks,
I was just curious about something here: I thought the purpose of having 
element_precision (before) and element_type (now) take a const_tree 
argument was to make it clear we aren't changing the input type. I 
understand that as it stands element_type acts as an implicit 
const_cast (which I should be using rather than the '(tree)' cast), but 
that is only the case when 'type' isn't a complex/vector type. Either 
way, we conform to the promise that we aren't changing the incoming 
type; what the caller then does with the result is up to them, no?

I don't mind making the changes, just trying to understand the reasoning 
behind it.

I'll send in a new patch with all the changes after the review on the 
match.pd stuff.

Thanks,
Andre
Richard Sandiford Dec. 29, 2021, 4:54 p.m. UTC | #14
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> On 17/12/2021 12:44, Richard Sandiford wrote:
>>
>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>>         rhs_type = unsigned_type_node;
>>>       }
>>>   
>>> -  int mask_opno = -1;
>>> +  /* The argument that is not of the same type as the others.  */
>>> +  int diff_opno = -1;
>>> +  bool masked = false;
>>>     if (internal_fn_p (cfn))
>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> +    {
>>> +      if (cfn == CFN_FTRUNC_INT)
>>> +	/* For FTRUNC this represents the argument that carries the type of the
>>> +	   intermediate signed integer.  */
>>> +	diff_opno = 1;
>>> +      else
>>> +	{
>>> +	  /* For masked operations this represents the argument that carries the
>>> +	     mask.  */
>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>> +	  masked = diff_opno >=  0;
>>> +	}
>>> +    }
>> I think it would be cleaner not to process argument 1 at all for
>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
>
> I agree with this,  will change the loop to continue for argument 1 when 
> dealing with non-masked CFN's.
>
>>>   	}
>>> […]
>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
>>> --- a/gcc/tree.c
>>> +++ b/gcc/tree.c
>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>>>     return true;
>>>   }
>>>   
>>> -/* Return the precision of the type, or for a complex or vector type the
>>> -   precision of the type of its elements.  */
>>> +/* Return the type, or for a complex or vector type the type of its
>>> +   elements.  */
>>>   
>>> -unsigned int
>>> -element_precision (const_tree type)
>>> +tree
>>> +element_type (const_tree type)
>>>   {
>>>     if (!TYPE_P (type))
>>>       type = TREE_TYPE (type);
>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>>       type = TREE_TYPE (type);
>>>   
>>> -  return TYPE_PRECISION (type);
>>> +  return (tree) type;
>> I think we should stick a const_cast in element_precision and make
>> element_type take a plain “type”.  As it stands element_type is an
>> implicit const_cast for many cases.
>>
>> Thanks,
> Was just curious about something here, I thought the purpose of having 
> element_precision (before) and element_type (now) take a const_tree as 
> an argument was to make it clear we aren't changing the input type. I 
> understand that as it stands element_type could be an implicit 
> const_cast (which I should be using rather than the '(tree)' cast), but 
> that's only if 'type' is a type that isn't complex/vector, either way, 
> we are conforming to the promise that we aren't changing the incoming 
> type, what the caller then does with the result is up to them no?
>
> I don't mind making the changes, just trying to understand the reasoning 
> behind it.

The problem with the above is that functions like the following become
well-typed:

void
foo (const_tree t)
{
  TYPE_MODE (element_type (t)) = VOIDmode;
}

even though element_type (t) could well be t.

One of the points of const_tree (and const pointer targets in general)
is to use the type system to enforce the promise that the value isn't
changed.

I guess the above is similar to the traditional problem with functions
like index and strstr, which take a const char * but return a char *.
So for example:

void
foo (const char *x)
{
  index (x, '.') = 0;
}

is well-typed.  But the equivalent C++ code (using iterators) would be
rejected.  If C allowed overloading them the correct prototypes would be:

    const char *index (const char *, int);
    char *index (char *, int);

And I think the same applies here.  Either we should provide two functions:

    const_tree element_type (const_tree);
    tree element_type (tree);

or just the latter.
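A minimal sketch of that two-overload scheme, using a hypothetical node type rather than GCC's tree: the const qualifier of the argument propagates to the result, so the ill-typed mutation shown earlier no longer compiles for a const input, while mutation through a non-const pointer still works.

```cpp
// Hypothetical stand-in for tree; illustrates the overload pair only.
struct node { int precision; };

// const in, const out: callers with a const pointer get a const result.
static const node *
element_of (const node *t) { return t; }

// non-const in, non-const out: mutation remains possible here.
static node *
element_of (node *t) { return t; }
```

With these, `element_of (some_const_node)->precision = 0` is rejected at compile time, matching the C++ iterator behaviour described above.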

Thanks,
Richard
Richard Biener Jan. 3, 2022, 12:18 p.m. UTC | #15
On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thank you for the review, I've adopted all above suggestions downstream, I am
> still surprised how many style things I still miss after years of gcc
> development :(
> 
> On 17/12/2021 12:44, Richard Sandiford wrote:
> >
> >> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >>         rhs_type = unsigned_type_node;
> >>       }
> >>   -  int mask_opno = -1;
> >> +  /* The argument that is not of the same type as the others.  */
> >> +  int diff_opno = -1;
> >> +  bool masked = false;
> >>     if (internal_fn_p (cfn))
> >> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +    {
> >> +      if (cfn == CFN_FTRUNC_INT)
> >> +	/* For FTRUNC this represents the argument that carries the type of
> >> the
> >> +	   intermediate signed integer.  */
> >> +	diff_opno = 1;
> >> +      else
> >> +	{
> >> +	  /* For masked operations this represents the argument that carries
> >> the
> >> +	     mask.  */
> >> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >> +	  masked = diff_opno >=  0;
> >> +	}
> >> +    }
> > I think it would be cleaner not to process argument 1 at all for
> > CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> 
> I agree with this,  will change the loop to continue for argument 1 when
> dealing with non-masked CFN's.
> 
> >>   	}
> >> […]
> >> diff --git a/gcc/tree.c b/gcc/tree.c
> >> index
> >> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
> >> 100644
> >> --- a/gcc/tree.c
> >> +++ b/gcc/tree.c
> >> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> >> cst_size_error *perr /* = NULL */)
> >>     return true;
> >>   }
> >>   
> >> -/* Return the precision of the type, or for a complex or vector type the
> >> -   precision of the type of its elements.  */
> >> +/* Return the type, or for a complex or vector type the type of its
> >> +   elements.  */
> >>   -unsigned int
> >> -element_precision (const_tree type)
> >> +tree
> >> +element_type (const_tree type)
> >>   {
> >>     if (!TYPE_P (type))
> >>       type = TREE_TYPE (type);
> >> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >>     if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>       type = TREE_TYPE (type);
> >>   -  return TYPE_PRECISION (type);
> >> +  return (tree) type;
> > I think we should stick a const_cast in element_precision and make
> > element_type take a plain “type”.  As it stands element_type is an
> > implicit const_cast for many cases.
> >
> > Thanks,
> Was just curious about something here, I thought the purpose of having
> element_precision (before) and element_type (now) take a const_tree as an
> argument was to make it clear we aren't changing the input type. I understand
> that as it stands element_type could be an implicit const_cast (which I should
> be using rather than the '(tree)' cast), but that's only if 'type' is a type
> that isn't complex/vector, either way, we are conforming to the promise that
> we aren't changing the incoming type, what the caller then does with the
> result is up to them no?
> 
> I don't mind making the changes, just trying to understand the reasoning
> behind it.
> 
> I'll send in a new patch with all the changes after the review on the match.pd
> stuff.

I'm missing an updated patch after my initial review of the match.pd
stuff, so I'm not sure what to review.  Can you re-post an updated patch?

Thanks,
Richard.
Andre Vieira \(lists\) Jan. 10, 2022, 2:09 p.m. UTC | #16
Yeah, it seems I forgot to send the latest version, my bad.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.
         * config/aarch64/iterators.md (FRINTNZ): New iterator.
         (frintnz_mode): New int attribute.
         (VSFDF): Make iterator conditional.
         * internal-fn.def (FTRUNC_INT): New IFN.
         * internal-fn.c (ftrunc_int_direct): New define.
         (expand_ftrunc_int_optab_fn): New custom expander.
         (direct_ftrunc_int_optab_supported_p): New supported_p.
         * match.pd: Add to the existing TRUNC pattern match.
         * optabs.def (ftrunc_int): New entry.
         * stor-layout.h (element_precision): Moved from here...
         * tree.h (element_precision): ... to here.
         (element_type): New declaration.
         * tree.c (element_type): New function.
         (element_precision): Changed to use element_type.
         * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for
         IFNs with different input types.
         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
         * doc/md.texi: New entry for ftrunc pattern name.
         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.
         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
         * gcc.target/aarch64/frintnz.c: New test.
         * gcc.target/aarch64/frintnz_vec.c: New test.

On 03/01/2022 12:18, Richard Biener wrote:
> On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
>
>> Hi Richard,
>>
>> Thank you for the review, I've adopted all above suggestions downstream, I am
>> still surprised how many style things I still miss after years of gcc
>> development :(
>>
>> On 17/12/2021 12:44, Richard Sandiford wrote:
>>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>>>>          rhs_type = unsigned_type_node;
>>>>        }
>>>>    -  int mask_opno = -1;
>>>> +  /* The argument that is not of the same type as the others.  */
>>>> +  int diff_opno = -1;
>>>> +  bool masked = false;
>>>>      if (internal_fn_p (cfn))
>>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>>> +    {
>>>> +      if (cfn == CFN_FTRUNC_INT)
>>>> +	/* For FTRUNC this represents the argument that carries the type of
>>>> the
>>>> +	   intermediate signed integer.  */
>>>> +	diff_opno = 1;
>>>> +      else
>>>> +	{
>>>> +	  /* For masked operations this represents the argument that carries
>>>> the
>>>> +	     mask.  */
>>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>>>> +	  masked = diff_opno >=  0;
>>>> +	}
>>>> +    }
>>> I think it would be cleaner not to process argument 1 at all for
>>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
>> I agree with this,  will change the loop to continue for argument 1 when
>> dealing with non-masked CFN's.
>>
>>>>    	}
>>>> […]
>>>> diff --git a/gcc/tree.c b/gcc/tree.c
>>>> index
>>>> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
>>>> 100644
>>>> --- a/gcc/tree.c
>>>> +++ b/gcc/tree.c
>>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
>>>> cst_size_error *perr /* = NULL */)
>>>>      return true;
>>>>    }
>>>>    
>>>> -/* Return the precision of the type, or for a complex or vector type the
>>>> -   precision of the type of its elements.  */
>>>> +/* Return the type, or for a complex or vector type the type of its
>>>> +   elements.  */
>>>>    -unsigned int
>>>> -element_precision (const_tree type)
>>>> +tree
>>>> +element_type (const_tree type)
>>>>    {
>>>>      if (!TYPE_P (type))
>>>>        type = TREE_TYPE (type);
>>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>>>>      if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>>>>        type = TREE_TYPE (type);
>>>>    -  return TYPE_PRECISION (type);
>>>> +  return (tree) type;
>>> I think we should stick a const_cast in element_precision and make
>>> element_type take a plain “type”.  As it stands element_type is an
>>> implicit const_cast for many cases.
>>>
>>> Thanks,
>> Was just curious about something here, I thought the purpose of having
>> element_precision (before) and element_type (now) take a const_tree as an
>> argument was to make it clear we aren't changing the input type. I understand
>> that as it stands element_type could be an implicit const_cast (which I should
>> be using rather than the '(tree)' cast), but that's only if 'type' is a type
>> that isn't complex/vector, either way, we are conforming to the promise that
>> we aren't changing the incoming type, what the caller then does with the
>> result is up to them no?
>>
>> I don't mind making the changes, just trying to understand the reasoning
>> behind it.
>>
>> I'll send in a new patch with all the changes after the review on the match.pd
>> stuff.
> I'm missing an updated patch after my initial review of the match.pd
> stuff so not sure what to review.  Can you re-post and updated patch?
>
> Thanks,
> Richard.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7423,12 +7423,18 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
 		      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9160ce3e69c2c6b1b75e46f7aabd27e7949a269a..7962b15a87db2f1ede3836efbb827b8fb95266da 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -163,7 +163,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+			     (V4SF "TARGET_SIMD")
+			     (V2DF "TARGET_SIMD")
+			     (DF "TARGET_FLOAT")
+			     (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3078,6 +3082,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3485,6 +3491,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.  An exception must be raised if operand 1 does not
+fit in a @var{n} mode signed integer as it would have if the truncation
+happened through separate floating point to integer conversion.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = element_type (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+				      TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 	  != CODE_FOR_nothing);
 }
 
+static bool
+direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
+				     optimization_type opt_type)
+{
+  return (convert_optab_handler (optab, TYPE_MODE (types.first),
+				TYPE_MODE (element_type (types.second)),
+				opt_type) != CODE_FOR_nothing);
+}
+
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 8891071a6a360961643731094379b607f317af17..a0fd75829e942f529c879c669e58c098b62b26ba 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+   same mode, but internally converts via another mode.  This second mode is
+   specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -275,6 +278,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 84c9b918041eef3409bdb0fbe04565b90b25d6e9..a5d892ac1ebfaa7b5d5fa970baa04c8e5b8acb28 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3751,12 +3751,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+						  SIGNED)); })
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5fcf5386a0b3112ef9004055c82e15fe47668970..04a4ee82e15fe7b52e726f2ee0bf704c30ac450d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index b67abebc0096113272bfb1221eabaabd08657a58..e0219c8af4846ea0f947586b1915d9d06cb6c107 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	frint32z	s0, s0
+**	ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	frint64z	s0, s0
+**	ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	frint32z	d0, d0
+**	ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	frint64z	d0, d0
+**	ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)					\
+void									\
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{									\
+  for (int i = 0; i < n; ++i)					      \
+    {								      \
+      int_type x_i = x[i];					      \
+      y[i] = (float_type) x_i;					      \
+    }								      \
+}
+
+/*
+** f1:
+**	...
+**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**	...
+**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
+**	...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**	...
+**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**	...
+**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
+**	...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index c1ad97c6bd20d6e970edb24a125451580f014d55..5758e9cee4416b60b6766ecb37cbf3b37ac98522 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11399,6 +11399,32 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] andd FRINT64[ZX] instructions, 0 otherwise. The test is valid for
+# AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      aarch64_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-				tree vectype_out, tree vectype_in)
+				tree vectype_out, tree vectype_in,
+				tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
-	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+	  if (!type0)
+	    type0 = vectype_in;
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+	  if (!type1)
+	    type1 = vectype_in;
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+	/* For FTRUNC this represents the argument that carries the type of the
+	   intermediate signed integer.  */
+	diff_opno = 1;
+      else
+	{
+	  /* For masked operations this represents the argument that carries the
+	     mask.  */
+	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+	  masked = diff_opno >=  0;
+	}
+    }
 
   for (i = 0; i < nargs; i++)
     {
-      if ((int) i == mask_opno)
+      if ((int) i == diff_opno)
 	{
-	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
-				       &op, &slp_op[i], &dt[i], &vectypes[i]))
-	    return false;
-	  continue;
+	  if (masked)
+	    {
+	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
+					   diff_opno, &op, &slp_op[i], &dt[i],
+					   &vectypes[i]))
+		return false;
+	    }
+	  else
+	    {
+	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
+	      continue;
+	    }
 	}
 
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
@@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
 	  return false;
 	}
 
-      /* We can only handle calls with arguments of the same type.  */
-      if (rhs_type
-	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+      if ((int) i != diff_opno)
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument types differ.\n");
-	  return false;
-	}
-      if (!rhs_type)
-	rhs_type = TREE_TYPE (op);
+	  /* We can only handle calls with arguments of the same type.  */
+	  if (rhs_type
+	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument types differ.\n");
+	      return false;
+	    }
+	  if (!rhs_type)
+	    rhs_type = TREE_TYPE (op);
 
-      if (!vectype_in)
-	vectype_in = vectypes[i];
-      else if (vectypes[i]
-	       && !types_compatible_p (vectypes[i], vectype_in))
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument vector types differ.\n");
-	  return false;
+	  if (!vectype_in)
+	    vectype_in = vectypes[i];
+	  else if (vectypes[i]
+		   && !types_compatible_p (vectypes[i], vectype_in))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "argument vector types differ.\n");
+	      return false;
+	    }
 	}
     }
   /* If all arguments are external or constant defs, infer the vector type
@@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
 	  || (modifier == NARROW
 	      && simple_integer_narrowing (vectype_out, vectype_in,
 					   &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-					  vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+					  &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
 
       if (loop_vinfo
 	  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-	  && (reduc_idx >= 0 || mask_opno >= 0))
+	  && (reduc_idx >= 0 || masked))
 	{
 	  if (reduc_idx >= 0
 	      && (cond_fn == IFN_LAST
@@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
 		   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
 		   : ncopies);
 	      tree scalar_mask = NULL_TREE;
-	      if (mask_opno >= 0)
-		scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+	      if (masked)
+		scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
 	      vect_record_loop_mask (loop_vinfo, masks, nvectors,
 				     vectype_out, scalar_mask);
 	    }
@@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
 		    {
 		      /* We don't define any narrowing conditional functions
 			 at present.  */
-		      gcc_assert (mask_opno < 0);
+		      gcc_assert (!masked);
 		      tree half_res = make_ssa_name (vectype_in);
 		      gcall *call
 			= gimple_build_call_internal_vec (ifn, vargs);
@@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
 		    }
 		  else
 		    {
-		      if (mask_opno >= 0 && masked_loop_p)
+		      if (masked && masked_loop_p)
 			{
 			  unsigned int vec_num = vec_oprnds0.length ();
 			  /* Always true for SLP.  */
 			  gcc_assert (ncopies == 1);
 			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
 							  vectype_out, i);
-			  vargs[mask_opno] = prepare_vec_mask
+			  vargs[diff_opno] = prepare_vec_mask
 			    (loop_vinfo, TREE_TYPE (mask), mask,
-			     vargs[mask_opno], gsi);
+			     vargs[diff_opno], gsi);
 			}
 
 		      gcall *call;
@@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
 	  if (masked_loop_p && reduc_idx >= 0)
 	    vargs[varg++] = vargs[reduc_idx + 1];
 
-	  if (mask_opno >= 0 && masked_loop_p)
+	  if (masked && masked_loop_p)
 	    {
 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
 					      vectype_out, j);
-	      vargs[mask_opno]
+	      vargs[diff_opno]
 		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
-				    vargs[mask_opno], gsi);
+				    vargs[diff_opno], gsi);
 	    }
 
 	  gimple *new_stmt;
@@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
 	    {
 	      /* We don't define any narrowing conditional functions at
 		 present.  */
-	      gcc_assert (mask_opno < 0);
+	      gcc_assert (!masked);
 	      tree half_res = make_ssa_name (vectype_in);
 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
 	      gimple_call_set_lhs (call, half_res);
@@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
     {
       auto_vec<vec<tree> > vec_defs (nargs);
       /* We don't define any narrowing conditional functions at present.  */
-      gcc_assert (mask_opno < 0);
+      gcc_assert (!masked);
       for (j = 0; j < ncopies; ++j)
 	{
 	  /* Build argument list for the vectorized call.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
    object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.c b/gcc/tree.c
index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6658,7 +6658,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return const_cast<tree> (type);
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (const_cast<tree> (type)));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
Richard Biener Jan. 10, 2022, 2:45 p.m. UTC | #17
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote:

> Yeah seems I forgot to send the latest version, my bad.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

The match.pd part looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
>         (frintnz_mode): New int attribute.
>         (VSFDF): Make iterator conditional.
>         * internal-fn.def (FTRUNC_INT): New IFN.
>         * internal-fn.c (ftrunc_int_direct): New define.
>         (expand_ftrunc_int_optab_fn): New custom expander.
>         (direct_ftrunc_int_optab_supported_p): New supported_p.
>         * match.pd: Add to the existing TRUNC pattern match.
>         * optabs.def (ftrunc_int): New entry.
>         * stor-layout.h (element_precision): Moved from here...
>         * tree.h (element_precision): ... to here.
>         (element_type): New declaration.
>         * tree.c (element_type): New function.
>         (element_precision): Changed to use element_type.
>         * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>         IFNs with different input types.
>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>         * doc/md.texi: New entry for ftrunc pattern name.
>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>         * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>         * gcc.target/aarch64/frintnz.c: New test.
>         * gcc.target/aarch64/frintnz_vec.c: New test.
> 
> On 03/01/2022 12:18, Richard Biener wrote:
> > On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richard,
> >>
> >> Thank you for the review, I've adopted all above suggestions downstream, I
> >> am
> >> still surprised how many style things I still miss after years of gcc
> >> development :(
> >>
> >> On 17/12/2021 12:44, Richard Sandiford wrote:
> >>>> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> >>>>          rhs_type = unsigned_type_node;
> >>>>        }
> >>>>    -  int mask_opno = -1;
> >>>> +  /* The argument that is not of the same type as the others.  */
> >>>> +  int diff_opno = -1;
> >>>> +  bool masked = false;
> >>>>      if (internal_fn_p (cfn))
> >>>> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> +    {
> >>>> +      if (cfn == CFN_FTRUNC_INT)
> >>>> +	/* For FTRUNC this represents the argument that carries the type of
> >>>> the
> >>>> +	   intermediate signed integer.  */
> >>>> +	diff_opno = 1;
> >>>> +      else
> >>>> +	{
> >>>> +	  /* For masked operations this represents the argument that carries
> >>>> the
> >>>> +	     mask.  */
> >>>> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >>>> +	  masked = diff_opno >=  0;
> >>>> +	}
> >>>> +    }
> >>> I think it would be cleaner not to process argument 1 at all for
> >>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> >> I agree with this,  will change the loop to continue for argument 1 when
> >> dealing with non-masked CFN's.
> >>
> >>>>    	}
> >>>> […]
> >>>> diff --git a/gcc/tree.c b/gcc/tree.c
> >>>> index
> >>>> 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
> >>>> 100644
> >>>> --- a/gcc/tree.c
> >>>> +++ b/gcc/tree.c
> >>>> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> >>>> cst_size_error *perr /* = NULL */)
> >>>>      return true;
> >>>>    }
> >>>>    
> >>>> -/* Return the precision of the type, or for a complex or vector type the
> >>>> -   precision of the type of its elements.  */
> >>>> +/* Return the type, or for a complex or vector type the type of its
> >>>> +   elements.  */
> >>>>    -unsigned int
> >>>> -element_precision (const_tree type)
> >>>> +tree
> >>>> +element_type (const_tree type)
> >>>>    {
> >>>>      if (!TYPE_P (type))
> >>>>        type = TREE_TYPE (type);
> >>>> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
> >>>>      if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> >>>>        type = TREE_TYPE (type);
> >>>>    -  return TYPE_PRECISION (type);
> >>>> +  return (tree) type;
> >>> I think we should stick a const_cast in element_precision and make
> >>> element_type take a plain “type”.  As it stands element_type is an
> >>> implicit const_cast for many cases.
> >>>
> >>> Thanks,
> >> Was just curious about something here, I thought the purpose of having
> >> element_precision (before) and element_type (now) take a const_tree as an
> >> argument was to make it clear we aren't changing the input type. I
> >> understand
> >> that as it stands element_type could be an implicit const_cast (which I
> >> should
> >> be using rather than the '(tree)' cast), but that's only if 'type' is a
> >> type
> >> that isn't complex/vector, either way, we are conforming to the promise
> >> that
> >> we aren't changing the incoming type, what the caller then does with the
> >> result is up to them no?
> >>
> >> I don't mind making the changes, just trying to understand the reasoning
> >> behind it.
> >>
> >> I'll send in a new patch with all the changes after the review on the
> >> match.pd
> >> stuff.
> > I'm missing an updated patch after my initial review of the match.pd
> > stuff so not sure what to review.  Can you re-post and updated patch?
> >
> > Thanks,
> > Richard.
>
Richard Sandiford Jan. 14, 2022, 10:37 a.m. UTC | #18
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  An exception must be raised if operand 1 does not
> +fit in a @var{n} mode signed integer as it would have if the truncation
> +happened through separate floating point to integer conversion.
> +
> +

Nit: just one blank line here.

>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
>  instruction.
> +
> +@item aarch64_frintzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
>  @end table
>  
>  @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
>    return convert_optab_handler (optab, imode, vmode);
>  }
>  
> +/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[2];
> +  tree lhs, float_type, int_type;
> +  rtx target, op;
> +
> +  lhs = gimple_call_lhs (stmt);
> +  target = expand_normal (lhs);
> +  op = expand_normal (gimple_call_arg (stmt, 0));
> +
> +  float_type = TREE_TYPE (lhs);
> +  int_type = element_type (gimple_call_arg (stmt, 1));

Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type.  That in
turn means that…

> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> +  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> +  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> +				      TYPE_MODE (int_type)), 2, ops);
> +}
> +
>  /* Expand LOAD_LANES call STMT using optab OPTAB.  */
>  
>  static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  	  != CODE_FOR_nothing);
>  }
>  
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> +				     optimization_type opt_type)
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				TYPE_MODE (element_type (types.second)),
> +				opt_type) != CODE_FOR_nothing);
> +}
> +

…this can use convert_optab_supported_p.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Is this just a cut-&-pasto from a run test?  If not, why do we need both
this and the dg-options?  It feels like one on its own should be enough,
with the dg-options being better.

The test looks OK without this line.

> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	frint32z	s0, s0
> +**	ret
> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	frint64z	s0, s0
> +**	ret
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	frint32z	d0, d0
> +**	ret
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	frint64z	d0, d0
> +**	ret
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f5_dont (double x)
> +{
> +  signed short y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Same here.

> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{									\
> +  for (int i = 0; i < n; ++i)					      \
> +    {								      \
> +      int_type x_i = x[i];					      \
> +      y[i] = (float_type) x_i;					      \
> +    }								      \
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>  
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)

Should be described in the comment above the function.

>  {
>    internal_fn ifn;
>    if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
>  	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
>  	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  					      OPTIMIZE_FOR_SPEED))
>  	    return ifn;
> @@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
>  
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;

Nit: excess space after “>=”.

> +	}
> +    }

I think it would be better to add a new flag to direct_internal_fn_info
to say whether type1 is scalar, rather than checking based on the function
code.  type1 would then provide the value of diff_opno above.

Also, I think diff_opno should be separate from mask_opno.
Maybe scalar_opno would be a better name.

This would probably be simpler if we used:

  internal_fn ifn = associated_internal_fn (cfn, lhs_type);

before the loop (with lhs_type being new), then used ifn to get the
direct_internal_fn_info and passed ifn to vectorizable_internal_function.

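For concreteness, a rough sketch of what that setup might look like.  This is
GCC-internal pseudocode only (it will not compile standalone), and the
`type1_is_scalar` field is an assumed addition to direct_internal_fn_info, not
an existing one:

```
/* Pseudocode sketch.  Assumes a new type1_is_scalar flag has been
   added to direct_internal_fn_info.  */
int scalar_opno = -1;
int mask_opno = -1;
if (internal_fn_p (cfn))
  {
    internal_fn ifn = associated_internal_fn (cfn, lhs_type);
    const direct_internal_fn_info &info = direct_internal_fn (ifn);
    if (info.type1_is_scalar)
      /* Operand info.type1 carries a scalar (e.g. the intermediate
	 integer type for IFN_FTRUNC_INT), not a vector operand.  */
      scalar_opno = info.type1;
    else
      mask_opno = internal_fn_mask_index (ifn);
  }
```

That way vectorizable_call never needs to name individual function codes, and
the scalar-operand and mask-operand cases stay clearly separate.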
>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno)
>  	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> -	    return false;
> -	  continue;
> +	  if (masked)
> +	    {
> +	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +					   diff_opno, &op, &slp_op[i], &dt[i],
> +					   &vectypes[i]))
> +		return false;
> +	    }
> +	  else
> +	    {
> +	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
> +	      continue;
> +	    }
>  	}
>  
>        if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
> @@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
>  	  return false;
>  	}
>  
> -      /* We can only handle calls with arguments of the same type.  */
> -      if (rhs_type
> -	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +      if ((int) i != diff_opno)

Is this ever false?  It looks like the continue above handles the other case.

>  	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                             "argument types differ.\n");
> -	  return false;
> -	}
> -      if (!rhs_type)
> -	rhs_type = TREE_TYPE (op);
> +	  /* We can only handle calls with arguments of the same type.  */
> +	  if (rhs_type
> +	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument types differ.\n");
> +	      return false;
> +	    }
> +	  if (!rhs_type)
> +	    rhs_type = TREE_TYPE (op);
>  
> -      if (!vectype_in)
> -	vectype_in = vectypes[i];
> -      else if (vectypes[i]
> -	       && !types_compatible_p (vectypes[i], vectype_in))
> -	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                             "argument vector types differ.\n");
> -	  return false;
> +	  if (!vectype_in)
> +	    vectype_in = vectypes[i];
> +	  else if (vectypes[i]
> +		   && !types_compatible_p (vectypes[i], vectype_in))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument vector types differ.\n");
> +	      return false;
> +	    }
>  	}
>      }
>    /* If all arguments are external or constant defs, infer the vector type
> @@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
>  	  || (modifier == NARROW
>  	      && simple_integer_narrowing (vectype_out, vectype_in,
>  					   &convert_code))))
> -    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> -					  vectype_in);
> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
> +					  &vectypes[0]);
>  
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
> @@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
>  
>        if (loop_vinfo
>  	  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> -	  && (reduc_idx >= 0 || mask_opno >= 0))
> +	  && (reduc_idx >= 0 || masked))
>  	{
>  	  if (reduc_idx >= 0
>  	      && (cond_fn == IFN_LAST
> @@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
>  		   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
>  		   : ncopies);
>  	      tree scalar_mask = NULL_TREE;
> -	      if (mask_opno >= 0)
> -		scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
> +	      if (masked)
> +		scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
>  	      vect_record_loop_mask (loop_vinfo, masks, nvectors,
>  				     vectype_out, scalar_mask);
>  	    }
> @@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
>  		    {
>  		      /* We don't define any narrowing conditional functions
>  			 at present.  */
> -		      gcc_assert (mask_opno < 0);
> +		      gcc_assert (!masked);
>  		      tree half_res = make_ssa_name (vectype_in);
>  		      gcall *call
>  			= gimple_build_call_internal_vec (ifn, vargs);
> @@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
>  		    }
>  		  else
>  		    {
> -		      if (mask_opno >= 0 && masked_loop_p)
> +		      if (masked && masked_loop_p)
>  			{
>  			  unsigned int vec_num = vec_oprnds0.length ();
>  			  /* Always true for SLP.  */
>  			  gcc_assert (ncopies == 1);
>  			  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
>  							  vectype_out, i);
> -			  vargs[mask_opno] = prepare_vec_mask
> +			  vargs[diff_opno] = prepare_vec_mask
>  			    (loop_vinfo, TREE_TYPE (mask), mask,
> -			     vargs[mask_opno], gsi);
> +			     vargs[diff_opno], gsi);
>  			}
>  
>  		      gcall *call;
> @@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
>  	  if (masked_loop_p && reduc_idx >= 0)
>  	    vargs[varg++] = vargs[reduc_idx + 1];
>  
> -	  if (mask_opno >= 0 && masked_loop_p)
> +	  if (masked && masked_loop_p)
>  	    {
>  	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
>  					      vectype_out, j);
> -	      vargs[mask_opno]
> +	      vargs[diff_opno]
>  		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> -				    vargs[mask_opno], gsi);
> +				    vargs[diff_opno], gsi);
>  	    }
>  
>  	  gimple *new_stmt;
> @@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
>  	    {
>  	      /* We don't define any narrowing conditional functions at
>  		 present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
>  	      tree half_res = make_ssa_name (vectype_in);
>  	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
>  	      gimple_call_set_lhs (call, half_res);
> @@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
>      {
>        auto_vec<vec<tree> > vec_defs (nargs);
>        /* We don't define any narrowing conditional functions at present.  */
> -      gcc_assert (mask_opno < 0);
> +      gcc_assert (!masked);
>        for (j = 0; j < ncopies; ++j)
>  	{
>  	  /* Build argument list for the vectorized call.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
>     object or pointer.  Otherwise return null.  */
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>  
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
> +extern tree element_type (tree);
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +extern unsigned int element_precision (const_tree);
> +
>  #endif  /* GCC_TREE_H  */
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>  
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>  
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (tree type)
>  {
>    if (!TYPE_P (type))
>      type = TREE_TYPE (type);
> @@ -6658,7 +6658,16 @@ element_precision (const_tree type)
>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>      type = TREE_TYPE (type);
>  
> -  return TYPE_PRECISION (type);
> +  return const_cast<tree> (type);

The const_cast<> is redundant.

Sorry for not thinking about it before, but we should probably have
a test for the SLP case.  E.g.:

  for (int i = 0; i < n; i += 2)
    {
      int_type x_i0 = x[i];
      int_type x_i1 = x[i + 1];
      y[i] = (float_type) x_i1;
      y[i + 1] = (float_type) x_i0;
    }

(with a permute thrown in for good measure).  This will make sure
that the (separate) SLP group matching code handles the call correctly.

Thanks,
Richard
Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4035e061706793849c68ae09bcb2e4b9580ab7b6..ad4e04d7c874da095513442e7d7f247791d8921d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,16 @@  (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_insn "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		      FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
+  [(set_attr "type" "f_rint<stype>")]
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
 	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@  (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
 			       UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@  (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
 			      (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 			 (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e95005647ca97a495a97d7e184d137..7bd66818144e87e1dca2ef13bef1d6f21f239570 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6175,6 +6175,13 @@  operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
+the result in operand 0. Both operands have mode @var{m}, which is a scalar or
+vector floating-point mode.
+
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bb13c6cce1bf55633760bc14980402f1f0ac1689..64263cbb83548b140f613cb4bf5ce6565373f96d 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -269,6 +269,8 @@  DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC32, ECF_CONST, ftrunc32, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC64, ECF_CONST, ftrunc64, unary)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index a319aefa8081ac177981ad425c461f8a771128f4..7937eeb7865ce05d32dd5fdc2a90699a0e15230e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3713,12 +3713,22 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-	&& types_match (type, TREE_TYPE (@0))
-	&& direct_internal_fn_supported_p (IFN_TRUNC, type,
-					  OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	 && TYPE_MODE (TREE_TYPE (@1)) == SImode
+	 && direct_internal_fn_supported_p (IFN_FTRUNC32, type,
+					    OPTIMIZE_FOR_BOTH))
+     (IFN_FTRUNC32 @0)
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+	  && TYPE_MODE (TREE_TYPE (@1)) == DImode
+	  && direct_internal_fn_supported_p (IFN_FTRUNC64, type,
+					     OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC64 @0)
+      (if (!flag_trapping_math
+	   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+					      OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b889ad2e5a08613db51d16d072080ac6cb48404f..740af19fcf5c53e25663038ff6c2e88cf8d7334f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,8 @@  OPTAB_D (floor_optab, "floor$a2")
 OPTAB_D (ceil_optab, "ceil$a2")
 OPTAB_D (btrunc_optab, "btrunc$a2")
 OPTAB_D (nearbyint_optab, "nearbyint$a2")
+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")
 
 OPTAB_D (acos_optab, "acos$a2")
 OPTAB_D (acosh_optab, "acosh$a2")
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..2e1971f8aa11d8b95f454d03a03e050a3bf96747
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,88 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**	...
+**	frint32z	s0, s0
+**	...
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**	...
+**	frint64z	s0, s0
+**	...
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**	...
+**	frint32z	d0, d0
+**	...
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**	...
+**	frint64z	d0, d0
+**	...
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..0d64acb987614710d84490fce20e49db2ebf2e48 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11365,6 +11365,33 @@  proc check_effective_target_arm_v8_3a_bkey_directive { } {
 	}]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise. The test is valid
+# for AArch64.
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+	      arm_v8_5a_frintnzx_ok assembly {
+	#if !defined (__ARM_FEATURE_FRINT)
+	#error "__ARM_FEATURE_FRINT not defined"
+	#endif
+    } [current_compiler_flags]] } {
+	return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_5a_frintnzx_ok { } {
+    return [check_cached_effective_target arm_v8_5a_frintnzx_ok \
+                check_effective_target_arm_v8_5a_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.