middle-end: Fix ifcvt predicate generation for masked function calls
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.
Thus, for code such as:
if (a[i] > limit)
b[i] = fixed_const;
else
b[i] = fn (a[i]);
we would generate the following (wrong) if-converted tree code:
_1 = a[i_1];
_2 = _1 > limit;
_3 = .MASK_CALL (fn, _1, _2);
cstore_4 = _2 ? fixed_const : _3;
as opposed to the correct expected sequence:
_1 = a[i_1];
_2 = _1 > limit;
_3 = ~_2;
_4 = .MASK_CALL (fn, _1, _3);
cstore_5 = _2 ? fixed_const : _4;
This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
gcc/ChangeLog:
* tree-if-conv.cc (predicate_statements): Fix handling of
predicated function calls.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-fncall-mask.c: New.
---
gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++
gcc/tree-if-conv.cc | 14 ++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
Comments
Hi Victor,
Thanks! This looks good to me with one minor comment:
> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Monday, September 30, 2024 2:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> Victor Do Nascimento <Victor.DoNascimento@arm.com>
> Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function
> calls
>
> Up until now, due to a latent bug in the code for the ifcvt pass,
> irrespective of the branch taken in a conditional statement, the
> original condition for the if statement was used in masking the
> function call.
>
> Thus, for code such as:
>
> if (a[i] > limit)
> b[i] = fixed_const;
> else
> b[i] = fn (a[i]);
>
> we would generate the following (wrong) if-converted tree code:
>
> _1 = a[i_1];
> _2 = _1 > limit;
> _3 = .MASK_CALL (fn, _1, _2);
> cstore_4 = _2 ? fixed_const : _3;
>
> as opposed to the correct expected sequence:
>
> _1 = a[i_1];
> _2 = _1 > limit;
> _3 = ~_2;
> _4 = .MASK_CALL (fn, _1, _3);
> cstore_5 = _2 ? fixed_const : _4;
>
> This patch ensures that the correct predicate mask generation is
> carried out such that, upon autovectorization, the correct vector
> lanes are selected in the vectorized function call.
>
> gcc/ChangeLog:
>
> * tree-if-conv.cc (predicate_statements): Fix handling of
> predicated function calls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-fncall-mask.c: New.
> ---
> gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++
> gcc/tree-if-conv.cc | 14 ++++++++-
> 2 files changed, 44 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> new file mode 100644
> index 00000000000..554488e0630
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
> { target { aarch64*-*-* } } } */
> +
> +extern int __attribute__ ((simd, const)) fn (int);
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst = -1.0;
> +float tot = 0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> + [10 ... 19] = 100.0 }; /* Else branch. */
> +
> +int main (void)
> +{
> + #pragma omp simd
> + for (int i = 0; i < N; i += 1)
> + {
> + if (a[i] > lim)
> + b[i] = cst;
> + else
> + b[i] = fn (a[i]);
> + tot += b[i];
> + }
> + return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2,
> NULL>} ifcvt } } */
> +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL,
> NULL>} ifcvt } } */
> +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } }
> */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 0346a1376c5..246a6bb5bd1 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
> This will cause the vectorizer to match the "in branch"
> clone variants, and serves to build the mask vector
> in a natural way. */
> + tree mask = cond;
> + gimple_seq stmts = NULL;
> gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> tree orig_fn = gimple_call_fn (call);
> int orig_nargs = gimple_call_num_args (call);
> @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
> args.safe_push (orig_fn);
> for (int i = 0; i < orig_nargs; i++)
> args.safe_push (gimple_call_arg (call, i));
> - args.safe_push (cond);
> + /* If `swap', we invert the mask used for the if branch for use
> + when masking the function call. */
> + if (swap)
> + {
> + tree true_val
> + = constant_boolean_node (true, TREE_TYPE (mask));
> + mask = gimple_build (&stmts, BIT_XOR_EXPR,
> + TREE_TYPE (mask), mask, true_val);
> + }
> + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
Looks like this mirrors what is currently being done for gimple_assign, but
you can move the gsi_insert_seq and the declaration of stmts into the if
block since they're only used there.
Otherwise looks good to me but can't approve.
Thanks,
Tamar
> + args.safe_push (mask);
>
> /* Replace the call with a IFN_MASK_CALL that has the extra
> condition parameter. */
> --
> 2.34.1
On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Victor,
>
> Thanks! This looks good to me with one minor comment:
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascimento@arm.com>
> > Sent: Monday, September 30, 2024 2:34 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > Victor Do Nascimento <Victor.DoNascimento@arm.com>
> > Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function
> > calls
> >
> > Up until now, due to a latent bug in the code for the ifcvt pass,
> > irrespective of the branch taken in a conditional statement, the
> > original condition for the if statement was used in masking the
> > function call.
> >
> > Thus, for code such as:
> >
> > if (a[i] > limit)
> > b[i] = fixed_const;
> > else
> > b[i] = fn (a[i]);
> >
> > we would generate the following (wrong) if-converted tree code:
> >
> > _1 = a[i_1];
> > _2 = _1 > limit;
> > _3 = .MASK_CALL (fn, _1, _2);
> > cstore_4 = _2 ? fixed_const : _3;
> >
> > as opposed to the correct expected sequence:
> >
> > _1 = a[i_1];
> > _2 = _1 > limit;
> > _3 = ~_2;
> > _4 = .MASK_CALL (fn, _1, _3);
> > cstore_5 = _2 ? fixed_const : _4;
> >
> > This patch ensures that the correct predicate mask generation is
> > carried out such that, upon autovectorization, the correct vector
> > lanes are selected in the vectorized function call.
> >
> > gcc/ChangeLog:
> >
> > * tree-if-conv.cc (predicate_statements): Fix handling of
> > predicated function calls.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-fncall-mask.c: New.
> > ---
> > gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++
> > gcc/tree-if-conv.cc | 14 ++++++++-
> > 2 files changed, 44 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> > b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> > new file mode 100644
> > index 00000000000..554488e0630
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
> > { target { aarch64*-*-* } } } */
> > +
> > +extern int __attribute__ ((simd, const)) fn (int);
> > +
> > +const int N = 20;
> > +const float lim = 101.0;
> > +const float cst = -1.0;
> > +float tot = 0.0;
> > +
> > +float b[20];
> > +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> > + [10 ... 19] = 100.0 }; /* Else branch. */
> > +
> > +int main (void)
> > +{
> > + #pragma omp simd
> > + for (int i = 0; i < N; i += 1)
> > + {
> > + if (a[i] > lim)
> > + b[i] = cst;
> > + else
> > + b[i] = fn (a[i]);
> > + tot += b[i];
> > + }
> > + return (0);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2,
> > NULL>} ifcvt } } */
> > +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL,
> > NULL>} ifcvt } } */
> > +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } }
> > */
> > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> > index 0346a1376c5..246a6bb5bd1 100644
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
> > This will cause the vectorizer to match the "in branch"
> > clone variants, and serves to build the mask vector
> > in a natural way. */
> > + tree mask = cond;
> > + gimple_seq stmts = NULL;
> > gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > tree orig_fn = gimple_call_fn (call);
> > int orig_nargs = gimple_call_num_args (call);
> > @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
> > args.safe_push (orig_fn);
> > for (int i = 0; i < orig_nargs; i++)
> > args.safe_push (gimple_call_arg (call, i));
> > - args.safe_push (cond);
> > + /* If `swap', we invert the mask used for the if branch for use
> > + when masking the function call. */
> > + if (swap)
> > + {
> > + tree true_val
> > + = constant_boolean_node (true, TREE_TYPE (mask));
> > + mask = gimple_build (&stmts, BIT_XOR_EXPR,
> > + TREE_TYPE (mask), mask, true_val);
> > + }
> > + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
>
> Looks like this mirrors what is currently being done for gimple_assign, but
> you can move the gsi_insert_seq and the declaration of stmts into the if
> block since they're only used there.
>
> Otherwise looks good to me but can't approve.
OK.
The issue is also present on the 13 and 14 branches? Can you see to
backport the fix?
Thanks,
Richard.
> Thanks,
> Tamar
>
> > + args.safe_push (mask);
> >
> > /* Replace the call with a IFN_MASK_CALL that has the extra
> > condition parameter. */
> > --
> > 2.34.1
>
On 10/1/24 13:10, Richard Biener wrote:
> On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>>
>> Hi Victor,
>>
>> Thanks! This looks good to me with one minor comment:
>>
>>> -----Original Message-----
>>> From: Victor Do Nascimento <victor.donascimento@arm.com>
>>> Sent: Monday, September 30, 2024 2:34 PM
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
>>> Victor Do Nascimento <Victor.DoNascimento@arm.com>
>>> Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function
>>> calls
>>>
>>> Up until now, due to a latent bug in the code for the ifcvt pass,
>>> irrespective of the branch taken in a conditional statement, the
>>> original condition for the if statement was used in masking the
>>> function call.
>>>
>>> Thus, for code such as:
>>>
>>> if (a[i] > limit)
>>> b[i] = fixed_const;
>>> else
>>> b[i] = fn (a[i]);
>>>
>>> we would generate the following (wrong) if-converted tree code:
>>>
>>> _1 = a[i_1];
>>> _2 = _1 > limit;
>>> _3 = .MASK_CALL (fn, _1, _2);
>>> cstore_4 = _2 ? fixed_const : _3;
>>>
>>> as opposed to the correct expected sequence:
>>>
>>> _1 = a[i_1];
>>> _2 = _1 > limit;
>>> _3 = ~_2;
>>> _4 = .MASK_CALL (fn, _1, _3);
>>> cstore_5 = _2 ? fixed_const : _4;
>>>
>>> This patch ensures that the correct predicate mask generation is
>>> carried out such that, upon autovectorization, the correct vector
>>> lanes are selected in the vectorized function call.
>>>
>>> gcc/ChangeLog:
>>>
>>> * tree-if-conv.cc (predicate_statements): Fix handling of
>>> predicated function calls.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.dg/vect/vect-fncall-mask.c: New.
>>> ---
>>> gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++
>>> gcc/tree-if-conv.cc | 14 ++++++++-
>>> 2 files changed, 44 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
>>> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
>>> new file mode 100644
>>> index 00000000000..554488e0630
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
>>> @@ -0,0 +1,31 @@
>>> +/* { dg-do compile { target { aarch64*-*-* } } } */
>>> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
>>> { target { aarch64*-*-* } } } */
>>> +
>>> +extern int __attribute__ ((simd, const)) fn (int);
>>> +
>>> +const int N = 20;
>>> +const float lim = 101.0;
>>> +const float cst = -1.0;
>>> +float tot = 0.0;
>>> +
>>> +float b[20];
>>> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
>>> + [10 ... 19] = 100.0 }; /* Else branch. */
>>> +
>>> +int main (void)
>>> +{
>>> + #pragma omp simd
>>> + for (int i = 0; i < N; i += 1)
>>> + {
>>> + if (a[i] > lim)
>>> + b[i] = cst;
>>> + else
>>> + b[i] = fn (a[i]);
>>> + tot += b[i];
>>> + }
>>> + return (0);
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2,
>>> NULL>} ifcvt } } */
>>> +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL,
>>> NULL>} ifcvt } } */
>>> +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } }
>>> */
>>> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
>>> index 0346a1376c5..246a6bb5bd1 100644
>>> --- a/gcc/tree-if-conv.cc
>>> +++ b/gcc/tree-if-conv.cc
>>> @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
>>> This will cause the vectorizer to match the "in branch"
>>> clone variants, and serves to build the mask vector
>>> in a natural way. */
>>> + tree mask = cond;
>>> + gimple_seq stmts = NULL;
>>> gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
>>> tree orig_fn = gimple_call_fn (call);
>>> int orig_nargs = gimple_call_num_args (call);
>>> @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
>>> args.safe_push (orig_fn);
>>> for (int i = 0; i < orig_nargs; i++)
>>> args.safe_push (gimple_call_arg (call, i));
>>> - args.safe_push (cond);
>>> + /* If `swap', we invert the mask used for the if branch for use
>>> + when masking the function call. */
>>> + if (swap)
>>> + {
>>> + tree true_val
>>> + = constant_boolean_node (true, TREE_TYPE (mask));
>>> + mask = gimple_build (&stmts, BIT_XOR_EXPR,
>>> + TREE_TYPE (mask), mask, true_val);
>>> + }
>>> + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
>>
>> Looks like this mirrors what is currently being done for gimple_assign, but
>> you can move the gsi_insert_seq and the declaration of stmts into the if
>> block since they're only used there.
>>
>> Otherwise looks good to me but can't approve.
>
> OK.
>
> The issue is also present on the 13 and 14 branches? Can you see to
> backport the fix?
>
> Thanks,
> Richard.
Of course, will look at getting it backported ASAP.
Many thanks,
Victor.
>> Thanks,
>> Tamar
>>
>>> + args.safe_push (mask);
>>>
>>> /* Replace the call with a IFN_MASK_CALL that has the extra
>>> condition parameter. */
>>> --
>>> 2.34.1
>>
new file mode 100644
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
+
+extern int __attribute__ ((simd, const)) fn (int);
+
+const int N = 20;
+const float lim = 101.0;
+const float cst = -1.0;
+float tot = 0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+ [10 ... 19] = 100.0 }; /* Else branch. */
+
+int main (void)
+{
+ #pragma omp simd
+ for (int i = 0; i < N; i += 1)
+ {
+ if (a[i] > lim)
+ b[i] = cst;
+ else
+ b[i] = fn (a[i]);
+ tot += b[i];
+ }
+ return (0);
+}
+
+/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2, NULL>} ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL, NULL>} ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } */
@@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
This will cause the vectorizer to match the "in branch"
clone variants, and serves to build the mask vector
in a natural way. */
+ tree mask = cond;
+ gimple_seq stmts = NULL;
gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
tree orig_fn = gimple_call_fn (call);
int orig_nargs = gimple_call_num_args (call);
@@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
args.safe_push (orig_fn);
for (int i = 0; i < orig_nargs; i++)
args.safe_push (gimple_call_arg (call, i));
- args.safe_push (cond);
+ /* If `swap', we invert the mask used for the if branch for use
+ when masking the function call. */
+ if (swap)
+ {
+ tree true_val
+ = constant_boolean_node (true, TREE_TYPE (mask));
+ mask = gimple_build (&stmts, BIT_XOR_EXPR,
+ TREE_TYPE (mask), mask, true_val);
+ }
+ gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+ args.safe_push (mask);
/* Replace the call with a IFN_MASK_CALL that has the extra
condition parameter. */