middle-end: delay checking for alignment to load [PR118464]
Checks
| Context | Check | Description |
| linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_check--master-arm | fail | Test failed |
| linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | fail | Test failed |
Commit Message
Hi All,
This fixes two PRs on early break vectorization by delaying the safety checks
until vectorizable_load, when the VF, VMAT and vectype are all known.
This patch does add two new restrictions:
1. On LOAD_LANES targets where the buffer size is known, we reject uneven
group sizes, as such groups are misaligned on every other iteration and so
may unwittingly cross a page (see the sketch after this list).
2. On LOAD_LANES targets where the buffer size is unknown, we reject
vectorization if we cannot peel for alignment, as the alignment requirement
is quite large at GROUP_SIZE * vectype_size. This is unlikely to ever be
beneficial so we don't support it for now.
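As an illustration (a sketch mirroring the new vect-early-break_133_pfa9.c
test, not extra patch code), restriction 1 rejects loops of this shape on
LOAD_LANES targets:

char vect_a[1025];

unsigned f (char x, int n)
{
  for (int i = 1; i < n - 2; i += 2)
    {
      /* The reads at i-1 and i+2 form an uneven group, so every other
         vector iteration is misaligned and could cross a page.  */
      if (vect_a[i-1] > x || vect_a[i+2] > x)
        return 1;
    }
  return 0;
}

For restriction 2, as a rough worked number, a group of 3 chars with a
16-byte vectype would already need alignment on the order of
3 * 16 = 48 bytes, and this scales with the VF, so we simply reject it
when peeling isn't possible.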
There are other steps documented inside the code itself so that the reasoning
is next to the code.
Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with no
issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
vectorizable_load.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_peeling_alignment): New.
gcc/testsuite/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets.
* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
---
Comments
Looks like a last-minute change I made accidentally blocked SVE. Fixed and re-sending:
Hi All,
This fixes two PRs on early break vectorization by delaying the safety checks
until vectorizable_load, when the VF, VMAT and vectype are all known.
This patch does add two new restrictions:
1. On LOAD_LANES targets where the buffer size is known, we reject uneven
group sizes, as such groups are misaligned on every other iteration and so
may unwittingly cross a page.
2. On LOAD_LANES targets where the buffer size is unknown, we reject
vectorization if we cannot peel for alignment, as the alignment requirement
is quite large at GROUP_SIZE * vectype_size. This is unlikely to ever be
beneficial so we don't support it for now.
There are other steps documented inside the code itself so that the reasoning
is next to the code.
Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.
On arm-none-linux-gnueabihf some tests fail to vectorize because it looks
like LOAD_LANES is often misaligned. I need to debug those a bit more to see
whether it's the patch or the backend.
For now I think the patch itself is fine.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
vectorizable_load.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_peeling_alignment): New.
gcc/testsuite/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets.
* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
-- inline copy of patch --
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
Work bound when discovering transitive relations from existing relations.
@item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
@item openacc-kernels
Specify mode of OpenACC `kernels' constructs handling.
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
new file mode 100644
index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+#include <cstddef>
+
+struct ts1 {
+ int spans[6][2];
+};
+struct gg {
+ int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+ ts1 ret;
+ for (size_t i = 0; i != t; i++) {
+ if (!(i < t)) __builtin_abort();
+ ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+ ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -55,7 +55,9 @@ int main()
}
}
rephase ();
+#pragma GCC novector
for (i = 0; i < 32; ++i)
+#pragma GCC novector
for (j = 0; j < 3; ++j)
#pragma GCC novector
for (k = 0; k < 3; ++k)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b5b252b716a8ba93a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
@@ -5,7 +5,8 @@
/* { dg-additional-options "-O3" } */
/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* Arm creates a group size of 3 here, which we can't support yet. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-*-* } } } } */
typedef struct filter_list_entry {
const char *name;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
new file mode 100644
index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+int a, b, c, d, e, f;
+short g[1];
+int main() {
+ int h;
+ while (a) {
+ while (h)
+ ;
+ for (b = 2; b; b--) {
+ while (c)
+ ;
+ f = g[a];
+ if (d)
+ break;
+ }
+ while (e)
+ ;
+ }
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020];
+
+char * find(int n, char c)
+{
+ for (int i = 1; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
new file mode 100644
index 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093afe6c86a998cdd216
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
@@ -0,0 +1,24 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* Alignment requirement too big; load lanes targets can't safely vectorize this. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" { target { ! vect_load_lanes } } } } */
+
+unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (n - 2); i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+2] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020];
+
+char * find(int n, char c)
+{
+ for (int i = 0; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
new file mode 100644
index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020] __attribute__((aligned(1)));
+
+char * find(int n, char c)
+{
+ for (int i = 1; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
+/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
new file mode 100644
index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020] __attribute__((aligned(1)));
+
+char * find(int n, char c)
+{
+ for (int i = 0; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
+/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
new file mode 100644
index 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc6604ffd03ff4d8ff53d6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+ if (vect[i] > x)
+ return 1;
+
+ vect[i] = x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect_a, char *vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < n; i++)
+ {
+ if (vect_a[i] > x || vect_b[i] > x)
+ return 1;
+
+ vect_a[i] = x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
new file mode 100644
index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* This should be vectorizable on both load_lanes and linear targets. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < n; i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+1] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
new file mode 100644
index 0000000000000000000000000000000000000000..d4d87768a60803ed943f95ccbda19e7e7812bf29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
@@ -0,0 +1,27 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Access will not read beyond buffer to due known size buffer" "vect" } } */
+
+
+char vect_a[1025];
+char vect_b[1025];
+
+unsigned test4(char x, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < (n - 2); i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+1] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
new file mode 100644
index 0000000000000000000000000000000000000000..8419d7a9201dc8c433a238f59520dea7e35c666e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
@@ -0,0 +1,28 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* Group size is uneven; load lanes targets can't safely vectorize this. */
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" } } */
+
+
+char vect_a[1025];
+char vect_b[1025];
+
+unsigned test4(char x, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < (n - 2); i+=2)
+ {
+ if (vect_a[i-1] > x || vect_a[i+2] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
index b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a04ec7507e32993ce8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
@@ -42,4 +42,6 @@ main ()
return 0;
}
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* Targets that use LOAD_LANES make this fail due to the group misalignment. */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target vect_load_lanes } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target { ! vect_load_lanes } } } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a60002933f384f65b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
if (is_gimple_debug (stmt))
continue;
- stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+ stmt_vec_info stmt_vinfo
+ = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
if (!dr_ref)
continue;
@@ -748,26 +749,15 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
bounded by VF so accesses are within range. We only need to check
the reads since writes are moved to a safe place where if we get
there we know they are safe to perform. */
- if (DR_IS_READ (dr_ref)
- && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
+ if (DR_IS_READ (dr_ref))
{
- if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
- || STMT_VINFO_STRIDED_P (stmt_vinfo))
- {
- const char *msg
- = "early break not supported: cannot peel "
- "for alignment, vectorization would read out of "
- "bounds at %G";
- return opt_result::failure_at (stmt, msg, stmt);
- }
-
dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
dr_info->need_peeling_for_alignment = true;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
- "marking DR (read) as needing peeling for "
- "alignment at %G", stmt);
+ "marking DR (read) as possibly needing peeling "
+ "for alignment at %G", stmt);
}
if (DR_IS_READ (dr_ref))
@@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
Compute the misalignment of the data reference DR_INFO when vectorizing
with VECTYPE.
- RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
- be set appropriately on failure (but is otherwise left unchanged).
-
Output:
1. initialized misalignment info for DR_INFO
@@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
static void
vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
- tree vectype, opt_result *result = nullptr)
+ tree vectype)
{
stmt_vec_info stmt_info = dr_info->stmt;
vec_base_alignments *base_alignments = &vinfo->base_alignments;
@@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
= exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
BITS_PER_UNIT);
- /* If this DR needs peeling for alignment for correctness, we must
- ensure the target alignment is a constant power-of-two multiple of the
- amount read per vector iteration (overriding the above hook where
- necessary). */
- if (dr_info->need_peeling_for_alignment)
- {
- /* Vector size in bytes. */
- poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
-
- /* We can only peel for loops, of course. */
- gcc_checking_assert (loop_vinfo);
-
- /* Calculate the number of vectors read per vector iteration. If
- it is a power of two, multiply through to get the required
- alignment in bytes. Otherwise, fail analysis since alignment
- peeling wouldn't work in such a case. */
- poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
- if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
- num_scalars *= DR_GROUP_SIZE (stmt_info);
-
- auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
- if (!pow2p_hwi (num_vectors))
- {
- *result = opt_result::failure_at (vect_location,
- "non-power-of-two num vectors %u "
- "for DR needing peeling for "
- "alignment at %G",
- num_vectors, stmt_info->stmt);
- return;
- }
-
- safe_align *= num_vectors;
- if (maybe_gt (safe_align, 4096U))
- {
- pretty_printer pp;
- pp_wide_integer (&pp, safe_align);
- *result = opt_result::failure_at (vect_location,
- "alignment required for correctness"
- " (%s) may exceed page size",
- pp_formatted_text (&pp));
- return;
- }
-
- unsigned HOST_WIDE_INT multiple;
- if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
- || !pow2p_hwi (multiple))
- {
- if (dump_enabled_p ())
- {
- dump_printf_loc (MSG_NOTE, vect_location,
- "forcing alignment for DR from preferred (");
- dump_dec (MSG_NOTE, vector_alignment);
- dump_printf (MSG_NOTE, ") to safe align (");
- dump_dec (MSG_NOTE, safe_align);
- dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
- }
- vector_alignment = safe_align;
- }
- }
-
SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
/* If the main loop has peeled for alignment we have no way of knowing
@@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
|| !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
loop_preheader_edge (loop))
|| loop->inner
- || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+ || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
do_peeling = false;
struct _vect_peel_extended_info peel_for_known_alignment;
@@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
&& DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
continue;
- opt_result res = opt_result::success ();
+
vect_compute_data_ref_alignment (loop_vinfo, dr_info,
- STMT_VINFO_VECTYPE (dr_info->stmt),
- &res);
- if (!res)
- return res;
+ STMT_VINFO_VECTYPE (dr_info->stmt));
}
}
@@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
if (misalignment == 0)
return dr_aligned;
- else if (dr_info->need_peeling_for_alignment)
+ else if (dr_peeling_alignment (stmt_info))
return dr_unaligned_unsupported;
/* For now assume all conditional loads/stores support unaligned
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3aff7c1af80e8769a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+ auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
+ /* Check if a misalignment with an unsupported peeling for early break is
+ still OK. First we need to distinguish whether we've reached here due to
+ dependency analysis or because the user has requested -mstrict-align or
+ similar. In those cases we must not override it. */
+ if (dr_peeling_alignment (stmt_info)
+ && *alignment_support_scheme == dr_unaligned_unsupported
+ /* We can only attempt to override if the misalignment is a multiple of
+ the element being loaded, otherwise peeling or versioning would have
+ really been required. */
+ && multiple_p (*misalignment, unit_size))
+ {
+ bool inbounds
+ = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
+ DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
+ /* If we have a known misalignment, and are doing a group load for a DR
+ that requires aligned access, check if the misalignment is a multiple
+ of the unit size. In which case the group load will be issued aligned
+ as long as the first load in the group is aligned.
+
+ For the non-inbound case we'd need group_size * vectype alignment. But
+ this is quite huge and unlikely to ever happen so if we can't peel for
+ it, just reject it. */
+ if (*memory_access_type == VMAT_LOAD_STORE_LANES
+ && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
+ {
+ /* ?? This needs updating whenever we support slp group > 1. */
+ auto group_size = DR_GROUP_SIZE (stmt_info);
+ /* For the inbound case it's enough to check for an alignment of
+ GROUP_SIZE * element size. */
+ if (inbounds
+ && (*misalignment % (group_size * unit_size)) == 0
+ && group_size % 2 == 0)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Assuming grouped access is aligned due to load "
+ "lanes, overriding alignment scheme\n");
+
+ *alignment_support_scheme = dr_unaligned_supported;
+ }
+ }
+ /* If we have a linear access and know the misalignment and know we won't
+ read out of bounds then it's also ok if the misalignment is a multiple
+ of the element size. We get this when the loop has known misalignments
+ but the misalignments of the DRs can't be peeled to reach mutual
+ alignment. Because the misalignments are known however we also know
+ that versioning won't work. If the target does support unaligned
+ accesses and we know we are free to read the entire buffer then we
+ can allow the unaligned access if it's on elements for an early break
+ condition. */
+ else if (*memory_access_type != VMAT_GATHER_SCATTER
+ && inbounds)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Access will not read beyond buffer to due known size "
+ "buffer, overriding alignment scheme\n");
+
+ *alignment_support_scheme = dr_unaligned_supported;
+ }
+ }
+
if (*alignment_support_scheme == dr_unaligned_unsupported)
{
if (dump_enabled_p ())
@@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
/* Transform. */
dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL;
+
+ /* Check if we support the operation when early breaks are needed. */
+ if (loop_vinfo
+ && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+ && (memory_access_type == VMAT_GATHER_SCATTER
+ || memory_access_type == VMAT_STRIDED_SLP))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "early break not supported: cannot peel for "
+ "alignment. With non-contiguous memory vectorization"
+ " could read out of bounds at %G ",
+ STMT_VINFO_STMT (stmt_info));
+ return false;
+ }
+
+ /* If this DR needs peeling for alignment for correctness, we must
+ ensure the target alignment is a constant power-of-two multiple of the
+ amount read per vector iteration (overriding the above hook where
+ necessary). We don't support group loads, which would have been filterd
+ out in the check above. For now it means we don't have to look at the
+ group info and just check that the load is continguous and can just use
+ dr_info. For known size buffers we still need to check if the vector
+ is misaligned and if so we need to peel. */
+ if (costing_p && dr_info->need_peeling_for_alignment)
+ {
+ /* Vector size in bytes. */
+ poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
+
+ /* We can only peel for loops, of course. */
+ gcc_checking_assert (loop_vinfo);
+
+ auto num_vectors = ncopies;
+ if (!pow2p_hwi (num_vectors))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "non-power-of-two num vectors %u "
+ "for DR needing peeling for "
+ "alignment at %G",
+ num_vectors, STMT_VINFO_STMT (stmt_info));
+ return false;
+ }
+
+ safe_align *= num_vectors;
+ if (known_gt (safe_align, (unsigned)param_min_pagesize)
+ /* For VLA we don't support PFA when any unrolling needs to be done.
+ We could though but too much work for GCC 15. For now we assume a
+ vector is not larger than a page size so allow single loads. */
+ && (num_vectors > 1 && !vf.is_constant ()))
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "alignment required for correctness (");
+ dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
+ dump_printf (MSG_NOTE, ") may exceed page size\n");
+ }
+ return false;
+ }
+ }
+
ensure_base_align (dr_info);
if (memory_access_type == VMAT_INVARIANT)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e1922145f3d418436d709f4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
}
#define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
+/* Return true if the stmt_vec_info requires peeling for alignment. */
+inline bool
+dr_peeling_alignment (stmt_vec_info stmt_info)
+{
+ dr_vec_info *dr_info;
+ if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+ dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
+ else
+ dr_info = STMT_VINFO_DR_INFO (stmt_info);
+
+ return dr_info->need_peeling_for_alignment;
+}
+
inline void
set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
{
On Tue, 4 Feb 2025, Tamar Christina wrote:
> Looks like a last minute change I made accidentally blocked SVE. Fixed and re-sending:
>
> Hi All,
>
> This fixes two PRs on Early break vectorization by delaying the safety checks to
> vectorizable_load when the VF, VMAT and vectype are all known.
>
> This patch does add two new restrictions:
>
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> group sizes, as they are unaligned every n % 2 iterations and so may cross
> a page unwittingly.
>
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> we cannot peel for alignment, as the alignment requirement is quite large at
> GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> don't support it for now.
>
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> On arm-none-linux-gnueabihf some tests are failing to vectorize because it looks
> like LOAD_LANES is often misaligned. I need to debug those a bit more to see if
> it's the patch or backend.
>
> For now I think the patch itself is fine.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> checks.
> (vect_compute_data_ref_alignment): Remove alignment checks and move to
> vectorizable_load.
> (vect_enhance_data_refs_alignment): Add note to comment needing
> investigating.
> (vect_analyze_data_refs_alignment): Likewise.
> (vect_supportable_dr_alignment): For group loads look at first DR.
> * tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
> Perform safety checks for early break pfa.
> * tree-vectorizer.h (dr_peeling_alignment): New.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> load type is relaxed later.
> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
> Work bound when discovering transitive relations from existing relations.
>
> @item min-pagesize
> -Minimum page size for warning purposes.
> +Minimum page size for warning and early break vectorization purposes.
>
> @item openacc-kernels
> Specify mode of OpenACC `kernels' constructs handling.
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <cstddef>
> +
> +struct ts1 {
> + int spans[6][2];
> +};
> +struct gg {
> + int t[6];
> +};
> +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> + ts1 ret;
> + for (size_t i = 0; i != t; i++) {
> + if (!(i < t)) __builtin_abort();
> + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> + }
> + return ret;
> +}
This one ICEd at some point? Otherwise it doesn't test anything?
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -55,7 +55,9 @@ int main()
> }
> }
> rephase ();
> +#pragma GCC novector
> for (i = 0; i < 32; ++i)
> +#pragma GCC novector
> for (j = 0; j < 3; ++j)
> #pragma GCC novector
> for (k = 0; k < 3; ++k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b5b252b716a8ba93a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -5,7 +5,8 @@
> /* { dg-additional-options "-O3" } */
> /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* Arm creates a group size of 3 here, which we can't support yet. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-*-* } } } } */
>
> typedef struct filter_list_entry {
> const char *name;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, b, c, d, e, f;
> +short g[1];
> +int main() {
> + int h;
> + while (a) {
> + while (h)
> + ;
> + for (b = 2; b; b--) {
> + while (c)
> + ;
> + f = g[a];
> + if (d)
> + break;
> + }
> + while (e)
> + ;
> + }
> + return 0;
> +}
again, what does this test?
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093afe6c86a998cdd216
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> @@ -0,0 +1,24 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" { target { ! vect_load_lanes } } } } */
> +
Any reason to not use "Alignment of access forced using peeling" like in
the other testcases?
> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc6604ffd03ff4d8ff53d6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> + if (vect[i] > x)
> + return 1;
> +
> + vect[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" } } */
See above.
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < n; i++)
> + {
> + if (vect_a[i] > x || vect_b[i] > x)
> + return 1;
> +
> + vect_a[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* This should be vectorizable through load_lanes and linear targets. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d4d87768a60803ed943f95ccbda19e7e7812bf29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> @@ -0,0 +1,27 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Access will not read beyond buffer to due known size buffer" "vect" } } */
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8419d7a9201dc8c433a238f59520dea7e35c666e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> @@ -0,0 +1,28 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" } } */
Likewise.
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i-1] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> index b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a04ec7507e32993ce8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> @@ -42,4 +42,6 @@ main ()
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* Targets that make this LOAD_LANES fail due to the group misalignment. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target vect_load_lanes } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target { ! vect_load_lanes } } } } */
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a60002933f384f65b 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> if (is_gimple_debug (stmt))
> continue;
>
> - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> + stmt_vec_info stmt_vinfo
> + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> if (!dr_ref)
> continue;
> @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> bounded by VF so accesses are within range. We only need to check
> the reads since writes are moved to a safe place where if we get
> there we know they are safe to perform. */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (DR_IS_READ (dr_ref))
> {
> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> - {
> - const char *msg
> - = "early break not supported: cannot peel "
> - "for alignment, vectorization would read out of "
> - "bounds at %G";
> - return opt_result::failure_at (stmt, msg, stmt);
> - }
> -
> dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> dr_info->need_peeling_for_alignment = true;
You're setting the flag on any DR of a DR group here ...
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> - "marking DR (read) as needing peeling for "
> - "alignment at %G", stmt);
> + "marking DR (read) as possibly needing peeling "
> + "for alignment at %G", stmt);
> }
>
> if (DR_IS_READ (dr_ref))
> @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> Compute the misalignment of the data reference DR_INFO when vectorizing
> with VECTYPE.
>
> - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> - be set appropriately on failure (but is otherwise left unchanged).
> -
> Output:
> 1. initialized misalignment info for DR_INFO
>
> @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
>
> static void
> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> - tree vectype, opt_result *result = nullptr)
> + tree vectype)
> {
> stmt_vec_info stmt_info = dr_info->stmt;
> vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> BITS_PER_UNIT);
>
> - /* If this DR needs peeling for alignment for correctness, we must
> - ensure the target alignment is a constant power-of-two multiple of the
> - amount read per vector iteration (overriding the above hook where
> - necessary). */
> - if (dr_info->need_peeling_for_alignment)
> - {
> - /* Vector size in bytes. */
> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> -
> - /* We can only peel for loops, of course. */
> - gcc_checking_assert (loop_vinfo);
> -
> - /* Calculate the number of vectors read per vector iteration. If
> - it is a power of two, multiply through to get the required
> - alignment in bytes. Otherwise, fail analysis since alignment
> - peeling wouldn't work in such a case. */
> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> - num_scalars *= DR_GROUP_SIZE (stmt_info);
> -
> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> - if (!pow2p_hwi (num_vectors))
> - {
> - *result = opt_result::failure_at (vect_location,
> - "non-power-of-two num vectors %u "
> - "for DR needing peeling for "
> - "alignment at %G",
> - num_vectors, stmt_info->stmt);
> - return;
> - }
> -
> - safe_align *= num_vectors;
> - if (maybe_gt (safe_align, 4096U))
> - {
> - pretty_printer pp;
> - pp_wide_integer (&pp, safe_align);
> - *result = opt_result::failure_at (vect_location,
> - "alignment required for correctness"
> - " (%s) may exceed page size",
> - pp_formatted_text (&pp));
> - return;
> - }
> -
> - unsigned HOST_WIDE_INT multiple;
> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> - || !pow2p_hwi (multiple))
> - {
> - if (dump_enabled_p ())
> - {
> - dump_printf_loc (MSG_NOTE, vect_location,
> - "forcing alignment for DR from preferred (");
> - dump_dec (MSG_NOTE, vector_alignment);
> - dump_printf (MSG_NOTE, ") to safe align (");
> - dump_dec (MSG_NOTE, safe_align);
> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> - }
> - vector_alignment = safe_align;
> - }
> - }
> -
> SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
>
> /* If the main loop has peeled for alignment we have no way of knowing
> @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> loop_preheader_edge (loop))
> || loop->inner
> - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
Spurious change(?)
> do_peeling = false;
>
> struct _vect_peel_extended_info peel_for_known_alignment;
> @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> continue;
> - opt_result res = opt_result::success ();
> +
> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> - STMT_VINFO_VECTYPE (dr_info->stmt),
> - &res);
> - if (!res)
> - return res;
> + STMT_VINFO_VECTYPE (dr_info->stmt));
> }
> }
>
> @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>
> if (misalignment == 0)
> return dr_aligned;
> - else if (dr_info->need_peeling_for_alignment)
> + else if (dr_peeling_alignment (stmt_info))
> return dr_unaligned_unsupported;
>
> /* For now assume all conditional loads/stores support unaligned
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3aff7c1af80e8769a 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> + /* Check if a misalignment with an unsupported peeling for early break is
> + still OK. First we need to distinguish between when we've reached here do
due to
> + to dependency analysis or when the user has requested -mstrict-align or
> + similar. In those cases we must not override it. */
> + if (dr_peeling_alignment (stmt_info)
> + && *alignment_support_scheme == dr_unaligned_unsupported
> + /* We can only attempt to override if the misalignment is a multiple of
> + the element being loaded, otherwise peeling or versioning would have
> + really been required. */
> + && multiple_p (*misalignment, unit_size))
Hmm, but wouldn't that mean dr_info->target_alignment is bigger
than the vector size? Does that ever happen? I'll note that
*alignment_support_scheme == dr_aligned means alignment according
to dr_info->target_alignment which might be actually less than
the vector size (we've noticed this recently in other PRs), so
we might want to make sure that dr_info->target_alignment is
at least vector size when ->need_peeling_for_alignment I think.
So - which of the testcases gets you here? I think we
set *misalignment to be modulo target_alignment, never larger than that.
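To put numbers on that (purely illustrative): with target_alignment = 16 and
an int access (unit_size = 4), *misalignment is always in [0, 15]; an offset
of 19 gives 19 % 16 = 3 and fails multiple_p, while an offset of 24 gives 8
and passes. Either way *misalignment never exceeds target_alignment.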
> + {
> + bool inbounds
> + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> + /* If we have a known misalignment, and are doing a group load for a DR
> + that requires aligned access, check if the misalignment is a multiple
> + of the unit size. In which case the group load will be issued aligned
> + as long as the first load in the group is aligned.
> +
> + For the non-inbound case we'd need goup_size * vectype alignment. But
> + this is quite huge and unlikely to ever happen so if we can't peel for
> + it, just reject it. */
> + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> + {
> + /* ?? This needs updating whenever we support slp group > 1. */
> + auto group_size = DR_GROUP_SIZE (stmt_info);
> + /* For the inbound case it's enough to check for an alignment of
> + GROUP_SIZE * element size. */
> + if (inbounds
> + && (*misalignment % (group_size * unit_size)) == 0
> + && group_size % 2 == 0)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "Assuming grouped access is aligned due to load "
> + "lanes, overriding alignment scheme\n");
> +
> + *alignment_support_scheme = dr_unaligned_supported;
> + }
> + }
> + /* If we have a linear access and know the misalignment and know we won't
> + read out of bounds then it's also ok if the misalignment is a multiple
> + of the element size. We get this when the loop has known misalignments
> + but the misalignments of the DRs can't be peeled to reach mutual
> + alignment. Because the misalignments are known however we also know
> + that versioning won't work. If the target does support unaligned
> + accesses and we know we are free to read the entire buffer then we
> + can allow the unaligned access if it's on elements for an early break
> + condition. */
> + else if (*memory_access_type != VMAT_GATHER_SCATTER
> + && inbounds)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "Access will not read beyond buffer to due known size "
> + "buffer, overriding alignment scheme\n");
> +
> + *alignment_support_scheme = dr_unaligned_supported;
> + }
> + }
> +
> if (*alignment_support_scheme == dr_unaligned_unsupported)
> {
> if (dump_enabled_p ())
> @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> /* Transform. */
>
> dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL;
> +
> + /* Check if we support the operation if early breaks are needed. */
> + if (loop_vinfo
> + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && (memory_access_type == VMAT_GATHER_SCATTER
> + || memory_access_type == VMAT_STRIDED_SLP))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "early break not supported: cannot peel for "
> + "alignment. With non-contiguous memory vectorization"
> + " could read out of bounds at %G ",
> + STMT_VINFO_STMT (stmt_info));
Hmm, this is now more restrictive than the original check in
vect_analyze_early_break_dependences because it covers all accesses.
The simplest fix would be to leave it there.
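Editorial aside, a minimal sketch of the kind of access this rejects (a
hypothetical example, not a testcase from the patch): an early-break load
through an index vector is a gather, which peeling cannot make safe because
the accessed addresses are data-dependent:

    int idx[1024];
    char buf[1024];

    int first_match (int n, char c)
    {
      for (int i = 0; i < n; i++)
        if (buf[idx[i]] == c)   /* gather + early break: rejected.  */
          return i;
      return -1;
    }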
> + return false;
> + }
> +
> + /* If this DR needs peeling for alignment for correctness, we must
> + ensure the target alignment is a constant power-of-two multiple of the
> + amount read per vector iteration (overriding the above hook where
> + necessary). We don't support group loads, which would have been filterd
> + out in the check above. For now it means we don't have to look at the
> + group info and just check that the load is continguous and can just use
> + dr_info. For known size buffers we still need to check if the vector
> + is misaligned and if so we need to peel. */
> + if (costing_p && dr_info->need_peeling_for_alignment)
dr_peeling_alignment ()
I think this belongs in get_load_store_type, specifically I think
we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment refs
and by construction dr_aligned should ensure type size alignment
(by altering target_alignment for the respective refs). Given that
both VF and vector type are fixed at vect_analyze_data_refs_alignment
time we should be able to compute the appropriate target alignment
there (I'm not sure we support peeling of more than VF-1 iterations
though).
> + {
> + /* Vector size in bytes. */
> + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> +
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + auto num_vectors = ncopies;
> + if (!pow2p_hwi (num_vectors))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "non-power-of-two num vectors %u "
> + "for DR needing peeling for "
> + "alignment at %G",
> + num_vectors, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + safe_align *= num_vectors;
> + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> + /* For VLA we don't support PFA when any unrolling needs to be done.
> + We could though but too much work for GCC 15. For now we assume a
> + vector is not larger than a page size so allow single loads. */
> + && (num_vectors > 1 && !vf.is_constant ()))
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "alignment required for correctness (");
> + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> + dump_printf (MSG_NOTE, ") may exceed page size\n");
> + }
> + return false;
> + }
> + }
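Editorial aside, a worked example with illustrative numbers: for vectype
V16QI, TYPE_SIZE_UNIT (vectype) is 16 bytes, so ncopies == 2 gives safe_align
== 32, well below the default param_min_pagesize of 4096 and therefore
accepted; ncopies == 3 would already have failed the pow2p_hwi check above.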
> +
> ensure_base_align (dr_info);
>
> if (memory_access_type == VMAT_INVARIANT)
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e1922145f3d418436d709f4 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> }
> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
>
> +/* Return if the stmt_vec_info requires peeling for alignment. */
> +inline bool
> +dr_peeling_alignment (stmt_vec_info stmt_info)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
... but checking it on the first only.
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + return dr_info->need_peeling_for_alignment;
> +}
> +
> inline void
> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> {
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, February 4, 2025 12:49 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
>
> On Tue, 4 Feb 2025, Tamar Christina wrote:
>
> > Looks like a last minute change I made accidentally blocked SVE. Fixed and re-sending:
> >
> > Hi All,
> >
> > This fixes two PRs on Early break vectorization by delaying the safety checks to
> > vectorizable_load when the VF, VMAT and vectype are all known.
> >
> > This patch does add two new restrictions:
> >
> > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > a page unwittingly.
> >
> > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> > we cannot peel for alignment, as the alignment requirement is quite large at
> > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > don't support it for now.
> >
> > There are other steps documented inside the code itself so that the reasoning
> > is next to the code.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > On arm-none-linux-gnueabihf some tests are failing to vectorize because it looks
> > like LOAD_LANES is often misaligned. I need to debug those a bit more to see if
> > it's the patch or backend.
> >
> > For now I think the patch itself is fine.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > checks.
> > 	(vect_compute_data_ref_alignment): Remove alignment checks and move to
> > vectorizable_load.
> > (vect_enhance_data_refs_alignment): Add note to comment needing
> > investigating.
> > (vect_analyze_data_refs_alignment): Likewise.
> > (vect_supportable_dr_alignment): For group loads look at first DR.
> > * tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
> > Perform safety checks for early break pfa.
> > * tree-vectorizer.h (dr_peeling_alignment): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > load type is relaxed later.
> > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
> > Work bound when discovering transitive relations from existing relations.
> >
> > @item min-pagesize
> > -Minimum page size for warning purposes.
> > +Minimum page size for warning and early break vectorization purposes.
> >
> > @item openacc-kernels
> > Specify mode of OpenACC `kernels' constructs handling.
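(Editorial note: the bound is tunable; e.g. a hypothetical invocation
gcc -O3 --param min-pagesize=16384 raises it for targets with larger pages,
which also relaxes the new early-break safety check.)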
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include <cstddef>
> > +
> > +struct ts1 {
> > + int spans[6][2];
> > +};
> > +struct gg {
> > + int t[6];
> > +};
> > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > + ts1 ret;
> > + for (size_t i = 0; i != t; i++) {
> > + if (!(i < t)) __builtin_abort();
> > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > + }
> > + return ret;
> > +}
>
> This one ICEd at some point? Otherwise it doesn't test anything?
>
Yes. It and the one below it were ICEing and are not necessarily vectorizable
code, as they were ICEing early.
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > @@ -55,7 +55,9 @@ int main()
> > }
> > }
> > rephase ();
> > +#pragma GCC novector
> > for (i = 0; i < 32; ++i)
> > +#pragma GCC novector
> > for (j = 0; j < 3; ++j)
> > #pragma GCC novector
> > for (k = 0; k < 3; ++k)
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b5b252b716a8ba93a 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > @@ -5,7 +5,8 @@
> > /* { dg-additional-options "-O3" } */
> > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> >
> > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* Arm creates a group size of 3 here, which we can't support yet. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-*-* } } } } */
> >
> > typedef struct filter_list_entry {
> > const char *name;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +int a, b, c, d, e, f;
> > +short g[1];
> > +int main() {
> > + int h;
> > + while (a) {
> > + while (h)
> > + ;
> > + for (b = 2; b; b--) {
> > + while (c)
> > + ;
> > + f = g[a];
> > + if (d)
> > + break;
> > + }
> > + while (e)
> > + ;
> > + }
> > + return 0;
> > +}
>
> again, what does this test?
>
That we don't ICE during data ref analysis.
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 1; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093afe6c86a998cdd216
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" { target { ! vect_load_lanes } } } } */
> > +
>
> Any reason to not use "Alignment of access forced using peeling" like in
> the other testcases?
Didn't know there was a preference... will update.
> > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+2] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 0; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 1; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 0; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc6604ffd03ff4d8ff53d6b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i++)
> > + {
> > + if (vect[i] > x)
> > + return 1;
> > +
> > + vect[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" } } */
>
> See above.
>
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < n; i++)
> > + {
> > + if (vect_a[i] > x || vect_b[i] > x)
> > + return 1;
> > +
> > + vect_a[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* This should be vectorizable through load_lanes and linear targets. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+1] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d4d87768a60803ed943f95ccbda19e7e7812bf29
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Access will not read beyond buffer to due known size buffer" "vect" } } */
> > +
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+1] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..8419d7a9201dc8c433a238f59520dea7e35c666e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect" } } */
>
> Likewise.
>
> > +
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > index b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a04ec7507e32993ce8 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > @@ -42,4 +42,6 @@ main ()
> > return 0;
> > }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> > +/* Targets that make this LOAD_LANES fail due to the group misalignment. */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target vect_load_lanes } } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target { ! vect_load_lanes } } } } */
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a60002933f384f65b 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > if (is_gimple_debug (stmt))
> > continue;
> >
> > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > + stmt_vec_info stmt_vinfo
> > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > if (!dr_ref)
> > continue;
> > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > bounded by VF so accesses are within range. We only need to check
> > the reads since writes are moved to a safe place where if we get
> > there we know they are safe to perform. */
> > - if (DR_IS_READ (dr_ref)
> > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > + if (DR_IS_READ (dr_ref))
> > {
> > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > - {
> > - const char *msg
> > - = "early break not supported: cannot peel "
> > - "for alignment, vectorization would read out of "
> > - "bounds at %G";
> > - return opt_result::failure_at (stmt, msg, stmt);
> > - }
> > -
> > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > dr_info->need_peeling_for_alignment = true;
>
> You're setting the flag on any DR of a DR group here ...
>
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_NOTE, vect_location,
> > - "marking DR (read) as needing peeling for "
> > - "alignment at %G", stmt);
> > + "marking DR (read) as possibly needing peeling "
> > + "for alignment at %G", stmt);
> > }
> >
> > if (DR_IS_READ (dr_ref))
> > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > Compute the misalignment of the data reference DR_INFO when vectorizing
> > with VECTYPE.
> >
> > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > - be set appropriately on failure (but is otherwise left unchanged).
> > -
> > Output:
> > 1. initialized misalignment info for DR_INFO
> >
> > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> >
> > static void
> > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > - tree vectype, opt_result *result = nullptr)
> > + tree vectype)
> > {
> > stmt_vec_info stmt_info = dr_info->stmt;
> > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > BITS_PER_UNIT);
> >
> > - /* If this DR needs peeling for alignment for correctness, we must
> > - ensure the target alignment is a constant power-of-two multiple of the
> > - amount read per vector iteration (overriding the above hook where
> > - necessary). */
> > - if (dr_info->need_peeling_for_alignment)
> > - {
> > - /* Vector size in bytes. */
> > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > -
> > - /* We can only peel for loops, of course. */
> > - gcc_checking_assert (loop_vinfo);
> > -
> > - /* Calculate the number of vectors read per vector iteration. If
> > - it is a power of two, multiply through to get the required
> > - alignment in bytes. Otherwise, fail analysis since alignment
> > - peeling wouldn't work in such a case. */
> > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > -
> > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > - if (!pow2p_hwi (num_vectors))
> > - {
> > - *result = opt_result::failure_at (vect_location,
> > - "non-power-of-two num vectors %u "
> > - "for DR needing peeling for "
> > - "alignment at %G",
> > - num_vectors, stmt_info->stmt);
> > - return;
> > - }
> > -
> > - safe_align *= num_vectors;
> > - if (maybe_gt (safe_align, 4096U))
> > - {
> > - pretty_printer pp;
> > - pp_wide_integer (&pp, safe_align);
> > - *result = opt_result::failure_at (vect_location,
> > - "alignment required for correctness"
> > - " (%s) may exceed page size",
> > - pp_formatted_text (&pp));
> > - return;
> > - }
> > -
> > - unsigned HOST_WIDE_INT multiple;
> > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > - || !pow2p_hwi (multiple))
> > - {
> > - if (dump_enabled_p ())
> > - {
> > - dump_printf_loc (MSG_NOTE, vect_location,
> > - "forcing alignment for DR from preferred (");
> > - dump_dec (MSG_NOTE, vector_alignment);
> > - dump_printf (MSG_NOTE, ") to safe align (");
> > - dump_dec (MSG_NOTE, safe_align);
> > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > - }
> > - vector_alignment = safe_align;
> > - }
> > - }
> > -
> > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> >
> > /* If the main loop has peeled for alignment we have no way of knowing
> > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > loop_preheader_edge (loop))
> > || loop->inner
> > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
>
> Spurious change(?)
I was actually wondering why this is here. I'm curious why we're saying we can't
peel for alignment on an inverted loop.
>
> > do_peeling = false;
> >
> > struct _vect_peel_extended_info peel_for_known_alignment;
> > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > continue;
> > - opt_result res = opt_result::success ();
> > +
> > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > - &res);
> > - if (!res)
> > - return res;
> > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > }
> > }
> >
> > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> >
> > if (misalignment == 0)
> > return dr_aligned;
> > - else if (dr_info->need_peeling_for_alignment)
> > + else if (dr_peeling_alignment (stmt_info))
> > return dr_unaligned_unsupported;
> >
> > /* For now assume all conditional loads/stores support unaligned
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3aff7c1af80e8769a 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> > return false;
> > }
> >
> > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > + /* Check if a misalignment with an unsupported peeling for early break is
> > + still OK. First we need to distinguish between when we've reached here do
>
> due to
>
> > + to dependency analysis or when the user has requested -mstrict-align or
> > + similar. In those cases we must not override it. */
> > + if (dr_peeling_alignment (stmt_info)
> > + && *alignment_support_scheme == dr_unaligned_unsupported
> > + /* We can only attempt to override if the misalignment is a multiple of
> > + the element being loaded, otherwise peeling or versioning would have
> > + really been required. */
> > + && multiple_p (*misalignment, unit_size))
>
> Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> than the vector size? Does that ever happen? I'll note that
> *alignment_support_scheme == dr_aligned means alignment according
> to dr_info->target_alignment which might be actually less than
> the vector size (we've noticed this recently in other PRs), so
> we might want to make sure that dr_info->target_alignment is
> at least vector size when ->need_peeling_for_alignment I think.
>
One reason I block LOAD_LANES from the non-inbound case is that in
those cases dr_info->target_alignment would need to be GROUP_SIZE * vector size
to ensure that the entire access doesn't cross a page. Because this puts an
excessive alignment request in place, I currently just reject the loop.
But in principle it can happen; however, the above checks for element size, not
vector size. This fails when the user has intentionally misaligned the array and we
don't support peeling for the access type to correct it.
So something like vect-early-break_133_pfa3.c with a grouped access, for instance.
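Editorial sketch of that situation, along the lines of
vect-early-break_133_pfa3.c but with a group of two (hypothetical, not a
testcase from the patch):

    char string[1020] __attribute__((aligned(1)));

    int find_pair (int n, char c)
    {
      for (int i = 0; i < n; i += 2)
        if (string[i] == c || string[i + 1] == c)   /* group size 2.  */
          return i;
      return -1;
    }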
> So - which of the testcases gets you here? I think we
> set *misalignment to be modulo target_alignment, never larger than that.
>
The condition passes for most cases, yes. It's when we can't peel that the loop
gets rejected inside the block; vect-early-break_22.c and
vect-early-break_121-pr114081.c are two that get rejected there.
> > + {
> > + bool inbounds
> > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > + /* If we have a known misalignment, and are doing a group load for a DR
> > + that requires aligned access, check if the misalignment is a multiple
> > + of the unit size. In which case the group load will be issued aligned
> > + as long as the first load in the group is aligned.
> > +
> > + For the non-inbound case we'd need goup_size * vectype alignment. But
> > + this is quite huge and unlikely to ever happen so if we can't peel for
> > + it, just reject it. */
> > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > + {
> > + /* ?? This needs updating whenever we support slp group > 1. */
> > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > + /* For the inbound case it's enough to check for an alignment of
> > + GROUP_SIZE * element size. */
> > + if (inbounds
> > + && (*misalignment % (group_size * unit_size)) == 0
> > + && group_size % 2 == 0)
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Assuming grouped access is aligned due to load "
> > + "lanes, overriding alignment scheme\n");
> > +
> > + *alignment_support_scheme = dr_unaligned_supported;
> > + }
> > + }
> > + /* If we have a linear access and know the misalignment and know we won't
> > + read out of bounds then it's also ok if the misalignment is a multiple
> > + of the element size. We get this when the loop has known misalignments
> > + but the misalignments of the DRs can't be peeled to reach mutual
> > + alignment. Because the misalignments are known however we also know
> > + that versioning won't work. If the target does support unaligned
> > + accesses and we know we are free to read the entire buffer then we
> > + can allow the unaligned access if it's on elements for an early break
> > + condition. */
> > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > + && inbounds)
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Access will not read beyond buffer to due known size "
> > + "buffer, overriding alignment scheme\n");
> > +
> > + *alignment_support_scheme = dr_unaligned_supported;
> > + }
> > + }
> > +
> > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > {
> > if (dump_enabled_p ())
> > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > /* Transform. */
> >
> > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL;
> > +
> > + /* Check if we support the operation if early breaks are needed. */
> > + if (loop_vinfo
> > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > + && (memory_access_type == VMAT_GATHER_SCATTER
> > + || memory_access_type == VMAT_STRIDED_SLP))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "early break not supported: cannot peel for "
> > + "alignment. With non-contiguous memory vectorization"
> > + " could read out of bounds at %G ",
> > + STMT_VINFO_STMT (stmt_info));
>
> Hmm, this is now more restrictive than the original check in
> vect_analyze_early_break_dependences because it covers all accesses.
> The simplest fix would be to leave it there.
>
It covers all loads which, you're right, is more restrictive. I think it just
needs to be moved inside the if (costing_p && dr_info->need_peeling_for_alignment)
block below it though.
Delaying this to here instead of earlier has allowed us to vectorize
gcc.dg/vect/bb-slp-pr65935.c, which now vectorizes after the inner loops are
unrolled.
Are you happy with just moving it down?
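A minimal sketch of that move (editorial; assumes the surrounding
vectorizable_load context and is not the final patch):

    if (costing_p && dr_peeling_alignment (stmt_info))
      {
        /* Non-contiguous accesses cannot be made safe by peeling.  */
        if (memory_access_type == VMAT_GATHER_SCATTER
            || memory_access_type == VMAT_STRIDED_SLP)
          return false;

        /* ... existing power-of-two and page-size checks ...  */
      }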
> > + return false;
> > + }
> > +
> > + /* If this DR needs peeling for alignment for correctness, we must
> > + ensure the target alignment is a constant power-of-two multiple of the
> > + amount read per vector iteration (overriding the above hook where
> > + necessary). We don't support group loads, which would have been filterd
> > + out in the check above. For now it means we don't have to look at the
> > + group info and just check that the load is continguous and can just use
> > + dr_info. For known size buffers we still need to check if the vector
> > + is misaligned and if so we need to peel. */
> > + if (costing_p && dr_info->need_peeling_for_alignment)
>
> dr_peeling_alignment ()
>
> I think this belongs in get_load_store_type, specifically I think
> we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment refs
> and by construction dr_aligned should ensure type size alignment
> (by altering target_alignment for the respective refs). Given that
> both VF and vector type are fixed at vect_analyze_data_refs_alignment
> time we should be able to compute the appropriate target alignment
> there (I'm not sure we support peeling of more than VF-1 iterations
> though).
>
> > + {
> > + /* Vector size in bytes. */
> > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > +
> > + /* We can only peel for loops, of course. */
> > + gcc_checking_assert (loop_vinfo);
> > +
> > + auto num_vectors = ncopies;
> > + if (!pow2p_hwi (num_vectors))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "non-power-of-two num vectors %u "
> > + "for DR needing peeling for "
> > + "alignment at %G",
> > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + safe_align *= num_vectors;
> > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > + We could though but too much work for GCC 15. For now we assume a
> > + vector is not larger than a page size so allow single loads. */
> > + && (num_vectors > 1 && !vf.is_constant ()))
> > + {
> > + if (dump_enabled_p ())
> > + {
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "alignment required for correctness (");
> > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > + }
> > + return false;
> > + }
> > + }
> > +
> > ensure_base_align (dr_info);
> >
> > if (memory_access_type == VMAT_INVARIANT)
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e1922145f3d418436d709f4 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > }
> > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> >
> > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > +inline bool
> > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > +{
> > + dr_vec_info *dr_info;
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
>
> ... but checking it on the first only.
I did it that way because I was under the assumption that group loads could be
relaxed to e.g. element-wise accesses or some other form. If the group cannot
grow or be changed I could instead set it only on the first access and then not
need to check it elsewhere, if you prefer.
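A minimal sketch of that alternative (editorial; assumes the
vect_analyze_early_break_dependences context where stmt_vinfo is the current
statement and groups are already formed):

    dr_vec_info *dr_info
      = STMT_VINFO_DR_INFO (STMT_VINFO_GROUPED_ACCESS (stmt_vinfo)
                            ? DR_GROUP_FIRST_ELEMENT (stmt_vinfo)
                            : stmt_vinfo);
    dr_info->need_peeling_for_alignment = true;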
Thanks,
Tamar
>
> > + else
> > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > +
> > + return dr_info->need_peeling_for_alignment;
> > +}
> > +
> > inline void
> > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > {
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Wed, 5 Feb 2025, Tamar Christina wrote:
>
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, February 4, 2025 12:49 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
> >
> > On Tue, 4 Feb 2025, Tamar Christina wrote:
> >
> > > Looks like a last minute change I made accidentally blocked SVE. Fixed and re-
> > sending:
> > >
> > > Hi All,
> > >
> > > This fixes two PRs on Early break vectorization by delaying the safety checks to
> > > vectorizable_load when the VF, VMAT and vectype are all known.
> > >
> > > This patch does add two new restrictions:
> > >
> > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > > a page unwittingly.
> > >
> > > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
> > if
> > > we cannot peel for alignment, as the alignment requirement is quite large at
> > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > > don't support it for now.
> > >
> > > There are other steps documented inside the code itself so that the reasoning
> > > is next to the code.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > On arm-none-linux-gnueabihf some tests are failing to vectorize because it looks
> > > like LOAD_LANES is often misaligned. I need to debug those a bit more to see if
> > > it's the patch or backend.
> > >
> > > For now I think the patch itself is fine.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > PR tree-optimization/118464
> > > PR tree-optimization/116855
> > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > > checks.
> > > (vect_compute_data_ref_alignment): Remove alignment checks and move
> > to
> > > vectorizable_load.
> > > (vect_enhance_data_refs_alignment): Add note to comment needing
> > > investigating.
> > > (vect_analyze_data_refs_alignment): Likewise.
> > > (vect_supportable_dr_alignment): For group loads look at first DR.
> > > * tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
> > > Perform safety checks for early break pfa.
> > > * tree-vectorizer.h (dr_peeling_alignment): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR tree-optimization/118464
> > > PR tree-optimization/116855
> > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > > load type is relaxed later.
> > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > >
> > > -- inline copy of patch --
> > >
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index
> > dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c1
> > 14064a6a603b732 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will
> > register in a basic block.
> > > Work bound when discovering transitive relations from existing relations.
> > >
> > > @item min-pagesize
> > > -Minimum page size for warning purposes.
> > > +Minimum page size for warning and early break vectorization purposes.
> > >
> > > @item openacc-kernels
> > > Specify mode of OpenACC `kernels' constructs handling.
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> > 92263af120c3ab2c21
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include <cstddef>
> > > +
> > > +struct ts1 {
> > > + int spans[6][2];
> > > +};
> > > +struct gg {
> > > + int t[6];
> > > +};
> > > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > > + ts1 ret;
> > > + for (size_t i = 0; i != t; i++) {
> > > + if (!(i < t)) __builtin_abort();
> > > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > > + }
> > > + return ret;
> > > +}
> >
> > This one ICEd at some point? Otherwise it doesn't test anything?
> >
>
> Yes. It and the one below it were ICEng and are not necessarily vectorizable
> code as they were ICEing early.
>
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > index
> > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > 93c950629f3231554 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > @@ -55,7 +55,9 @@ int main()
> > > }
> > > }
> > > rephase ();
> > > +#pragma GCC novector
> > > for (i = 0; i < 32; ++i)
> > > +#pragma GCC novector
> > > for (j = 0; j < 3; ++j)
> > > #pragma GCC novector
> > > for (k = 0; k < 3; ++k)
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > index
> > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b
> > 5b252b716a8ba93a 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > @@ -5,7 +5,8 @@
> > > /* { dg-additional-options "-O3" } */
> > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> > >
> > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* Arm creates a group size of 3 here, which we can't support yet. */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-*-* }
> > } } } */
> > >
> > > typedef struct filter_list_entry {
> > > const char *name;
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > c83ab569fc9fbde126
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +int a, b, c, d, e, f;
> > > +short g[1];
> > > +int main() {
> > > + int h;
> > > + while (a) {
> > > + while (h)
> > > + ;
> > > + for (b = 2; b; b--) {
> > > + while (c)
> > > + ;
> > > + f = g[a];
> > > + if (d)
> > > + break;
> > > + }
> > > + while (e)
> > > + ;
> > > + }
> > > + return 0;
> > > +}
> >
> > again, what does this test?
> >
>
> That we don't ICE during data ref analysis.
>
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > da7a383ad7f6ceb0a7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 1; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect"
> > } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093
> > afe6c86a998cdd216
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Alignment requirement too big, load lanes targets can't safely vectorize this.
> > */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { !
> > vect_load_lanes } } } } */
> > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect"
> > { target { ! vect_load_lanes } } } } */
> > > +
> >
> > Any reason to not use "Alignment of access forced using peeling" like in
> > the other testcases?
>
> Didn't know there was a preference.. will update.
>
> > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+2] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > ca29a5f3f9d3f6e0d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 0; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> > "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > 9da423cf54b5e564d6f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 1; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect"
> > } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > 7ceea1c065fcc6ae9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 0; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> > "vect" } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc660
> > 4ffd03ff4d8ff53d6b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i++)
> > > + {
> > > + if (vect[i] > x)
> > > + return 1;
> > > +
> > > + vect[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" } } */
> >
> > See above.
> >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > 64a61c97b1b6268743
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < n; i++)
> > > + {
> > > + if (vect_a[i] > x || vect_b[i] > x)
> > > + return 1;
> > > +
> > > + vect_a[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" }
> > } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > 50776299531824ce9c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* This should be vectorizable through load_lanes and linear targets. */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..d4d87768a60803ed943f9
> > 5ccbda19e7e7812bf29
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Access will not read beyond buffer to due known
> > size buffer" "vect" } } */
> > > +
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..8419d7a9201dc8c433a23
> > 8f59520dea7e35c666e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied" "vect"
> > } } */
> >
> > Likewise.
> >
> > > +
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > index
> > b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a0
> > 4ec7507e32993ce8 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > @@ -42,4 +42,6 @@ main ()
> > > return 0;
> > > }
> > >
> > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } }
> > */
> > > +/* Targets that use LOAD_LANES make this fail due to the group misalignment. */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" {
> > target vect_load_lanes } } } */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" {
> > target { ! vect_load_lanes } } } } */
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> > 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > 60002933f384f65b 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info
> > loop_vinfo)
> > > if (is_gimple_debug (stmt))
> > > continue;
> > >
> > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > + stmt_vec_info stmt_vinfo
> > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > if (!dr_ref)
> > > continue;
> > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > (loop_vec_info loop_vinfo)
> > > bounded by VF so accesses are within range. We only need to check
> > > the reads since writes are moved to a safe place where if we get
> > > there we know they are safe to perform. */
> > > - if (DR_IS_READ (dr_ref)
> > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > + if (DR_IS_READ (dr_ref))
> > > {
> > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > - {
> > > - const char *msg
> > > - = "early break not supported: cannot peel "
> > > - "for alignment, vectorization would read out of "
> > > - "bounds at %G";
> > > - return opt_result::failure_at (stmt, msg, stmt);
> > > - }
> > > -
> > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > dr_info->need_peeling_for_alignment = true;
> >
> > You're setting the flag on any DR of a DR group here ...
> >
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > - "marking DR (read) as needing peeling for "
> > > - "alignment at %G", stmt);
> > > + "marking DR (read) as possibly needing peeling "
> > > + "for alignment at %G", stmt);
> > > }
> > >
> > > if (DR_IS_READ (dr_ref))
> > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > with VECTYPE.
> > >
> > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > - be set appropriately on failure (but is otherwise left unchanged).
> > > -
> > > Output:
> > > 1. initialized misalignment info for DR_INFO
> > >
> > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > >
> > > static void
> > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > - tree vectype, opt_result *result = nullptr)
> > > + tree vectype)
> > > {
> > > stmt_vec_info stmt_info = dr_info->stmt;
> > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo,
> > dr_vec_info *dr_info,
> > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > BITS_PER_UNIT);
> > >
> > > - /* If this DR needs peeling for alignment for correctness, we must
> > > - ensure the target alignment is a constant power-of-two multiple of the
> > > - amount read per vector iteration (overriding the above hook where
> > > - necessary). */
> > > - if (dr_info->need_peeling_for_alignment)
> > > - {
> > > - /* Vector size in bytes. */
> > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > -
> > > - /* We can only peel for loops, of course. */
> > > - gcc_checking_assert (loop_vinfo);
> > > -
> > > - /* Calculate the number of vectors read per vector iteration. If
> > > - it is a power of two, multiply through to get the required
> > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > - peeling wouldn't work in such a case. */
> > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > -
> > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > - if (!pow2p_hwi (num_vectors))
> > > - {
> > > - *result = opt_result::failure_at (vect_location,
> > > - "non-power-of-two num vectors %u "
> > > - "for DR needing peeling for "
> > > - "alignment at %G",
> > > - num_vectors, stmt_info->stmt);
> > > - return;
> > > - }
> > > -
> > > - safe_align *= num_vectors;
> > > - if (maybe_gt (safe_align, 4096U))
> > > - {
> > > - pretty_printer pp;
> > > - pp_wide_integer (&pp, safe_align);
> > > - *result = opt_result::failure_at (vect_location,
> > > - "alignment required for correctness"
> > > - " (%s) may exceed page size",
> > > - pp_formatted_text (&pp));
> > > - return;
> > > - }
> > > -
> > > - unsigned HOST_WIDE_INT multiple;
> > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > - || !pow2p_hwi (multiple))
> > > - {
> > > - if (dump_enabled_p ())
> > > - {
> > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > - "forcing alignment for DR from preferred (");
> > > - dump_dec (MSG_NOTE, vector_alignment);
> > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > - dump_dec (MSG_NOTE, safe_align);
> > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > - }
> > > - vector_alignment = safe_align;
> > > - }
> > > - }
> > > -
> > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > >
> > > /* If the main loop has peeled for alignment we have no way of knowing
> > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > loop_preheader_edge (loop))
> > > || loop->inner
> > > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
> >
> > Spurious change(?)
>
> I was actually wondering why this is here. I'm curious why we're saying we can't
> peel for alignment on an inverted loop.
No idea either.
> >
> > > do_peeling = false;
> > >
> > > struct _vect_peel_extended_info peel_for_known_alignment;
> > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > continue;
> > > - opt_result res = opt_result::success ();
> > > +
> > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > - &res);
> > > - if (!res)
> > > - return res;
> > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > }
> > > }
> > >
> > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > dr_vec_info *dr_info,
> > >
> > > if (misalignment == 0)
> > > return dr_aligned;
> > > - else if (dr_info->need_peeling_for_alignment)
> > > + else if (dr_peeling_alignment (stmt_info))
> > > return dr_unaligned_unsupported;
> > >
> > > /* For now assume all conditional loads/stores support unaligned
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index
> > 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3a
> > ff7c1af80e8769a 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > > return false;
> > > }
> > >
> > > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > + /* Check if a misalignment with an unsupported peeling for early break is
> > > + still OK. First we need to distinguish between when we've reached here do
> >
> > due to
> >
> > > + to dependency analysis or when the user has requested -mstrict-align or
> > > + similar. In those cases we must not override it. */
> > > + if (dr_peeling_alignment (stmt_info)
> > > + && *alignment_support_scheme == dr_unaligned_unsupported
> > > + /* We can only attempt to override if the misalignment is a multiple of
> > > + the element being loaded, otherwise peeling or versioning would have
> > > + really been required. */
> > > + && multiple_p (*misalignment, unit_size))
> >
> > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > than the vector size? Does that ever happen? I'll note that
> > *alignment_support_scheme == dr_aligned means alignment according
> > to dr_info->target_alignment which might be actually less than
> > the vector size (we've noticed this recently in other PRs), so
> > we might want to make sure that dr_info->target_alignment is
> > at least vector size when ->need_peeling_for_alignment I think.
> >
>
> One reason I block LOAD_LANES from the non-inbound case is that in
> those cases dr_info->target_alignment would need to be GROUP_SIZE * vector size
> to ensure that the entire access doesn't cross a page. Because this puts an
> excessive alignment request in place I currently just reject the loop.
But that's true for all grouped accesses and one reason we wanted to move
this code here - we know group_size and the vectorization factor.
> But in principle it can happen; however, the above checks for element size, not
> vector size. This fails when the user has intentionally misaligned the array and we
> don't support peeling for the access type to correct it.
>
> So something like vect-early-break_133_pfa3.c with a grouped access for instance.
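>
> A made-up sketch of that situation (hypothetical, not one of the new tests):
>
>   char string[1020] __attribute__((aligned(1)));
>
>   char *find (int n, char c)
>   {
>     for (int i = 0; i < n - 1; i += 2)
>       if (string[i] == c || string[i+1] == c)   /* group of two 1-byte loads */
>         return &string[i];
>     return 0;
>   }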
>
> > So - which of the testcases gets you here? I think we
> > set *misalignment to be modulo target_alignment, never larger than that.
> >
>
> The condition passes for most cases, yes, unless we can't peel;
> vect-early-break_22.c and vect-early-break_121-pr114081.c are two that get
> rejected inside the block.
>
> > > + {
> > > + bool inbounds
> > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > > + /* If we have a known misalignment, and are doing a group load for a DR
> > > + that requires aligned access, check if the misalignment is a multiple
> > > + of the unit size. In which case the group load will be issued aligned
> > > + as long as the first load in the group is aligned.
> > > +
> > > + For the non-inbound case we'd need group_size * vectype alignment. But
> > > + this is quite huge and unlikely to ever happen so if we can't peel for
> > > + it, just reject it. */
I don't think the in-bound case is any different from the non-in-bound
case unless the size of the object is a multiple of the whole vector
access size as well.
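
A worked example of that (sizes assumed purely for illustration, 16-byte vectors):

  char buf[1025];
  /* The scalar loop reads buf[0] .. buf[1024], all in bounds.  The vector
     loop does 16-byte loads at offsets 0, 16, ..., 1008, 1024; the last one
     reads buf[1024..1039], 15 bytes past the object, because 1025 is not a
     multiple of the 16-byte vector access size.  */
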
> > > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > + {
> > > + /* ?? This needs updating whenever we support slp group > 1. */
?
> > > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > > + /* For the inbound case it's enough to check for an alignment of
> > > + GROUP_SIZE * element size. */
> > > + if (inbounds
> > > + && (*misalignment % (group_size * unit_size)) == 0
> > > + && group_size % 2 == 0)
It looks fishy that you do not need to consider the VF here.
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "Assuming grouped access is aligned due to load "
> > > + "lanes, overriding alignment scheme\n");
> > > +
> > > + *alignment_support_scheme = dr_unaligned_supported;
> > > + }
> > > + }
> > > + /* If we have a linear access and know the misalignment and know we won't
> > > + read out of bounds then it's also ok if the misalignment is a multiple
> > > + of the element size. We get this when the loop has known misalignments
> > > + but the misalignments of the DRs can't be peeled to reach mutual
> > > + alignment. Because the misalignments are known however we also know
> > > + that versioning won't work. If the target does support unaligned
> > > + accesses and we know we are free to read the entire buffer then we
> > > + can allow the unaligned access if it's on elements for an early break
> > > + condition. */
See above - one of the PRs was exactly that we overread a decl even
if the original scalar accesses are all in-bounds. So we can't allow
this.
> > > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > + && inbounds)
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "Access will not read beyond buffer to due known size "
> > > + "buffer, overriding alignment scheme\n");
> > > +
> > > + *alignment_support_scheme = dr_unaligned_supported;
> > > + }
> > > + }
> > > +
> > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > {
> > > if (dump_enabled_p ())
> > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > > /* Transform. */
> > >
> > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info =
> > NULL;
> > > +
> > > + /* Check if we support the operation if early breaks are needed. */
> > > + if (loop_vinfo
> > > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > + && (memory_access_type == VMAT_GATHER_SCATTER
> > > + || memory_access_type == VMAT_STRIDED_SLP))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early break not supported: cannot peel for "
> > > + "alignment. With non-contiguous memory vectorization"
> > > + " could read out of bounds at %G ",
> > > + STMT_VINFO_STMT (stmt_info));
> >
> > Hmm, this is now more restrictive than the original check in
> > vect_analyze_early_break_dependences because it covers all accesses.
> > The simplest fix would be to leave it there.
> >
>
> It covers all loads, which, you're right, is more restrictive. I think it just needs to
> be moved inside the if (costing_p && dr_info->need_peeling_for_alignment) block
> below it though.
>
> Delaying this to here instead of earlier has allowed us to vectorize
> gcc.dg/vect/bb-slp-pr65935.c, which now vectorizes after the inner loops are unrolled.
>
> Are you happy with just moving it down?
OK.
> > > + return false;
> > > + }
> > > +
> > > + /* If this DR needs peeling for alignment for correctness, we must
> > > + ensure the target alignment is a constant power-of-two multiple of the
> > > + amount read per vector iteration (overriding the above hook where
> > > + necessary). We don't support group loads, which would have been filtered
> > > + out in the check above. For now it means we don't have to look at the
> > > + group info and just check that the load is contiguous and can just use
> > > + dr_info. For known size buffers we still need to check if the vector
> > > + is misaligned and if so we need to peel. */
> > > + if (costing_p && dr_info->need_peeling_for_alignment)
> >
> > dr_peeling_alignment ()
> >
> > I think this belongs in get_load_store_type, specifically I think
> > we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment refs
> > and by construction dr_aligned should ensure type size alignment
> > (by altering target_alignment for the respective refs). Given that
> > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > time we should be able to compute the appropriate target alignment
> > there (I'm not sure we support peeling of more than VF-1 iterations
> > though).
> >
> > > + {
> > > + /* Vector size in bytes. */
> > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > +
> > > + /* We can only peel for loops, of course. */
> > > + gcc_checking_assert (loop_vinfo);
> > > +
> > > + auto num_vectors = ncopies;
> > > + if (!pow2p_hwi (num_vectors))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "non-power-of-two num vectors %u "
> > > + "for DR needing peeling for "
> > > + "alignment at %G",
> > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + safe_align *= num_vectors;
> > > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > > + We could though but too much work for GCC 15. For now we assume a
> > > + vector is not larger than a page size so allow single loads. */
> > > + && (num_vectors > 1 && !vf.is_constant ()))
> > > + {
> > > + if (dump_enabled_p ())
> > > + {
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "alignment required for correctness (");
> > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > + }
> > > + return false;
> > > + }
> > > + }
> > > +
> > > ensure_base_align (dr_info);
> > >
> > > if (memory_access_type == VMAT_INVARIANT)
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index
> > 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e192214
> > 5f3d418436d709f4 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > }
> > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > >
> > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > +inline bool
> > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > +{
> > > + dr_vec_info *dr_info;
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> >
> > ... but checking it on the first only.
>
> I did it that way because I was under the assumption that group loads could be relaxed
> to e.g. element wise or some other form. If it's the case that the group cannot grow or
> be changed I could instead set it only on the first access and then not need to check it
> elsewhere if you prefer.
I've merely noted the discrepancy - consider

  if (a[2*i+1])
    early break;
  ... = a[2*i];

then you'd set ->needs_peeling on the 2nd group member but
dr_peeling_alignment would always check the first. So yes, I think
we always want to set the flag on the first element of a grouped
access. We're no longer splitting groups when loop vectorizing.
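
E.g. an untested sketch of what that could look like in
vect_analyze_early_break_dependences:

  stmt_vec_info first_vinfo = stmt_vinfo;
  if (STMT_VINFO_GROUPED_ACCESS (stmt_vinfo))
    first_vinfo = DR_GROUP_FIRST_ELEMENT (stmt_vinfo);
  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (first_vinfo);
  dr_info->need_peeling_for_alignment = true;
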
Richard.
>
> Thanks,
> Tamar
> >
> > > + else
> > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > + return dr_info->need_peeling_for_alignment;
> > > +}
> > > +
> > > inline void
> > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > {
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, February 5, 2025 10:16 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
>
> On Wed, 5 Feb 2025, Tamar Christina wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Tuesday, February 4, 2025 12:49 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH]middle-end: delay checking for alignment to load
> [PR118464]
> > >
> > > On Tue, 4 Feb 2025, Tamar Christina wrote:
> > >
> > > > Looks like a last minute change I made accidentally blocked SVE. Fixed and re-
> > > sending:
> > > >
> > > > Hi All,
> > > >
> > > > This fixes two PRs on Early break vectorization by delaying the safety checks
> to
> > > > vectorizable_load when the VF, VMAT and vectype are all known.
> > > >
> > > > This patch does add two new restrictions:
> > > >
> > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > > > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > > > a page unwittingly.
> > > >
> > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject
> vectorization
> > > if
> > > > we cannot peel for alignment, as the alignment requirement is quite large at
> > > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > > > don't support it for now.
> > > >
> > > > There are other steps documented inside the code itself so that the reasoning
> > > > is next to the code.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > On arm-none-linux-gnueabihf some tests are failing to vectorize because it
> looks
> > > > like LOAD_LANES is often misaligned. I need to debug those a bit more to see
> if
> > > > it's the patch or backend.
> > > >
> > > > For now I think the patch itself is fine.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > > > checks.
> > > > (vect_compute_data_ref_alignment): Remove alignment checks and move
> > > to
> > > > vectorizable_load.
> > > > (vect_enhance_data_refs_alignment): Add note to comment needing
> > > > investigating.
> > > > (vect_analyze_data_refs_alignment): Likewise.
> > > > (vect_supportable_dr_alignment): For group loads look at first DR.
> > > > * tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
> > > > Perform safety checks for early break pfa.
> > > > * tree-vectorizer.h (dr_peeling_alignment): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > > > load type is relaxed later.
> > > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > > >
> > > > -- inline copy of patch --
> > > >
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index
> > >
> dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c1
> > > 14064a6a603b732 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will
> > > register in a basic block.
> > > > Work bound when discovering transitive relations from existing relations.
> > > >
> > > > @item min-pagesize
> > > > -Minimum page size for warning purposes.
> > > > +Minimum page size for warning and early break vectorization purposes.
> > > >
> > > > @item openacc-kernels
> > > > Specify mode of OpenACC `kernels' constructs handling.
> > > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> > > 92263af120c3ab2c21
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +#include <cstddef>
> > > > +
> > > > +struct ts1 {
> > > > + int spans[6][2];
> > > > +};
> > > > +struct gg {
> > > > + int t[6];
> > > > +};
> > > > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > > > + ts1 ret;
> > > > + for (size_t i = 0; i != t; i++) {
> > > > + if (!(i < t)) __builtin_abort();
> > > > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > > > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > > > + }
> > > > + return ret;
> > > > +}
> > >
> > > This one ICEd at some point? Otherwise it doesn't test anything?
> > >
> >
> > Yes. It and the one below it were ICEing and are not necessarily vectorizable
> > code as they were ICEing early.
> >
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > index
> > >
> 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > > 93c950629f3231554 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > @@ -55,7 +55,9 @@ int main()
> > > > }
> > > > }
> > > > rephase ();
> > > > +#pragma GCC novector
> > > > for (i = 0; i < 32; ++i)
> > > > +#pragma GCC novector
> > > > for (j = 0; j < 3; ++j)
> > > > #pragma GCC novector
> > > > for (k = 0; k < 3; ++k)
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > index
> > >
> 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b
> > > 5b252b716a8ba93a 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > @@ -5,7 +5,8 @@
> > > > /* { dg-additional-options "-O3" } */
> > > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> > > >
> > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* Arm creates a group size of 3 here, which we can't support yet. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-
> *-* }
> > > } } } */
> > > >
> > > > typedef struct filter_list_entry {
> > > > const char *name;
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > > c83ab569fc9fbde126
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > @@ -0,0 +1,25 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +int a, b, c, d, e, f;
> > > > +short g[1];
> > > > +int main() {
> > > > + int h;
> > > > + while (a) {
> > > > + while (h)
> > > > + ;
> > > > + for (b = 2; b; b--) {
> > > > + while (c)
> > > > + ;
> > > > + f = g[a];
> > > > + if (d)
> > > > + break;
> > > > + }
> > > > + while (e)
> > > > + ;
> > > > + }
> > > > + return 0;
> > > > +}
> > >
> > > again, what does this test?
> > >
> >
> > That we don't ICE during data ref analysis.
> >
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > > da7a383ad7f6ceb0a7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020];
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 1; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling"
> "vect"
> > > } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093
> > > afe6c86a998cdd216
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* Alignment requirement too big, load lanes targets can't safely vectorize
> this.
> > > */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { !
> > > vect_load_lanes } } } } */
> > > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied"
> "vect"
> > > { target { ! vect_load_lanes } } } } */
> > > > +
> > >
> > > Any reason to not use "Alignment of access forced using peeling" like in
> > > the other testcases?
> >
> > Didn't know there was a preference; will update.
> >
> > > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+2] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > > ca29a5f3f9d3f6e0d1
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020];
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 0; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> peeling"
> > > "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > > 9da423cf54b5e564d6f
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020] __attribute__((aligned(1)));
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 1; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling"
> "vect"
> > > } } */
> > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > > 7ceea1c065fcc6ae9
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020] __attribute__((aligned(1)));
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 0; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> peeling"
> > > "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc660
> > > 4ffd03ff4d8ff53d6b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char *vect, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < n; i++)
> > > > + {
> > > > + if (vect[i] > x)
> > > > + return 1;
> > > > +
> > > > + vect[i] = x;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" }
> } */
> > >
> > > See above.
> > >
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > > 64a61c97b1b6268743
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < n; i++)
> > > > + {
> > > > + if (vect_a[i] > x || vect_b[i] > x)
> > > > + return 1;
> > > > +
> > > > + vect_a[i] = x;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied"
> "vect" }
> > > } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > > 50776299531824ce9c
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* This should be vectorizable through load_lanes and linear targets. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < n; i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..d4d87768a60803ed943f9
> > > 5ccbda19e7e7812bf29
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > @@ -0,0 +1,27 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "Access will not read beyond buffer due to known size buffer" "vect" } } */
> > > > +
> > > > +
> > > > +char vect_a[1025];
> > > > +char vect_b[1025];
> > > > +
> > > > +unsigned test4(char x, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..8419d7a9201dc8c433a23
> > > 8f59520dea7e35c666e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be applied"
> "vect"
> > > } } */
> > >
> > > Likewise.
> > >
> > > > +
> > > > +
> > > > +char vect_a[1025];
> > > > +char vect_b[1025];
> > > > +
> > > > +unsigned test4(char x, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > > index
> > >
> b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a0
> > > 4ec7507e32993ce8 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > > @@ -42,4 +42,6 @@ main ()
> > > > return 0;
> > > > }
> > > >
> > > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect"
> } }
> > > */
> > > > +/* Targets that use LOAD_LANES make this fail due to the group misalignment. */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect"
> {
> > > target vect_load_lanes } } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect"
> {
> > > target { ! vect_load_lanes } } } } */
> > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > > index
> > >
> 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > > 60002933f384f65b 100644
> > > > --- a/gcc/tree-vect-data-refs.cc
> > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> (loop_vec_info
> > > loop_vinfo)
> > > > if (is_gimple_debug (stmt))
> > > > continue;
> > > >
> > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > + stmt_vec_info stmt_vinfo
> > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > if (!dr_ref)
> > > > continue;
> > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > > (loop_vec_info loop_vinfo)
> > > > bounded by VF so accesses are within range. We only need to check
> > > > the reads since writes are moved to a safe place where if we get
> > > > there we know they are safe to perform. */
> > > > - if (DR_IS_READ (dr_ref)
> > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > + if (DR_IS_READ (dr_ref))
> > > > {
> > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > - {
> > > > - const char *msg
> > > > - = "early break not supported: cannot peel "
> > > > - "for alignment, vectorization would read out of "
> > > > - "bounds at %G";
> > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > - }
> > > > -
> > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > dr_info->need_peeling_for_alignment = true;
> > >
> > > You're setting the flag on any DR of a DR group here ...
> > >
> > > > if (dump_enabled_p ())
> > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > - "marking DR (read) as needing peeling for "
> > > > - "alignment at %G", stmt);
> > > > + "marking DR (read) as possibly needing peeling "
> > > > + "for alignment at %G", stmt);
> > > > }
> > > >
> > > > if (DR_IS_READ (dr_ref))
> > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > > with VECTYPE.
> > > >
> > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > > - be set appropriately on failure (but is otherwise left unchanged).
> > > > -
> > > > Output:
> > > > 1. initialized misalignment info for DR_INFO
> > > >
> > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > >
> > > > static void
> > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > - tree vectype, opt_result *result = nullptr)
> > > > + tree vectype)
> > > > {
> > > > stmt_vec_info stmt_info = dr_info->stmt;
> > > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info
> *vinfo,
> > > dr_vec_info *dr_info,
> > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > > BITS_PER_UNIT);
> > > >
> > > > - /* If this DR needs peeling for alignment for correctness, we must
> > > > - ensure the target alignment is a constant power-of-two multiple of the
> > > > - amount read per vector iteration (overriding the above hook where
> > > > - necessary). */
> > > > - if (dr_info->need_peeling_for_alignment)
> > > > - {
> > > > - /* Vector size in bytes. */
> > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> (vectype));
> > > > -
> > > > - /* We can only peel for loops, of course. */
> > > > - gcc_checking_assert (loop_vinfo);
> > > > -
> > > > - /* Calculate the number of vectors read per vector iteration. If
> > > > - it is a power of two, multiply through to get the required
> > > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > > - peeling wouldn't work in such a case. */
> > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > -
> > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > - if (!pow2p_hwi (num_vectors))
> > > > - {
> > > > - *result = opt_result::failure_at (vect_location,
> > > > - "non-power-of-two num vectors %u "
> > > > - "for DR needing peeling for "
> > > > - "alignment at %G",
> > > > - num_vectors, stmt_info->stmt);
> > > > - return;
> > > > - }
> > > > -
> > > > - safe_align *= num_vectors;
> > > > - if (maybe_gt (safe_align, 4096U))
> > > > - {
> > > > - pretty_printer pp;
> > > > - pp_wide_integer (&pp, safe_align);
> > > > - *result = opt_result::failure_at (vect_location,
> > > > - "alignment required for correctness"
> > > > - " (%s) may exceed page size",
> > > > - pp_formatted_text (&pp));
> > > > - return;
> > > > - }
> > > > -
> > > > - unsigned HOST_WIDE_INT multiple;
> > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > > - || !pow2p_hwi (multiple))
> > > > - {
> > > > - if (dump_enabled_p ())
> > > > - {
> > > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > > - "forcing alignment for DR from preferred (");
> > > > - dump_dec (MSG_NOTE, vector_alignment);
> > > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > > - dump_dec (MSG_NOTE, safe_align);
> > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > - }
> > > > - vector_alignment = safe_align;
> > > > - }
> > > > - }
> > > > -
> > > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > > >
> > > > /* If the main loop has peeled for alignment we have no way of knowing
> > > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment
> (loop_vec_info
> > > loop_vinfo)
> > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT
> (loop_vinfo),
> > > > loop_preheader_edge (loop))
> > > > || loop->inner
> > > > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
> > >
> > > Spurious change(?)
> >
> > I was actually wondering why this is here. I'm curious why we're saying we can't
> > peel for alignment on an inverted loop.
>
> No idea either.
>
> > >
> > > > do_peeling = false;
> > > >
> > > > struct _vect_peel_extended_info peel_for_known_alignment;
> > > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment
> (loop_vec_info
> > > loop_vinfo)
> > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > > continue;
> > > > - opt_result res = opt_result::success ();
> > > > +
> > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > > - &res);
> > > > - if (!res)
> > > > - return res;
> > > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > > }
> > > > }
> > > >
> > > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > > dr_vec_info *dr_info,
> > > >
> > > > if (misalignment == 0)
> > > > return dr_aligned;
> > > > - else if (dr_info->need_peeling_for_alignment)
> > > > + else if (dr_peeling_alignment (stmt_info))
> > > > return dr_unaligned_unsupported;
> > > >
> > > > /* For now assume all conditional loads/stores support unaligned
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index
> > >
> 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3a
> > > ff7c1af80e8769a 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo,
> > > stmt_vec_info stmt_info,
> > > > return false;
> > > > }
> > > >
> > > > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > > + /* Check if a misalignment with an unsupported peeling for early break is
> > > > + still OK. First we need to distinguish between when we've reached here
> do
> > >
> > > due to
> > >
> > > > + to dependency analysis or when the user has requested -mstrict-align or
> > > > + similar. In those cases we must not override it. */
> > > > + if (dr_peeling_alignment (stmt_info)
> > > > + && *alignment_support_scheme == dr_unaligned_unsupported
> > > > + /* We can only attempt to override if the misalignment is a multiple of
> > > > + the element being loaded, otherwise peeling or versioning would have
> > > > + really been required. */
> > > > + && multiple_p (*misalignment, unit_size))
> > >
> > > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > > than the vector size? Does that ever happen? I'll note that
> > > *alignment_support_scheme == dr_aligned means alignment according
> > > to dr_info->target_alignment which might be actually less than
> > > the vector size (we've noticed this recently in other PRs), so
> > > we might want to make sure that dr_info->target_alignment is
> > > at least vector size when ->need_peeling_for_alignment I think.
> > >
> >
> > One reason I block LOAD_LANES from the non-inbound case is that in
> > those cases dr_info->target_alignment would need to be GROUP_SIZE * vector
> size
> > to ensure that the entire access doesn't cross a page. Because this puts an
> > excessive alignment request in place I currently just reject the loop.
>
> But that's true for all grouped accesses and one reason we wanted to move
> this code here - we know group_size and the vectorization factor.
>
Yes, I mentioned LOAD_LANES since that's what AArch64 turns group accesses into.
But you're right, this is about grouped accesses - though maybe my understanding
is wrong here: I thought the only VMAT that can result in a group access is LOAD_LANES?
Either way, I agree I should drop the *memory_access_type == VMAT_LOAD_STORE_LANES bit
and only look at the group access flag.
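
I.e. roughly this shape for the check (an untested sketch), keying on the
group access flag only:

  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
    {
      auto group_size = DR_GROUP_SIZE (stmt_info);
      /* ... inbound/alignment checks as before ... */
    }
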
> > But in principle it can happen; however, the above checks for element size, not
> > vector size. This fails when the user has intentionally misaligned the array and we
> > don't support peeling for the access type to correct it.
> >
> > So something like vect-early-break_133_pfa3.c with a grouped access for
> instance.
> >
> > > So - which of the testcases gets you here? I think we
> > > set *misalignment to be modulo target_alignment, never larger than that.
> > >
> >
> > The condition passes for most cases, yes, unless we can't peel;
> > vect-early-break_22.c and vect-early-break_121-pr114081.c are two that get
> > rejected inside the block.
> >
> > > > + {
> > > > + bool inbounds
> > > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > > > + /* If we have a known misalignment, and are doing a group load for a DR
> > > > + that requires aligned access, check if the misalignment is a multiple
> > > > + of the unit size. In which case the group load will be issued aligned
> > > > + as long as the first load in the group is aligned.
> > > > +
> > > > + For the non-inbound case we'd need group_size * vectype alignment. But
> > > > + this is quite huge and unlikely to ever happen so if we can't peel for
> > > > + it, just reject it. */
>
> I don't think the in-bound case is any different from the non-in-bound
> case unless the size of the object is a multiple of the whole vector
> access size as well.
>
The idea was that we know the accesses are all within the boundary of the object.
What we want to know is whether the accesses done are aligned. Since we don't
support group loads with gaps here, we know the elements must be sequential, and
so the relaxation was: if we know all this, then all we really need to know is that
it is safe to read a multiple of GROUP_SIZE * element size, since we would still be
in range of the buffer. That is the number of scalar loads that would have been
done per iteration, and so we can relax the alignment requirement based on the
information known in the inbounds case.
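To make that concrete, a minimal made-up example (not from the testsuite):

unsigned vect_a[1024];  /* object with a known, compile-time size */

unsigned f (unsigned x)
{
  unsigned i;
  /* GROUP_SIZE == 2: two sequential element reads per scalar iteration.
     All scalar accesses stay inside vect_a, so reading a multiple of
     GROUP_SIZE * sizeof (unsigned) bytes from an element-aligned offset
     cannot leave the object.  */
  for (i = 0; i < 1022; i += 2)
    if (vect_a[i] > x || vect_a[i + 1] > x)
      break;
  return i;
}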
> > > > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > > + {
> > > > + /* ?? This needs updating whenever we support slp group > 1. */
>
> ?
>
> > > > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > > > + /* For the inbound case it's enough to check for an alignment of
> > > > + GROUP_SIZE * element size. */
> > > > + if (inbounds
> > > > + && (*misalignment % (group_size * unit_size)) == 0
> > > > + && group_size % 2 == 0)
>
> It looks fishy that you do not need to consider the VF here.
This and the ? above from you are related. I don't think we have to consider VF here
since early break vectorization does not support slp group size > 1. So the VF can at
this time never be larger than a single vector. The note is there to update this when we do.
>
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > + "Assuming grouped access is aligned due to load "
> > > > + "lanes, overriding alignment scheme\n");
> > > > +
> > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > + }
> > > > + }
> > > > + /* If we have a linear access and know the misalignment and know we
> won't
> > > > + read out of bounds then it's also ok if the misalignment is a multiple
> > > > + of the element size. We get this when the loop has known misalignments
> > > > + but the misalignments of the DRs can't be peeled to reach mutual
> > > > + alignment. Because the misalignments are known however we also know
> > > > + that versioning won't work. If the target does support unaligned
> > > > + accesses and we know we are free to read the entire buffer then we
> > > > + can allow the unaligned access if it's on elements for an early break
> > > > + condition. */
>
> See above - one of the PRs was exactly that we overread a decl even
> if the original scalar accesses are all in-bounds. So we can't allow
> this.
>
But that PR was only because it was misaligned to an unnatural alignment for the type,
which resulted in one element possibly being split across a page boundary. That is still
rejected here.

This check is saying that if you have mutual misalignment, but each DR is in itself still
aligned to the type's natural alignment, and we know that all accesses are in bounds,
then we should be safe to vectorize as we won't accidentally access an invalid part of
memory.
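For example something in the spirit of the pfa testcases (an illustrative sketch,
not one of them verbatim):

unsigned a[1024], b[1024];

void g (unsigned x)
{
  /* a[i + 1] and b[i] are mutually misaligned by one element: peeling
     can vector-align one of them but never both.  Each DR is still
     aligned to the element's natural alignment and every scalar access
     is in bounds.  */
  for (int i = 0; i < 1023; i++)
    {
      if (a[i + 1] > x)
        break;
      b[i] = a[i + 1];
    }
}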
> > > > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > > + && inbounds)
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > + "Access will not read beyond buffer to due known size "
> > > > + "buffer, overriding alignment scheme\n");
> > > > +
> > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > + }
> > > > + }
> > > > +
> > > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > {
> > > > if (dump_enabled_p ())
> > > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > > > /* Transform. */
> > > >
> > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info =
> > > NULL;
> > > > +
> > > > + /* Check if we support the operation if early breaks are needed. */
> > > > + if (loop_vinfo
> > > > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > + && (memory_access_type == VMAT_GATHER_SCATTER
> > > > + || memory_access_type == VMAT_STRIDED_SLP))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "early break not supported: cannot peel for "
> > > > + "alignment. With non-contiguous memory vectorization"
> > > > + " could read out of bounds at %G ",
> > > > + STMT_VINFO_STMT (stmt_info));
> > >
> > > Hmm, this is now more restrictive than the original check in
> > > vect_analyze_early_break_dependences because it covers all accesses.
> > > The simplest fix would be to leave it there.
> > >
> >
> > It covers all loads, which you're right is more restrictive, I think it just needs to be
> > moved inside if (costing_p && dr_info->need_peeling_for_alignment) block
> > below it though.
> >
> > Delaying this to here instead of earlier has allowed us to vectorize
> > gcc.dg/vect/bb-slp-pr65935.c, which now vectorizes after the inner
> > loops are unrolled.
> >
> > Are you happy with just moving it down?
>
> OK.
>
> > > > + return false;
> > > > + }
> > > > +
> > > > + /* If this DR needs peeling for alignment for correctness, we must
> > > > + ensure the target alignment is a constant power-of-two multiple of the
> > > > + amount read per vector iteration (overriding the above hook where
> > > > + necessary). We don't support group loads, which would have been filtered
> > > > + out in the check above. For now it means we don't have to look at the
> > > > + group info and just check that the load is contiguous and can just use
> > > > + dr_info. For known size buffers we still need to check if the vector
> > > > + is misaligned and if so we need to peel. */
> > > > + if (costing_p && dr_info->need_peeling_for_alignment)
> > >
> > > dr_peeling_alignment ()
> > >
> > > I think this belongs in get_load_store_type, specifically I think
> > > we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment
> refs
> > > and by construction dr_aligned should ensure type size alignment
> > > (by altering target_alignment for the respective refs). Given that
> > > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > > time we should be able to compute the appropriate target alignment
> > > there (I'm not sure we support peeling of more than VF-1 iterations
> > > though).
> > >
> > > > + {
> > > > + /* Vector size in bytes. */
> > > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> (vectype));
> > > > +
> > > > + /* We can only peel for loops, of course. */
> > > > + gcc_checking_assert (loop_vinfo);
> > > > +
> > > > + auto num_vectors = ncopies;
> > > > + if (!pow2p_hwi (num_vectors))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "non-power-of-two num vectors %u "
> > > > + "for DR needing peeling for "
> > > > + "alignment at %G",
> > > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > + return false;
> > > > + }
> > > > +
> > > > + safe_align *= num_vectors;
> > > > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > > > + We could though but too much work for GCC 15. For now we assume a
> > > > + vector is not larger than a page size so allow single loads. */
> > > > + && (num_vectors > 1 && !vf.is_constant ()))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + {
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "alignment required for correctness (");
> > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > + }
> > > > + return false;
> > > > + }
> > > > + }
> > > > +
> > > > ensure_base_align (dr_info);
> > > >
> > > > if (memory_access_type == VMAT_INVARIANT)
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index
> > >
> 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e192214
> > > 5f3d418436d709f4 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > }
> > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > >
> > > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > > +inline bool
> > > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > > +{
> > > > + dr_vec_info *dr_info;
> > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT
> (stmt_info));
> > >
> > > ... but checking it on the first only.
> >
> > I did it that way because I was under the assumption that group loads could be
> relaxed
> > to e.g. element wise or some other form. If it's the case that the group cannot
> grow or
> > be changed I could instead set it only on the first access and then not need to
> check it
> > elsewhere if you prefer.
>
> I've merely noted the discrepancy - consider
>
> if (a[2*i+1])
> early break;
> ... = a[2*i];
>
> then you'd set ->needs_peeling on the 2nd group member but
> dr_peeling_alignment would always check the first. So yes, I think
> we always want to set the flag on the first element of a grouped
> access. We're no longer splitting groups when loop vectorizing.
>
Ack,
Will update.
Thanks,
Tamar
> Richard.
>
> >
> > Thanks,
> > Tamar
> > >
> > > > + else
> > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > +
> > > > + return dr_info->need_peeling_for_alignment;
> > > > +}
> > > > +
> > > > inline void
> > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > {
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Wed, 5 Feb 2025, Tamar Christina wrote:
[...]
> > 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > > > 60002933f384f65b 100644
> > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> > (loop_vec_info
> > > > loop_vinfo)
> > > > > if (is_gimple_debug (stmt))
> > > > > continue;
> > > > >
> > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > + stmt_vec_info stmt_vinfo
> > > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > if (!dr_ref)
> > > > > continue;
> > > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > > > (loop_vec_info loop_vinfo)
> > > > > bounded by VF so accesses are within range. We only need to check
> > > > > the reads since writes are moved to a safe place where if we get
> > > > > there we know they are safe to perform. */
> > > > > - if (DR_IS_READ (dr_ref)
> > > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > + if (DR_IS_READ (dr_ref))
> > > > > {
> > > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > - {
> > > > > - const char *msg
> > > > > - = "early break not supported: cannot peel "
> > > > > - "for alignment, vectorization would read out of "
> > > > > - "bounds at %G";
> > > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > > - }
> > > > > -
> > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > dr_info->need_peeling_for_alignment = true;
> > > >
> > > > You're setting the flag on any DR of a DR group here ...
> > > >
> > > > > if (dump_enabled_p ())
> > > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > > - "marking DR (read) as needing peeling for "
> > > > > - "alignment at %G", stmt);
> > > > > + "marking DR (read) as possibly needing peeling "
> > > > > + "for alignment at %G", stmt);
> > > > > }
> > > > >
> > > > > if (DR_IS_READ (dr_ref))
> > > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > > > with VECTYPE.
> > > > >
> > > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > > > - be set appropriately on failure (but is otherwise left unchanged).
> > > > > -
> > > > > Output:
> > > > > 1. initialized misalignment info for DR_INFO
> > > > >
> > > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > >
> > > > > static void
> > > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > > - tree vectype, opt_result *result = nullptr)
> > > > > + tree vectype)
> > > > > {
> > > > > stmt_vec_info stmt_info = dr_info->stmt;
> > > > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info
> > *vinfo,
> > > > dr_vec_info *dr_info,
> > > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > > > BITS_PER_UNIT);
> > > > >
> > > > > - /* If this DR needs peeling for alignment for correctness, we must
> > > > > - ensure the target alignment is a constant power-of-two multiple of the
> > > > > - amount read per vector iteration (overriding the above hook where
> > > > > - necessary). */
> > > > > - if (dr_info->need_peeling_for_alignment)
> > > > > - {
> > > > > - /* Vector size in bytes. */
> > > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> > (vectype));
> > > > > -
> > > > > - /* We can only peel for loops, of course. */
> > > > > - gcc_checking_assert (loop_vinfo);
> > > > > -
> > > > > - /* Calculate the number of vectors read per vector iteration. If
> > > > > - it is a power of two, multiply through to get the required
> > > > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > > > - peeling wouldn't work in such a case. */
> > > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > > -
> > > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > > - if (!pow2p_hwi (num_vectors))
> > > > > - {
> > > > > - *result = opt_result::failure_at (vect_location,
> > > > > - "non-power-of-two num vectors %u "
> > > > > - "for DR needing peeling for "
> > > > > - "alignment at %G",
> > > > > - num_vectors, stmt_info->stmt);
> > > > > - return;
> > > > > - }
> > > > > -
> > > > > - safe_align *= num_vectors;
> > > > > - if (maybe_gt (safe_align, 4096U))
> > > > > - {
> > > > > - pretty_printer pp;
> > > > > - pp_wide_integer (&pp, safe_align);
> > > > > - *result = opt_result::failure_at (vect_location,
> > > > > - "alignment required for correctness"
> > > > > - " (%s) may exceed page size",
> > > > > - pp_formatted_text (&pp));
> > > > > - return;
> > > > > - }
> > > > > -
> > > > > - unsigned HOST_WIDE_INT multiple;
> > > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > > > - || !pow2p_hwi (multiple))
> > > > > - {
> > > > > - if (dump_enabled_p ())
> > > > > - {
> > > > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > > > - "forcing alignment for DR from preferred (");
> > > > > - dump_dec (MSG_NOTE, vector_alignment);
> > > > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > > > - dump_dec (MSG_NOTE, safe_align);
> > > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > > - }
> > > > > - vector_alignment = safe_align;
> > > > > - }
> > > > > - }
> > > > > -
> > > > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > > > >
> > > > > /* If the main loop has peeled for alignment we have no way of knowing
> > > > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment
> > (loop_vec_info
> > > > loop_vinfo)
> > > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT
> > (loop_vinfo),
> > > > > loop_preheader_edge (loop))
> > > > > || loop->inner
> > > > > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
> > > >
> > > > Spurious change(?)
> > >
> > > I was actually wondering why this is here. I'm curious why we're saying we can't
> > > peel for alignment on an inverted loop.
> >
> > No idea either.
> >
> > > >
> > > > > do_peeling = false;
> > > > >
> > > > > struct _vect_peel_extended_info peel_for_known_alignment;
> > > > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment
> > (loop_vec_info
> > > > loop_vinfo)
> > > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > > > continue;
> > > > > - opt_result res = opt_result::success ();
> > > > > +
> > > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > > > - &res);
> > > > > - if (!res)
> > > > > - return res;
> > > > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > > > }
> > > > > }
> > > > >
> > > > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > > > dr_vec_info *dr_info,
> > > > >
> > > > > if (misalignment == 0)
> > > > > return dr_aligned;
> > > > > - else if (dr_info->need_peeling_for_alignment)
> > > > > + else if (dr_peeling_alignment (stmt_info))
> > > > > return dr_unaligned_unsupported;
> > > > >
> > > > > /* For now assume all conditional loads/stores support unaligned
> > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > index
> > > >
> > 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3a
> > > > ff7c1af80e8769a 100644
> > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo,
> > > > stmt_vec_info stmt_info,
> > > > > return false;
> > > > > }
> > > > >
> > > > > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > > > + /* Check if a misalignment with an unsupported peeling for early break is
> > > > > + still OK. First we need to distinguish between when we've reached here do
> > > >
> > > > due to
> > > >
> > > > > + to dependency analysis or when the user has requested -mstrict-align or
> > > > > + similar. In those cases we must not override it. */
> > > > > + if (dr_peeling_alignment (stmt_info)
> > > > > + && *alignment_support_scheme == dr_unaligned_unsupported
> > > > > + /* We can only attempt to override if the misalignment is a multiple of
> > > > > + the element being loaded, otherwise peeling or versioning would have
> > > > > + really been required. */
> > > > > + && multiple_p (*misalignment, unit_size))
> > > >
> > > > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > > > than the vector size? Does that ever happen? I'll note that
> > > > *alignment_support_scheme == dr_aligned means alignment according
> > > > to dr_info->target_alignment which might be actually less than
> > > > the vector size (we've noticed this recently in other PRs), so
> > > > we might want to make sure that dr_info->target_alignment is
> > > > at least vector size when ->need_peeling_for_alignment I think.
> > > >
> > >
> > > One reason I block LOAD_LANES from the non-inbound case is that in
> > > those cases dr_info->target_alignment would need to be GROUP_SIZE * vector
> > size
> > > to ensure that the entire access doesn't cross a page. Because this puts an
> > > excessive alignment request in place I currently just reject the loop.
> >
> > But that's true for all grouped accesses and one reason we wanted to move
> > this code here - we know group_size and the vectorization factor.
> >
>
> > Yes, I mentioned LOAD_LANES since that's what AArch64 uses for grouped accesses.
> > But you're right, this is about grouped accesses in general. Maybe my understanding
> > is wrong here, but I thought the only VMAT that can result in a grouped access is LOAD_LANES?
No, even VMAT_CONTIGUOUS can be a group access.
> But yes I agree I should drop the *memory_access_type == VMAT_LOAD_STORE_LANES bit
> and only look at the group access flag.
>
> > > But in principle it can happen, however the above checks for element size, not
> > > vector size. This fails when the user has intentionally misaligned the array and we
> > > don't support peeling for the access type to correct it.
> > >
> > > So something like vect-early-break_133_pfa3.c with a grouped access for
> > instance.
> > >
> > > > So - which of the testcases gets you here? I think we
> > > > set *misalignment to be modulo target_alignment, never larger than that.
> > > >
> > >
> > > The condition passes for most cases yes, unless we can't peel, in which case
> > > vect-early-break_22.c and vect-early-break_121-pr114081.c are two that get
> > > rejected inside the block.
> > >
> > > > > + {
> > > > > + bool inbounds
> > > > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > > > > + /* If we have a known misalignment, and are doing a group load for a DR
> > > > > + that requires aligned access, check if the misalignment is a multiple
> > > > > + of the unit size. In which case the group load will be issued aligned
> > > > > + as long as the first load in the group is aligned.
> > > > > +
> > > > > + For the non-inbound case we'd need group_size * vectype alignment. But
> > > > > + this is quite huge and unlikely to ever happen so if we can't peel for
> > > > > + it, just reject it. */
> >
> > I don't think the in-bound case is any different from the non-in-bound
> > case unless the size of the object is a multiple of the whole vector
> > access size as well.
> >
>
> The idea was that we know the accesses are all within the boundary of the object.
> What we want to know is whether the accesses done are aligned. Since we don't
> support group loads with gaps here, we know the elements must be sequential, and
> so the relaxation was: if we know all this, then all we really need to know is that
> it is safe to read a multiple of GROUP_SIZE * element size, since we would still be
> in range of the buffer. That is the number of scalar loads that would have been
> done per iteration, and so we can relax the alignment requirement based on the
> information known in the inbounds case.
Huh, I guess I fail to parse this. ref_within_array_bound computes
whether we know all _scalar_ accesses are within bounds.
At this point we know *alignment_support_scheme ==
dr_unaligned_unsupported which means the access is not aligned
but the misalignment is a multiple of the vector type unit size.
How can the access not be aligned then? Only when target_alignment >
type unit size? But I think this implies a grouped access, thus
ncopies > 1, which is why I think you need to consider the VF?
> > > > > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > > > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > > > + {
> > > > > + /* ?? This needs updating whenever we support slp group > 1. */
> >
> > ?
> >
> > > > > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > > > > + /* For the inbound case it's enough to check for an alignment of
> > > > > + GROUP_SIZE * element size. */
> > > > > + if (inbounds
> > > > > + && (*misalignment % (group_size * unit_size)) == 0
> > > > > + && group_size % 2 == 0)
> >
> > It looks fishy that you do not need to consider the VF here.
>
> This and the ? above from you are related. I don't think we have to consider VF here
> since early break vectorization does not support slp group size > 1. So the VF can at
> this time never be larger than a single vector. The note is there to update this when we do.
You don't need a SLP group-size > 1 to have more than one vector read.
Consider
if (long_long[i])
early break;
char[i] = char2[i];
where with V2DI and V16QI you end up with a VF of 16 and 4 V2DI loads
to compute the early break condition? That said, implying, without
double-checking, some constraints on early break vectorization here
looks a bit fragile - if there are unwritten constraints we'd better
check them here. That also makes it easier to understand.
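Spelled out as a compilable example (made-up names and sizes):

long long cond[1024];
char dst[1024], src[1024];

void h (void)
{
  for (int i = 0; i < 1024; i++)
    {
      if (cond[i])      /* early-break condition loaded as V2DI */
        break;
      dst[i] = src[i];  /* the V16QI copy drives the VF to 16, so the
                           condition needs several V2DI loads per
                           vector iteration */
    }
}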
> >
> > > > > + {
> > > > > + if (dump_enabled_p ())
> > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > + "Assuming grouped access is aligned due to load "
> > > > > + "lanes, overriding alignment scheme\n");
> > > > > +
> > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > + }
> > > > > + }
> > > > > + /* If we have a linear access and know the misalignment and know we
> > won't
> > > > > + read out of bounds then it's also ok if the misalignment is a multiple
> > > > > + of the element size. We get this when the loop has known misalignments
> > > > > + but the misalignments of the DRs can't be peeled to reach mutual
> > > > > + alignment. Because the misalignments are known however we also know
> > > > > + that versioning won't work. If the target does support unaligned
> > > > > + accesses and we know we are free to read the entire buffer then we
> > > > > + can allow the unaligned access if it's on elements for an early break
> > > > > + condition. */
> >
> > See above - one of the PRs was exactly that we overread a decl even
> > if the original scalar accesses are all in-bounds. So we can't allow
> > this.
> >
>
> But that PR was only because it was misaligned to an unnatural alignment for the type,
> which resulted in one element possibly being split across a page boundary. That is still
> rejected here.
By the multiple_p (*misalignment, unit_size) condition? Btw, you should
check known_alignment_for_access_p before using *misalignment or
check *misalignment != DR_MISALIGNMENT_UNKNOWN.
> This check is saying that if you have mutual misalignment, but each DR is in itself still
> aligned to the type's natural alignment, and we know that all accesses are in bounds,
> then we should be safe to vectorize as we won't accidentally access an invalid part of memory.
See above - when we do two vector loads we only know the first one had
a corresponding scalar access, the second vector load can still be
out of bounds. But when we can align to N * unit_size then it's fine
to vectorize. And only then (target_alignment > unit_size) can we have
*misalignment > unit_size?
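To illustrate the overread with made-up sizes:

char buf[20];

int k (void)
{
  /* All scalar accesses stop at buf[19].  Vectorized with a VF of 32
     (two V16QI loads per vector iteration), the second load would touch
     buf[16..31], past the end of the object, even though every scalar
     access is in bounds.  */
  for (int i = 0; i < 20; i++)
    if (buf[i])
      return i;
  return -1;
}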
Richard.
> > > > > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > > > + && inbounds)
> > > > > + {
> > > > > + if (dump_enabled_p ())
> > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > + "Access will not read beyond buffer to due known size "
> > > > > + "buffer, overriding alignment scheme\n");
> > > > > +
> > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > > {
> > > > > if (dump_enabled_p ())
> > > > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > > > > /* Transform. */
> > > > >
> > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info =
> > > > NULL;
> > > > > +
> > > > > + /* Check if we support the operation if early breaks are needed. */
> > > > > + if (loop_vinfo
> > > > > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > + && (memory_access_type == VMAT_GATHER_SCATTER
> > > > > + || memory_access_type == VMAT_STRIDED_SLP))
> > > > > + {
> > > > > + if (dump_enabled_p ())
> > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > + "early break not supported: cannot peel for "
> > > > > + "alignment. With non-contiguous memory vectorization"
> > > > > + " could read out of bounds at %G ",
> > > > > + STMT_VINFO_STMT (stmt_info));
> > > >
> > > > Hmm, this is now more restrictive than the original check in
> > > > vect_analyze_early_break_dependences because it covers all accesses.
> > > > The simplest fix would be to leave it there.
> > > >
> > >
> > > It covers all loads, which you're right is more restrictive, I think it just needs to be
> > > moved inside if (costing_p && dr_info->need_peeling_for_alignment) block
> > > below it though.
> > >
> > > Delaying this to here instead of earlier has allowed us to vectorize
> > > gcc.dg/vect/bb-slp-pr65935.c, which now vectorizes after the inner
> > > loops are unrolled.
> > >
> > > Are you happy with just moving it down?
> >
> > OK.
> >
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + /* If this DR needs peeling for alignment for correctness, we must
> > > > > + ensure the target alignment is a constant power-of-two multiple of the
> > > > > + amount read per vector iteration (overriding the above hook where
> > > > > + necessary). We don't support group loads, which would have been filtered
> > > > > + out in the check above. For now it means we don't have to look at the
> > > > > + group info and just check that the load is contiguous and can just use
> > > > > + dr_info. For known size buffers we still need to check if the vector
> > > > > + is misaligned and if so we need to peel. */
> > > > > + if (costing_p && dr_info->need_peeling_for_alignment)
> > > >
> > > > dr_peeling_alignment ()
> > > >
> > > > I think this belongs in get_load_store_type, specifically I think
> > > > we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment
> > refs
> > > > and by construction dr_aligned should ensure type size alignment
> > > > (by altering target_alignment for the respective refs). Given that
> > > > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > > > time we should be able to compute the appropriate target alignment
> > > > there (I'm not sure we support peeling of more than VF-1 iterations
> > > > though).
> > > >
> > > > > + {
> > > > > + /* Vector size in bytes. */
> > > > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> > (vectype));
> > > > > +
> > > > > + /* We can only peel for loops, of course. */
> > > > > + gcc_checking_assert (loop_vinfo);
> > > > > +
> > > > > + auto num_vectors = ncopies;
> > > > > + if (!pow2p_hwi (num_vectors))
> > > > > + {
> > > > > + if (dump_enabled_p ())
> > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > + "non-power-of-two num vectors %u "
> > > > > + "for DR needing peeling for "
> > > > > + "alignment at %G",
> > > > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + safe_align *= num_vectors;
> > > > > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > > > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > > > > + We could though but too much work for GCC 15. For now we assume a
> > > > > + vector is not larger than a page size so allow single loads. */
> > > > > + && (num_vectors > 1 && !vf.is_constant ()))
> > > > > + {
> > > > > + if (dump_enabled_p ())
> > > > > + {
> > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > + "alignment required for correctness (");
> > > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > > + }
> > > > > + return false;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > ensure_base_align (dr_info);
> > > > >
> > > > > if (memory_access_type == VMAT_INVARIANT)
> > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > index
> > > >
> > 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e192214
> > > > 5f3d418436d709f4 100644
> > > > > --- a/gcc/tree-vectorizer.h
> > > > > +++ b/gcc/tree-vectorizer.h
> > > > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > > }
> > > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > > >
> > > > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > > > +inline bool
> > > > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > > > +{
> > > > > + dr_vec_info *dr_info;
> > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT
> > (stmt_info));
> > > >
> > > > ... but checking it on the first only.
> > >
> > > I did it that way because I was under the assumption that group loads could be
> > relaxed
> > > to e.g. element wise or some other form. If it's the case that the group cannot
> > grow or
> > > be changed I could instead set it only on the first access and then not need to
> > check it
> > > elsewhere if you prefer.
> >
> > I've merely noted the discrepancy - consider
> >
> > if (a[2*i+1])
> > early break;
> > ... = a[2*i];
> >
> > then you'd set ->needs_peeling on the 2nd group member but
> > dr_peeling_alignment would always check the first. So yes, I think
> > we always want to set the flag on the first element of a grouped
> > access. We're no longer splitting groups when loop vectorizing.
> >
>
> Ack,
>
> Will update.
>
> Thanks,
> Tamar
>
> > Richard.
> >
> > >
> > > Thanks,
> > > Tamar
> > > >
> > > > > + else
> > > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > > +
> > > > > + return dr_info->need_peeling_for_alignment;
> > > > > +}
> > > > > +
> > > > > inline void
> > > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > > {
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, February 5, 2025 1:15 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
>
> On Wed, 5 Feb 2025, Tamar Christina wrote:
>
> [...]
>
> > >
> 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > > > > 60002933f384f65b 100644
> > > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> > > (loop_vec_info
> > > > > loop_vinfo)
> > > > > > if (is_gimple_debug (stmt))
> > > > > > continue;
> > > > > >
> > > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > > + stmt_vec_info stmt_vinfo
> > > > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > > if (!dr_ref)
> > > > > > continue;
> > > > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > > > > (loop_vec_info loop_vinfo)
> > > > > > bounded by VF so accesses are within range. We only need to
> check
> > > > > > the reads since writes are moved to a safe place where if we get
> > > > > > there we know they are safe to perform. */
> > > > > > - if (DR_IS_READ (dr_ref)
> > > > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > > + if (DR_IS_READ (dr_ref))
> > > > > > {
> > > > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > > - {
> > > > > > - const char *msg
> > > > > > - = "early break not supported: cannot peel "
> > > > > > - "for alignment, vectorization would read out of "
> > > > > > - "bounds at %G";
> > > > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > > > - }
> > > > > > -
> > > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > > dr_info->need_peeling_for_alignment = true;
> > > > >
> > > > > You're setting the flag on any DR of a DR group here ...
> > > > >
> > > > > > if (dump_enabled_p ())
> > > > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > - "marking DR (read) as needing peeling for
> "
> > > > > > - "alignment at %G", stmt);
> > > > > > + "marking DR (read) as possibly needing
> peeling "
> > > > > > + "for alignment at %G", stmt);
> > > > > > }
> > > > > >
> > > > > > if (DR_IS_READ (dr_ref))
> > > > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > > Compute the misalignment of the data reference DR_INFO when
> vectorizing
> > > > > > with VECTYPE.
> > > > > >
> > > > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT
> will
> > > > > > - be set appropriately on failure (but is otherwise left unchanged).
> > > > > > -
> > > > > > Output:
> > > > > > 1. initialized misalignment info for DR_INFO
> > > > > >
> > > > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > >
> > > > > > static void
> > > > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > > > - tree vectype, opt_result *result = nullptr)
> > > > > > + tree vectype)
> > > > > > {
> > > > > > stmt_vec_info stmt_info = dr_info->stmt;
> > > > > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info
> > > *vinfo,
> > > > > dr_vec_info *dr_info,
> > > > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > > > > BITS_PER_UNIT);
> > > > > >
> > > > > > - /* If this DR needs peeling for alignment for correctness, we must
> > > > > > - ensure the target alignment is a constant power-of-two multiple of the
> > > > > > - amount read per vector iteration (overriding the above hook where
> > > > > > - necessary). */
> > > > > > - if (dr_info->need_peeling_for_alignment)
> > > > > > - {
> > > > > > - /* Vector size in bytes. */
> > > > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> > > (vectype));
> > > > > > -
> > > > > > - /* We can only peel for loops, of course. */
> > > > > > - gcc_checking_assert (loop_vinfo);
> > > > > > -
> > > > > > - /* Calculate the number of vectors read per vector iteration. If
> > > > > > - it is a power of two, multiply through to get the required
> > > > > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > > > > - peeling wouldn't work in such a case. */
> > > > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > > > -
> > > > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > > > - if (!pow2p_hwi (num_vectors))
> > > > > > - {
> > > > > > - *result = opt_result::failure_at (vect_location,
> > > > > > - "non-power-of-two num
> vectors %u "
> > > > > > - "for DR needing peeling for "
> > > > > > - "alignment at %G",
> > > > > > - num_vectors, stmt_info->stmt);
> > > > > > - return;
> > > > > > - }
> > > > > > -
> > > > > > - safe_align *= num_vectors;
> > > > > > - if (maybe_gt (safe_align, 4096U))
> > > > > > - {
> > > > > > - pretty_printer pp;
> > > > > > - pp_wide_integer (&pp, safe_align);
> > > > > > - *result = opt_result::failure_at (vect_location,
> > > > > > - "alignment required for
> correctness"
> > > > > > - " (%s) may exceed page size",
> > > > > > - pp_formatted_text (&pp));
> > > > > > - return;
> > > > > > - }
> > > > > > -
> > > > > > - unsigned HOST_WIDE_INT multiple;
> > > > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > > > > - || !pow2p_hwi (multiple))
> > > > > > - {
> > > > > > - if (dump_enabled_p ())
> > > > > > - {
> > > > > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > - "forcing alignment for DR from preferred (");
> > > > > > - dump_dec (MSG_NOTE, vector_alignment);
> > > > > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > > > > - dump_dec (MSG_NOTE, safe_align);
> > > > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > > > - }
> > > > > > - vector_alignment = safe_align;
> > > > > > - }
> > > > > > - }
> > > > > > -
> > > > > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > > > > >
> > > > > > /* If the main loop has peeled for alignment we have no way of knowing
> > > > > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment
> > > (loop_vec_info
> > > > > loop_vinfo)
> > > > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT
> > > (loop_vinfo),
> > > > > > loop_preheader_edge (loop))
> > > > > > || loop->inner
> > > > > > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > > > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<--
> ??
> > > > >
> > > > > Spurious change(?)
> > > >
> > > > I was actually wondering why this is here. I'm curious why we're saying we
> can't
> > > > peel for alignment on an inverted loop.
> > >
> > > No idea either.
> > >
> > > > >
> > > > > > do_peeling = false;
> > > > > >
> > > > > > struct _vect_peel_extended_info peel_for_known_alignment;
> > > > > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment
> > > (loop_vec_info
> > > > > loop_vinfo)
> > > > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > > > > continue;
> > > > > > - opt_result res = opt_result::success ();
> > > > > > +
> > > > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > > > > - &res);
> > > > > > - if (!res)
> > > > > > - return res;
> > > > > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info
> *vinfo,
> > > > > dr_vec_info *dr_info,
> > > > > >
> > > > > > if (misalignment == 0)
> > > > > > return dr_aligned;
> > > > > > - else if (dr_info->need_peeling_for_alignment)
> > > > > > + else if (dr_peeling_alignment (stmt_info))
> > > > > > return dr_unaligned_unsupported;
> > > > > >
> > > > > > /* For now assume all conditional loads/stores support unaligned
> > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > index
> > > > >
> > >
> 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3a
> > > > > ff7c1af80e8769a 100644
> > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo,
> > > > > stmt_vec_info stmt_info,
> > > > > > return false;
> > > > > > }
> > > > > >
> > > > > > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > > > > + /* Check if a misalignment with an unsupported peeling for early break is
> > > > > > + still OK. First we need to distinguish between when we've reached here do
> > > > >
> > > > > due to
> > > > >
> > > > > > + to dependency analysis or when the user has requested -mstrict-align or
> > > > > > + similar. In those cases we must not override it. */
> > > > > > + if (dr_peeling_alignment (stmt_info)
> > > > > > + && *alignment_support_scheme == dr_unaligned_unsupported
> > > > > > + /* We can only attempt to override if the misalignment is a multiple of
> > > > > > + the element being loaded, otherwise peeling or versioning would
> have
> > > > > > + really been required. */
> > > > > > + && multiple_p (*misalignment, unit_size))
> > > > >
> > > > > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > > > > than the vector size? Does that ever happen? I'll note that
> > > > > *alignment_support_scheme == dr_aligned means alignment according
> > > > > to dr_info->target_alignment which might be actually less than
> > > > > the vector size (we've noticed this recently in other PRs), so
> > > > > we might want to make sure that dr_info->target_alignment is
> > > > > at least vector size when ->need_peeling_for_alignment I think.
> > > > >
> > > >
> > > > One reason I block LOAD_LANES from the non-inbound case is that in
> > > > those cases dr_info->target_alignment would need to be GROUP_SIZE *
> vector
> > > size
> > > > to ensure that the entire access doesn't cross a page. Because this puts an
> > > > excessive alignment request in place I currently just reject the loop.
> > >
> > > But that's true for all grouped accesses and one reason we wanted to move
> > > this code here - we know group_size and the vectorization factor.
> > >
> >
> > Yes, I mentioned LOAD_LANES since that's what AArch64 uses for grouped accesses.
> > But you're right, this is about grouped accesses in general. Maybe my understanding
> > is wrong here, but I thought the only VMAT that can result in a grouped access is
> > LOAD_LANES?
>
> No, even VMAT_CONTIGUOUS can be a group access.
>
Hmm, that does correspond to what I think I was trying to fix up with the hunk below it,
and I now see what you meant by your review comment.

I hadn't expected

  if (vect_a[i] > x || vect_a[i+1] > x)
    break;

to still be VMAT_CONTIGUOUS with slp lane == 1, as you'd need a permute as well,
so I was expecting VMAT_CONTIGUOUS_PERMUTE. But reading the comments on
the enum indicates that this is only used for the non-SLP vectorizer.
I guess SLP always models this as a linear load + separate permutation, which in
hindsight makes sense looking at the node:
note: node 0x627f608 (max_nunits=2, refcnt=1) vector(2) unsigned int
note: op template: _6 = vect_a[_3];
note: stmt 0 _6 = vect_a[_3];
note: load permutation { 0 }
The comments here make sense now; sorry, I was confused about the representation.
> > But yes I agree I should drop the *memory_access_type ==
> VMAT_LOAD_STORE_LANES bit
> > and only look at the group access flag.
> >
> > > > But in principle it can happen, however the above checks for element size, not
> > > > vector size. This fails when the user has intentionally misaligned the array and
> we
> > > > don't support peeling for the access type to correct it.
> > > >
> > > > So something like vect-early-break_133_pfa3.c with a grouped access for
> > > instance.
> > > >
> > > > > So - which of the testcases gets you here? I think we
> > > > > set *misalignment to be modulo target_alignment, never larger than that.
> > > > >
> > > >
> > > > The condition passes for most cases yes, unless we can't peel, in which case
> > > > vect-early-break_22.c and vect-early-break_121-pr114081.c are two that get
> > > > rejected inside the block.
> > > >
> > > > > > + {
> > > > > > + bool inbounds
> > > > > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > > > > + DR_REF (STMT_VINFO_DATA_REF
> (stmt_info)));
> > > > > > + /* If we have a known misalignment, and are doing a group load for a
> DR
> > > > > > + that requires aligned access, check if the misalignment is a multiple
> > > > > > + of the unit size. In which case the group load will be issued aligned
> > > > > > + as long as the first load in the group is aligned.
> > > > > > +
> > > > > > + For the non-inbound case we'd need group_size * vectype alignment. But
> > > > > > + this is quite huge and unlikely to ever happen so if we can't peel for
> > > > > > + it, just reject it. */
> > >
> > > I don't think the in-bound case is any different from the non-in-bound
> > > case unless the size of the object is a multiple of the whole vector
> > > access size as well.
> > >
> >
> > The idea was that we know the accesses are all within the boundary of the object.
> > What we want to know is whether the accesses done are aligned. Since we don't
> > support group loads with gaps here, we know the elements must be sequential, and
> > so the relaxation was: if we know all this, then all we really need to know is that
> > it is safe to read a multiple of GROUP_SIZE * element size, since we would still be
> > in range of the buffer. That is the number of scalar loads that would have been
> > done per iteration, and so we can relax the alignment requirement based on the
> > information known in the inbounds case.
>
> Huh, I guess I fail to parse this. ref_within_array_bound computes
> whether we know all _scalar_ accesses are within bounds.
>
> At this point we know *alignment_support_scheme ==
> dr_unaligned_unsupported which means the access is not aligned
> but the misalignment is a multiple of the vector type unit size.
>
> How can the access not be aligned then? Only when target_alignment >
> type unit size? But I think this implies a grouped access, thus
> ncopies > 1, which is why I think you need to consider the VF?
>
But if we do, that means we effectively can't vectorize a simple grouped access
such as

  if (vect_a[i] > x || vect_a[i+1] > x)
    break;

even though we know it's safe to do so. If vect_a is V4SI, requiring 256-bit alignment
when the target's preferred alignment is 128 bits just doesn't work, no?
But in the loop:

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
  unsigned ret = 0;
  for (int i = 1; i < (N/2)-1; i+=2)
    {
      if (vect_a[i] > x || vect_a[i+1] > x)
        break;
      vect_a[i] += x * vect_b[i];
      vect_a[i+1] += x * vect_b[i+1];
    }
  return ret;
}
This should be safe to vectorize though.
Or are you saying that in order to vectorize this we have to raise the alignment to 32 bytes?
I.e. override the target's preferred alignment if the new alignment is a multiple of the preferred?
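(A sketch of what I mean, reusing the names from the removed hunk; not tested:)

  /* If the alignment required for safety is a whole multiple of the
     target's preferred vector alignment, raise the target alignment
     instead of rejecting the loop.  */
  if (multiple_p (safe_align, vector_alignment))
    vector_alignment = safe_align;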
> > > > > > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > > > > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > > > > + {
> > > > > > + /* ?? This needs updating whenever we support slp group > 1. */
> > >
> > > ?
> > >
> > > > > > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > > > > > + /* For the inbound case it's enough to check for an alignment of
> > > > > > + GROUP_SIZE * element size. */
> > > > > > + if (inbounds
> > > > > > + && (*misalignment % (group_size * unit_size)) == 0
> > > > > > + && group_size % 2 == 0)
> > >
> > > It looks fishy that you do not need to consider the VF here.
> >
> > This and the ? above from you are related. I don't think we have to consider VF
> here
> > since early break vectorization does not support slp group size > 1. So the VF can
> at
> > this time never be larger than a single vector. The note is there to update this
> when we do.
>
> You don't need a SLP group-size > 1 to have more than one vector read.
> Consider
>
> if (long_long[i])
> early break;
> char[i] = char2[i];
>
> where you with V2DI and V16QI end up with a VF of 16 and 4 V2DI loads
> to compute the early break condition? That said, implying, without
> double-checking, some constraints on early break vectorization here
> looks a bit fragile - if there are unwritten constraints we'd better
> check them here. That also makes it easier to understand.
>
I did consider this case of widening/unpacking, but I had only considered
the opposite case, where the char was inside the if.
Yes, I can see why ncopies needs to be considered here too.
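I.e. the case I had in mind was the mirror image (illustrative names):

char cond2[1024];
long long dst2[1024], src2[1024];

void m (void)
{
  for (int i = 0; i < 1024; i++)
    {
      if (cond2[i])       /* a single V16QI load covers the whole VF */
        break;
      dst2[i] = src2[i];  /* the long long copy, not the break condition,
                             is what needs several vectors per iteration */
    }
}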
> > >
> > > > > > + {
> > > > > > + if (dump_enabled_p ())
> > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > + "Assuming grouped access is aligned due to load "
> > > > > > + "lanes, overriding alignment scheme\n");
> > > > > > +
> > > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > > + }
> > > > > > + }
> > > > > > + /* If we have a linear access and know the misalignment and know we
> > > won't
> > > > > > + read out of bounds then it's also ok if the misalignment is a
> multiple
> > > > > > + of the element size. We get this when the loop has known
> misalignments
> > > > > > + but the misalignments of the DRs can't be peeled to reach mutual
> > > > > > + alignment. Because the misalignments are known however we
> also know
> > > > > > + that versioning won't work. If the target does support unaligned
> > > > > > + accesses and we know we are free to read the entire buffer then
> we
> > > > > > + can allow the unaligned access if it's on elements for an early break
> > > > > > + condition. */
> > >
> > > See above - one of the PRs was exactly that we overread a decl even
> > > if the original scalar accesses are all in-bounds. So we can't allow
> > > this.
> > >
> >
> > But that PR was only because it was misaligned to an unnatural alignment for the
> > type, which resulted in one element possibly being split across a page boundary.
> > That is still rejected here.
>
> By the multiple_p (*misalignment, unit_size) condition? Btw, you should
> check known_alignment_for_access_p before using *misalignment or
> check *misalignment != DR_MISALIGNMENT_UNKNOWN.
>
> > This check is saying that if you have mutual misalignment, but each DR is in itself
> > still aligned to the type's natural alignment, and we know that all accesses are in
> > bounds, then we should be safe to vectorize as we won't accidentally access an
> > invalid part of memory.
>
> See above - when we do two vector loads we only know the first one had
> a corresponding scalar access, the second vector load can still be
> out of bounds. But when we can align to N * unit_size then it's fine
> to vectorize. And only then (target_alignment > unit_size) can we have
> *misalignment > unit_size?
>
Where N here I guess is ncopies * nunits? I.e. the number of scalar loads per iteration,
or rather the VF. With most grouped accesses this is quite large though.
I think you'll probably answer my question above, which clarifies this one.
Thanks for the review.
I think with the next reply I have enough to respin it properly.
Thanks,
Tamar
> Richard.
>
> > > > > > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > > > > + && inbounds)
> > > > > > + {
> > > > > > + if (dump_enabled_p ())
> > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > + "Access will not read beyond buffer to due known
> size "
> > > > > > + "buffer, overriding alignment scheme\n");
> > > > > > +
> > > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > > > {
> > > > > > if (dump_enabled_p ())
> > > > > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > > > > > /* Transform. */
> > > > > >
> > > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info),
> *first_dr_info =
> > > > > NULL;
> > > > > > +
> > > > > > + /* Check if we support the operation if early breaks are needed. */
> > > > > > + if (loop_vinfo
> > > > > > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > + && (memory_access_type == VMAT_GATHER_SCATTER
> > > > > > + || memory_access_type == VMAT_STRIDED_SLP))
> > > > > > + {
> > > > > > + if (dump_enabled_p ())
> > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > + "early break not supported: cannot peel for "
> > > > > > + "alignment. With non-contiguous memory
> vectorization"
> > > > > > + " could read out of bounds at %G ",
> > > > > > + STMT_VINFO_STMT (stmt_info));
> > > > >
> > > > > Hmm, this is now more restrictive than the original check in
> > > > > vect_analyze_early_break_dependences because it covers all accesses.
> > > > > The simplest fix would be to leave it there.
> > > > >
> > > >
> > > > It covers all loads, which you're right is more restrictive, I think it just needs to
> be
> > > > moved inside if (costing_p && dr_info->need_peeling_for_alignment) block
> > > > below it though.
> > > >
> > > > Delaying this to here instead of earlier has allowed us to vectorize
> > > gcc.dg/vect/bb-slp-pr65935.c
> > > > Which now vectorizes after the inner loops are unrolled.
> > > >
> > > > Are you happy with just moving it down?
> > >
> > > OK.
> > >
> > > > > > + return false;
> > > > > > + }
> > > > > > +
> > > > > > + /* If this DR needs peeling for alignment for correctness, we must
> > > > > > + ensure the target alignment is a constant power-of-two multiple of the
> > > > > > + amount read per vector iteration (overriding the above hook where
> > > > > > + necessary). We don't support group loads, which would have been filtered
> > > > > > + out in the check above. For now it means we don't have to look at the
> > > > > > + group info and just check that the load is contiguous and can just use
> > > > > > + dr_info. For known size buffers we still need to check if the vector
> > > > > > + is misaligned and if so we need to peel. */
> > > > > > + if (costing_p && dr_info->need_peeling_for_alignment)
> > > > >
> > > > > dr_peeling_alignment ()
> > > > >
> > > > > I think this belongs in get_load_store_type, specifically I think
> > > > > we want to only allow VMAT_CONTIGUOUS for
> need_peeling_for_alignment
> > > refs
> > > > > and by construction dr_aligned should ensure type size alignment
> > > > > (by altering target_alignment for the respective refs). Given that
> > > > > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > > > > time we should be able to compute the appropriate target alignment
> > > > > there (I'm not sure we support peeling of more than VF-1 iterations
> > > > > though).
> > > > >
> > > > > > + {
> > > > > > + /* Vector size in bytes. */
> > > > > > + /* Vector size in bytes. */
> > > > > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > > > +
> > > > > > + /* We can only peel for loops, of course. */
> > > > > > + gcc_checking_assert (loop_vinfo);
> > > > > > +
> > > > > > + auto num_vectors = ncopies;
> > > > > > + if (!pow2p_hwi (num_vectors))
> > > > > > + {
> > > > > > + if (dump_enabled_p ())
> > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > + "non-power-of-two num vectors %u "
> > > > > > + "for DR needing peeling for "
> > > > > > + "alignment at %G",
> > > > > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > > > + return false;
> > > > > > + }
> > > > > > +
> > > > > > + safe_align *= num_vectors;
> > > > > > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > > > > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > > > > > + We could though but too much work for GCC 15. For now we assume a
> > > > > > + vector is not larger than a page size so allow single loads. */
> > > > > > + && (num_vectors > 1 && !vf.is_constant ()))
> > > > > > + {
> > > > > > + if (dump_enabled_p ())
> > > > > > + {
> > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > + "alignment required for correctness (");
> > > > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > > > + }
> > > > > > + return false;
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > ensure_base_align (dr_info);
> > > > > >
> > > > > > if (memory_access_type == VMAT_INVARIANT)
> > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > index 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e1922145f3d418436d709f4 100644
> > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > > > }
> > > > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > > > >
> > > > > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > > > > +inline bool
> > > > > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > > > > +{
> > > > > > + dr_vec_info *dr_info;
> > > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > > >
> > > > > ... but checking it on the first only.
> > > >
> > > > I did it that way because I was under the assumption that group loads could be
> > > > relaxed to e.g. element wise or some other form. If it's the case that the group
> > > > cannot grow or be changed I could instead set it only on the first access and
> > > > then not need to check it elsewhere if you prefer.
> > >
> > > I've merely noted the discrepancy - consider
> > >
> > > if (a[2*i+1])
> > > early break;
> > > ... = a[2*i];
> > >
> > > then you'd set ->needs_peeling on the 2nd group member but
> > > dr_peeling_alignment would always check the first. So yes, I think
> > > we always want to set the flag on the first element of a grouped
> > > access. We're no longer splitting groups when loop vectorizing.
> > >
> >
> > Ack,
> >
> > Will update.
> >
> > Thanks,
> > Tamar
> >
> > > Richard.
> > >
> > > >
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > > + else
> > > > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > > > +
> > > > > > + return dr_info->need_peeling_for_alignment;
> > > > > > +}
> > > > > > +
> > > > > > inline void
> > > > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > > > {
> > > > > >
> > > > >
On Fri, 7 Feb 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, February 5, 2025 1:15 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
> >
> > On Wed, 5 Feb 2025, Tamar Christina wrote:
> >
> > [...]
> >
> > > >
> > > > > > 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a60002933f384f65b 100644
> > > > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > > > > > > if (is_gimple_debug (stmt))
> > > > > > > continue;
> > > > > > >
> > > > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > > > + stmt_vec_info stmt_vinfo
> > > > > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > > > > if (!dr_ref)
> > > > > > > continue;
> > > > > > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > > > > > > bounded by VF so accesses are within range. We only need to check
> > > > > > > the reads since writes are moved to a safe place where if we get
> > > > > > > there we know they are safe to perform. */
> > > > > > > - if (DR_IS_READ (dr_ref)
> > > > > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > > > + if (DR_IS_READ (dr_ref))
> > > > > > > {
> > > > > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > > > - {
> > > > > > > - const char *msg
> > > > > > > - = "early break not supported: cannot peel "
> > > > > > > - "for alignment, vectorization would read out of "
> > > > > > > - "bounds at %G";
> > > > > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > > > > - }
> > > > > > > -
> > > > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > > > dr_info->need_peeling_for_alignment = true;
> > > > > >
> > > > > > You're setting the flag on any DR of a DR group here ...
> > > > > >
> > > > > > > if (dump_enabled_p ())
> > > > > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > - "marking DR (read) as needing peeling for "
> > > > > > > - "alignment at %G", stmt);
> > > > > > > + "marking DR (read) as possibly needing peeling "
> > > > > > > + "for alignment at %G", stmt);
> > > > > > > }
> > > > > > >
> > > > > > > if (DR_IS_READ (dr_ref))
> > > > > > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > > > > > with VECTYPE.
> > > > > > >
> > > > > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > > > > > - be set appropriately on failure (but is otherwise left unchanged).
> > > > > > > -
> > > > > > > Output:
> > > > > > > 1. initialized misalignment info for DR_INFO
> > > > > > >
> > > > > > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > > > >
> > > > > > > static void
> > > > > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > > > > - tree vectype, opt_result *result = nullptr)
> > > > > > > + tree vectype)
> > > > > > > {
> > > > > > > stmt_vec_info stmt_info = dr_info->stmt;
> > > > > > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > > > > > BITS_PER_UNIT);
> > > > > > >
> > > > > > > - /* If this DR needs peeling for alignment for correctness, we must
> > > > > > > - ensure the target alignment is a constant power-of-two multiple of the
> > > > > > > - amount read per vector iteration (overriding the above hook where
> > > > > > > - necessary). */
> > > > > > > - if (dr_info->need_peeling_for_alignment)
> > > > > > > - {
> > > > > > > - /* Vector size in bytes. */
> > > > > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > > > > -
> > > > > > > - /* We can only peel for loops, of course. */
> > > > > > > - gcc_checking_assert (loop_vinfo);
> > > > > > > -
> > > > > > > - /* Calculate the number of vectors read per vector iteration. If
> > > > > > > - it is a power of two, multiply through to get the required
> > > > > > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > > > > > - peeling wouldn't work in such a case. */
> > > > > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > > > > -
> > > > > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > > > > - if (!pow2p_hwi (num_vectors))
> > > > > > > - {
> > > > > > > - *result = opt_result::failure_at (vect_location,
> > > > > > > - "non-power-of-two num vectors %u "
> > > > > > > - "for DR needing peeling for "
> > > > > > > - "alignment at %G",
> > > > > > > - num_vectors, stmt_info->stmt);
> > > > > > > - return;
> > > > > > > - }
> > > > > > > -
> > > > > > > - safe_align *= num_vectors;
> > > > > > > - if (maybe_gt (safe_align, 4096U))
> > > > > > > - {
> > > > > > > - pretty_printer pp;
> > > > > > > - pp_wide_integer (&pp, safe_align);
> > > > > > > - *result = opt_result::failure_at (vect_location,
> > > > > > > - "alignment required for correctness"
> > > > > > > - " (%s) may exceed page size",
> > > > > > > - pp_formatted_text (&pp));
> > > > > > > - return;
> > > > > > > - }
> > > > > > > -
> > > > > > > - unsigned HOST_WIDE_INT multiple;
> > > > > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > > > > > - || !pow2p_hwi (multiple))
> > > > > > > - {
> > > > > > > - if (dump_enabled_p ())
> > > > > > > - {
> > > > > > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > - "forcing alignment for DR from preferred (");
> > > > > > > - dump_dec (MSG_NOTE, vector_alignment);
> > > > > > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > > > > > - dump_dec (MSG_NOTE, safe_align);
> > > > > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > > > > - }
> > > > > > > - vector_alignment = safe_align;
> > > > > > > - }
> > > > > > > - }
> > > > > > > -
> > > > > > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > > > > > >
> > > > > > > /* If the main loop has peeled for alignment we have no way of knowing
> > > > > > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment
> > > > (loop_vec_info
> > > > > > loop_vinfo)
> > > > > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > loop_preheader_edge (loop))
> > > > > > > || loop->inner
> > > > > > > - || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > > > > + || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
> > > > > >
> > > > > > Spurious change(?)
> > > > >
> > > > > I was actually wondering why this is here. I'm curious why we're saying we can't
> > > > > peel for alignment on an inverted loop.
> > > >
> > > > No idea either.
> > > >
> > > > > >
> > > > > > > do_peeling = false;
> > > > > > >
> > > > > > > struct _vect_peel_extended_info peel_for_known_alignment;
> > > > > > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> > > > > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > > > > > continue;
> > > > > > > - opt_result res = opt_result::success ();
> > > > > > > +
> > > > > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > > > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > > > > > - &res);
> > > > > > > - if (!res)
> > > > > > > - return res;
> > > > > > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > > > >
> > > > > > > if (misalignment == 0)
> > > > > > > return dr_aligned;
> > > > > > > - else if (dr_info->need_peeling_for_alignment)
> > > > > > > + else if (dr_peeling_alignment (stmt_info))
> > > > > > > return dr_unaligned_unsupported;
> > > > > > >
> > > > > > > /* For now assume all conditional loads/stores support unaligned
> > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > index 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3aff7c1af80e8769a 100644
> > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > > > > return false;
> > > > > > > }
> > > > > > >
> > > > > > > + auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > > > > > + /* Check if a misalignment with an unsupported peeling for early break is
> > > > > > > + still OK. First we need to distinguish between when we've reached here do
> > > > > >
> > > > > > due to
> > > > > >
> > > > > > > + to dependency analysis or when the user has requested -mstrict-align or
> > > > > > > + similar. In those cases we must not override it. */
> > > > > > > + if (dr_peeling_alignment (stmt_info)
> > > > > > > + && *alignment_support_scheme == dr_unaligned_unsupported
> > > > > > > + /* We can only attempt to override if the misalignment is a multiple of
> > > > > > > + the element being loaded, otherwise peeling or versioning would have
> > > > > > > + really been required. */
> > > > > > > + && multiple_p (*misalignment, unit_size))
> > > > > >
> > > > > > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > > > > > than the vector size? Does that ever happen? I'll note that
> > > > > > *alignment_support_scheme == dr_aligned means alignment according
> > > > > > to dr_info->target_alignment which might be actually less than
> > > > > > the vector size (we've noticed this recently in other PRs), so
> > > > > > we might want to make sure that dr_info->target_alignment is
> > > > > > at least vector size when ->need_peeling_for_alignment I think.
> > > > > >
> > > > >
> > > > > One reason I block LOAD_LANES from the non-inbound case is that in
> > > > > those cases dr_info->target_alignment would need to be GROUP_SIZE * vector size
> > > > > to ensure that the entire access doesn't cross a page. Because this puts an
> > > > > excessive alignment request in place I currently just reject the loop.
> > > >
> > > > But that's true for all grouped accesses and one reason we wanted to move
> > > > this code here - we know group_size and the vectorization factor.
> > > >
> > >
> > > Yes, I mentioned LOAD_LANES since that's what AArch64 turns group accesses into,
> > > but yes you're right, this is about group accesses, but maybe my understanding
> > > is wrong here, I thought the only VMAT that can result in a group access is a LOAD_LANES?
> >
> > No, even VMAT_CONTIGUOUS can be a group access.
> >
>
> Hmm, that does correspond to what I think I was trying to fix up with the hunk below it
> and I now see what you meant by your review comment.
>
> I hadn't expected
>
> if (vect_a[i] > x || vect_a[i+1] > x)
> break;
>
> to still be a VMAT_CONTIGUOUS with slp lane == 1 as you'd need a permute as well
> so I was expecting VMAT_CONTIGUOUS_PERMUTE. But reading the comments on
> the enum indicates that this is only used for the non-SLP vectorizer.
>
> I guess SLP always models this as linear load + separate permutation. Which in hindsight
> makes sense looking at the node:
>
> note: node 0x627f608 (max_nunits=2, refcnt=1) vector(2) unsigned int
> note: op template: _6 = vect_a[_3];
> note: stmt 0 _6 = vect_a[_3];
> note: load permutation { 0 }
>
> The comments here make sense now, sorry was confused about the representation.
>
> > > But yes I agree I should drop the *memory_access_type == VMAT_LOAD_STORE_LANES bit
> > > and only look at the group access flag.
> > >
> > > > > But in principle it can happen, however the above checks for element size, not
> > > > > vector size. This fails when the user has intentionally misaligned the array and we
> > > > > don't support peeling for the access type to correct it.
> > > > >
> > > > > So something like vect-early-break_133_pfa3.c with a grouped access for instance.
> > > > >
> > > > > > So - which of the testcases gets you here? I think we
> > > > > > set *misalignment to be modulo target_alignment, never larger than that.
> > > > > >
> > > > >
> > > > > The condition passes for most cases yes, unless we can't peel, in which case see
> > > > > vect-early-break_22.c and vect-early-break_121-pr114081.c, two that get rejected
> > > > > inside the block.
> > > > >
> > > > > > > + {
> > > > > > > + bool inbounds
> > > > > > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > > > > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > > > > > > + /* If we have a known misalignment, and are doing a group load for a DR
> > > > > > > + that requires aligned access, check if the misalignment is a multiple
> > > > > > > + of the unit size. In which case the group load will be issued aligned
> > > > > > > + as long as the first load in the group is aligned.
> > > > > > > +
> > > > > > > + For the non-inbound case we'd need group_size * vectype alignment. But
> > > > > > > + this is quite huge and unlikely to ever happen so if we can't peel for
> > > > > > > + it, just reject it. */
> > > >
> > > > I don't think the in-bound case is any different from the non-in-bound
> > > > case unless the size of the object is a multiple of the whole vector
> > > > access size as well.
> > > >
> > >
> > > The idea was that we know the accesses are all within the boundary of the object.
> > > What we want to know is whether the accesses done are aligned. Since we don't
> > > support group loads with gaps here we know the elements must be sequential and so
> > > the relaxation was that if we know all this, then all we really need to know is that
> > > it is safe to read a multiple of GROUP_SIZE * element sizes, since we would still be in
> > > range of the buffer. Because these are the number of scalar loads that would have
> > > been done per iteration and so we can relax the alignment requirement based on
> > > the information known in the inbounds case.
> >
> > Huh, I guess I fail to parse this. ref_within_array_bound computes
> > whether we know all _scalar_ accesses are within bounds.
> >
> > At this point we know *alignment_support_scheme ==
> > dr_unaligned_unsupported which means the access is not aligned
> > but the misalignment is a multiple of the vector type unit size.
> >
> > How can the access not be aligned then? Only when target_alignment >
> > type unit size? But I think this implies a grouped access, thus
> > ncopies > 1 which is why I think you need to consider the VF?
> >
>
> But if we do, that means we effectively can't vectorize a simple group access
> such as
>
> if (vect_a[i] > x || vect_a[i+1] > x)
> break;
>
> even though we know it's safe to do so. If vect_a is V4SI, requiring 256-bit alignment
> on a target whose preferred alignment is 128 bits just doesn't work, no?
Well, if you set ->target_alignment to 256 then peeling should make that
work. There's then the missed optimization (or wrong-code in other
context) when we decide on whether to emit an aligned or unaligned load
because we do that based on dr_aligned (to target_alignment) rather than
on byte alignment with respect to mode alignment of the load.
> But in the loop:
>
> #define N 1024
> unsigned vect_a[N];
> unsigned vect_b[N];
>
> unsigned test4(unsigned x)
> {
> unsigned ret = 0;
> for (int i = 1; i < (N/2)-1; i+=2)
> {
> if (vect_a[i] > x || vect_a[i+1] > x)
> break;
> vect_a[i] += x * vect_b[i];
> vect_a[i+1] += x * vect_b[i+1];
> }
> return ret;
> }
>
> Should be safe to vectorize though..
Yes.
> Or are you saying that in order to vectorize this we have to raise the alignment to 32 bytes?
For the former loop, yes.
> I.e. override the target's preferred alignment if the new alignment is a multiple of the preferred?
Yes. That's where we need to look at the VF - even a non-grouped load
can end up with two vector loads needed when there are multiple types
in the loop.
> > > > > > > + if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > > > > > + && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > > > > > + {
> > > > > > > + /* ?? This needs updating whenever we support slp group > 1. */
> > > >
> > > > ?
> > > >
> > > > > > > + auto group_size = DR_GROUP_SIZE (stmt_info);
> > > > > > > + /* For the inbound case it's enough to check for an alignment of
> > > > > > > + GROUP_SIZE * element size. */
> > > > > > > + if (inbounds
> > > > > > > + && (*misalignment % (group_size * unit_size)) == 0
> > > > > > > + && group_size % 2 == 0)
> > > >
> > > > It looks fishy that you do not need to consider the VF here.
> > >
> > > This and the ? above from you are related. I don't think we have to consider VF here
> > > since early break vectorization does not support slp group size > 1. So the VF can at
> > > this time never be larger than a single vector. The note is there to update this
> > > when we do.
> >
> > You don't need a SLP group-size > 1 to have more than one vector read.
> > Consider
> >
> > if (long_long[i])
> > early break;
> > char[i] = char2[i];
> >
> > where you with V2DI and V16QI end up with a VF of 16 and 4 V2DI loads
> > to compute the early break condition? That said, implying, without
> > double-checking, some constraints on early break vectorization here
> > looks a bit fragile - if there are unwritten constraints we'd better
> > check them here. That also makes it easier to understand.
> >
>
> I did consider this case of widening/unpacking. But I had only considered
> the opposite case where the char was inside the if.
>
> Yes I can see why ncopies needs to be considered here too.
>
> > > >
> > > > > > > + {
> > > > > > > + if (dump_enabled_p ())
> > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > + "Assuming grouped access is aligned due to load "
> > > > > > > + "lanes, overriding alignment scheme\n");
> > > > > > > +
> > > > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > > > + }
> > > > > > > + }
> > > > > > > + /* If we have a linear access and know the misalignment and know we won't
> > > > > > > + read out of bounds then it's also ok if the misalignment is a multiple
> > > > > > > + of the element size. We get this when the loop has known misalignments
> > > > > > > + but the misalignments of the DRs can't be peeled to reach mutual
> > > > > > > + alignment. Because the misalignments are known however we also know
> > > > > > > + that versioning won't work. If the target does support unaligned
> > > > > > > + accesses and we know we are free to read the entire buffer then we
> > > > > > > + can allow the unaligned access if it's on elements for an early break
> > > > > > > + condition. */
> > > >
> > > > See above - one of the PRs was exactly that we overread a decl even
> > > > if the original scalar accesses are all in-bounds. So we can't allow
> > > > this.
> > > >
> > >
> > > But that PR was only because it was misaligned to an unnatural alignment of the type
> > > which resulted in one element possibly being split across a page boundary. That is
> > > still rejected here.
> >
> > By the multiple_p (*misalignment, unit_size) condition? Btw, you should
> > check known_alignment_for_access_p before using *misalignment or
> > check *misalignment != DR_MISALIGNMENT_UNKNOWN.
> >
> > > This check is saying that if you have mutual misalignment, but each DR is in itself
> > > still aligned to the type's natural alignment and we know that all accesses are in
> > > bounds, then we should be safe to vectorize as we won't accidentally access an
> > > invalid part of memory.
> >
> > See above - when we do two vector loads we only know the first one had
> > a corresponding scalar access, the second vector load can still be
> > out of bounds. But when we can align to N * unit_size then it's fine
> > to vectorize. And only then (target_alignment > unit_size) can we have
> > *misalignment > unit_size?
> >
>
> Where N here I guess is ncopies * nunits? i.e. number of scalar loads per iteration?
> or rather VF. With most group accesses this is quite large though.
Yep, that's why we ended up with the worry about PAGE_SIZE in the last
discussion.
> I think you'll probably answer my question above which clarifies this one.
>
> Thanks for the review.
>
> I think with the next reply I have enough to respin it properly.
Thanks - take your time.
Richard.
> Thanks,
> Tamar
>
> > Richard.
> >
> > > > > > > + else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > > > > > + && inbounds)
> > > > > > > + {
> > > > > > > + if (dump_enabled_p ())
> > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > + 				 "Access will not read beyond buffer due to known size "
> > > > > > > + 				 "buffer, overriding alignment scheme\n");
> > > > > > > +
> > > > > > > + *alignment_support_scheme = dr_unaligned_supported;
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > > > > {
> > > > > > > if (dump_enabled_p ())
> > > > > > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > > > > > > /* Transform. */
> > > > > > >
> > > > > > > dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL;
> > > > > > > +
> > > > > > > + /* Check if we support the operation if early breaks are needed. */
> > > > > > > + if (loop_vinfo
> > > > > > > + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > > + && (memory_access_type == VMAT_GATHER_SCATTER
> > > > > > > + || memory_access_type == VMAT_STRIDED_SLP))
> > > > > > > + {
> > > > > > > + if (dump_enabled_p ())
> > > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > > + "early break not supported: cannot peel for "
> > > > > > > + 			 "alignment. With non-contiguous memory vectorization"
> > > > > > > + " could read out of bounds at %G ",
> > > > > > > + STMT_VINFO_STMT (stmt_info));
> > > > > >
> > > > > > Hmm, this is now more restrictive than the original check in
> > > > > > vect_analyze_early_break_dependences because it covers all accesses.
> > > > > > The simplest fix would be to leave it there.
> > > > > >
> > > > >
> > > > > > It covers all loads, which you're right is more restrictive, I think it just needs to be
> > > > > > moved inside the if (costing_p && dr_info->need_peeling_for_alignment) block
> > > > > below it though.
> > > > >
> > > > > Delaying this to here instead of earlier has allowed us to vectorize
> > > > gcc.dg/vect/bb-slp-pr65935.c
> > > > > Which now vectorizes after the inner loops are unrolled.
> > > > >
> > > > > Are you happy with just moving it down?
> > > >
> > > > OK.
> > > >
> > > > > > > + return false;
> > > > > > > + }
> > > > > > > +
> > > > > > > + /* If this DR needs peeling for alignment for correctness, we must
> > > > > > > + ensure the target alignment is a constant power-of-two multiple of the
> > > > > > > + amount read per vector iteration (overriding the above hook where
> > > > > > > + necessary). We don't support group loads, which would have been filtered
> > > > > > > + out in the check above. For now it means we don't have to look at the
> > > > > > > + group info and just check that the load is contiguous and can just use
> > > > > > > + dr_info. For known size buffers we still need to check if the vector
> > > > > > > + is misaligned and if so we need to peel. */
> > > > > > > + if (costing_p && dr_info->need_peeling_for_alignment)
> > > > > >
> > > > > > dr_peeling_alignment ()
> > > > > >
> > > > > > I think this belongs in get_load_store_type, specifically I think
> > > > > > we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment refs
> > > > > > and by construction dr_aligned should ensure type size alignment
> > > > > > (by altering target_alignment for the respective refs). Given that
> > > > > > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > > > > > time we should be able to compute the appropriate target alignment
> > > > > > there (I'm not sure we support peeling of more than VF-1 iterations
> > > > > > though).
> > > > > >
> > > > > > > + {
> > > > > > > + /* Vector size in bytes. */
> > > > > > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > > > > +
> > > > > > > + /* We can only peel for loops, of course. */
> > > > > > > + gcc_checking_assert (loop_vinfo);
> > > > > > > +
> > > > > > > + auto num_vectors = ncopies;
> > > > > > > + if (!pow2p_hwi (num_vectors))
> > > > > > > + {
> > > > > > > + if (dump_enabled_p ())
> > > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > > + "non-power-of-two num vectors %u "
> > > > > > > + "for DR needing peeling for "
> > > > > > > + "alignment at %G",
> > > > > > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > > > > + return false;
> > > > > > > + }
> > > > > > > +
> > > > > > > + safe_align *= num_vectors;
> > > > > > > + if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > > > > > + /* For VLA we don't support PFA when any unrolling needs to be done.
> > > > > > > + We could though but too much work for GCC 15. For now we assume a
> > > > > > > + vector is not larger than a page size so allow single loads. */
> > > > > > > + && (num_vectors > 1 && !vf.is_constant ()))
> > > > > > > + {
> > > > > > > + if (dump_enabled_p ())
> > > > > > > + {
> > > > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > > > + "alignment required for correctness (");
> > > > > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > > > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > > > > + }
> > > > > > > + return false;
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > ensure_base_align (dr_info);
> > > > > > >
> > > > > > > if (memory_access_type == VMAT_INVARIANT)
> > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > > index 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e1922145f3d418436d709f4 100644
> > > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > > > > }
> > > > > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > > > > >
> > > > > > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > > > > > +inline bool
> > > > > > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > > > > > +{
> > > > > > > + dr_vec_info *dr_info;
> > > > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > > > >
> > > > > > ... but checking it on the first only.
> > > > >
> > > > > I did it that way because I was under the assumption that group loads could be
> > > > > relaxed to e.g. element wise or some other form. If it's the case that the group
> > > > > cannot grow or be changed I could instead set it only on the first access and
> > > > > then not need to check it elsewhere if you prefer.
> > > >
> > > > I've merely noted the discrepancy - consider
> > > >
> > > > if (a[2*i+1])
> > > > early break;
> > > > ... = a[2*i];
> > > >
> > > > then you'd set ->needs_peeling on the 2nd group member but
> > > > dr_peeling_alignment would always check the first. So yes, I think
> > > > we always want to set the flag on the first element of a grouped
> > > > access. We're no longer splitting groups when loop vectorizing.
> > > >
> > >
> > > Ack,
> > >
> > > Will update.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Richard.
> > > >
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > > + else
> > > > > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > > > > +
> > > > > > > + return dr_info->need_peeling_for_alignment;
> > > > > > > +}
> > > > > > > +
> > > > > > > inline void
> > > > > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > > > > {
> > > > > > >
> > > > > >
>
On Tue, 11 Feb 2025, Tamar Christina wrote:
> Hi All,
>
> This fixes two PRs on Early break vectorization by delaying the safety checks to
> vectorizable_load when the VF, VMAT and vectype are all known.
>
> This patch does add two new restrictions:
>
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> group sizes, as they are unaligned every n % 2 iterations and so may cross
> a page unwittingly.
>
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> we cannot peel for alignment, as the alignment requirement is quite large at
> GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> don't support it for now.
>
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
>
> Note that for VLA I have still left this fully disabled when not working on a
> fixed buffer.
>
> For VLA targets like SVE we return element alignment as the desired vector
> alignment. This means that the loads are never misaligned and so, annoyingly, it
> won't ever need to peel.
>
> So what I think needs to happen in GCC 16 is that.
>
> 1. during vect_compute_data_ref_alignment we need to take the max of
> POLY_VALUE_MIN and vector_alignment.
>
> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
> check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
> proxy for pagesize.
>
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> vect_determine_partial_vectors_and_peeling since the first iteration has to
> be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> vectorize.
>
> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
> becomes true and we generate the peeled check through loop control for
> partial loops. From what I can tell this won't work for
> LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
> all in the compiler. That would need to be done independently from the
> above.
We basically need to implement peeling/versioning for alignment based
on the actual POLY value with the fallback being first-fault loads.
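(A conceptual pseudo-code sketch of that fallback, not something this
patch implements -- the names merely follow the SVE SETFFR/LDFF1/RDFFR
instructions:

  setffr ();                  /* Clear the first-fault register.  */
  vec = ldff1 (pg, &a[i]);    /* Lanes past a faulting element are
                                 simply not loaded.  */
  ok = rdffr ();              /* Predicate of lanes that did load.  */
  /* Test the early-break condition only on the OK lanes; if the
     first non-loaded lane comes before any break, fall back to a
     scalar epilogue for the remainder.  */

An actual implementation would need to go through the usual
optab/internal-fn machinery rather than emitting these directly.)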
> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> checks.
> (vect_compute_data_ref_alignment): Remove alignment checks and move to
> get_load_store_type, increase group access alignment.
> (vect_enhance_data_refs_alignment): Add note to comment needing
> investigating.
> (vect_analyze_data_refs_alignment): Likewise.
> (vect_supportable_dr_alignment): For group loads look at first DR.
> * tree-vect-stmts.cc (get_load_store_type):
> Perform safety checks for early break pfa.
> * tree-vectorizer.h (dr_peeling_alignment,
> dr_set_peeling_alignment): New.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> load type is relaxed later.
> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
> * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
> * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
> * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
> * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
> * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
> * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
> * gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
> * gcc.target/i386/pr90178.c: Likewise.
> * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d8917863dc06bbf3e2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will register in a basic block.
> Work bound when discovering transitive relations from existing relations.
>
> @item min-pagesize
> -Minimum page size for warning purposes.
> +Minimum page size for warning and early break vectorization purposes.
>
> @item openacc-kernels
> Specify mode of OpenACC `kernels' constructs handling.
> diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> index 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078deee32911f31fafd94 100644
> --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> @@ -10,6 +10,7 @@ inline Iter
> my_find(Iter first, Iter last, Pred pred)
> {
> #pragma GCC unroll 4
> +#pragma GCC novector
> while (first != last && !pred(*first))
> ++first;
> return first;
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <cstddef>
> +
> +struct ts1 {
> + int spans[6][2];
> +};
> +struct gg {
> + int t[6];
> +};
> +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> + ts1 ret;
> + for (size_t i = 0; i != t; i++) {
> + if (!(i < t)) __builtin_abort();
> + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> index a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc2088e72d85f249bf341 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> @@ -29,6 +29,7 @@ int main ()
> }
>
> /* check results: */
> +#pragma GCC novector
> for (i = 0; i < N; i++)
> {
> if (ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> index 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52dc6ce9b9ccb8c64a0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
> }
>
> /* check results: */
> +#pragma GCC novector
> for (i = 0; i < N; i++)
> {
> if (ia[i] != n)
> @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
> }
>
> /* check results: */
> +#pragma GCC novector
> for (i = 0; i < N; i++)
> {
> if (ib[i] != k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> index 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9544cdbe2b1da4914 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> @@ -23,6 +23,7 @@ int main ()
> }
>
> /* check results: */
> +#pragma GCC novector
> for (i = 0; i < N; i++)
> {
> if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> index dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648ea1a1f2d2699c07c1 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-vectorize" } */
>
> /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> * two ivs i and p2 can be eliminate. */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> index a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d0183c1a51de9fbc42 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> @@ -5,6 +5,7 @@ int*
> foo (int* mem, int sz, int val)
> {
> int i;
> +#pragma GCC novector
> for (i = 0; i < sz; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> index 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399b4374a474c66b34b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> @@ -5,6 +5,7 @@ int*
> foo (int* mem, int sz, int val)
> {
> int i;
> +#pragma GCC novector
> for (i = 0; i != sz; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> index 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450788da871ea0cea74 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> @@ -5,6 +5,7 @@ int*
> foo (int* mem, int beg, int end, int val)
> {
> int i;
> +#pragma GCC novector
> for (i = beg; i < end; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> index b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419805fd7ce6e69ffa 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> @@ -5,6 +5,7 @@ int*
> foo (int* mem, char sz, int val)
> {
> char i;
> +#pragma GCC novector
> for (i = 0; i < sz; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> index d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2ca6ff877a2aac7e05b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> @@ -5,6 +5,7 @@ int*
> foo (int* mem, unsigned char sz, int val)
> {
> unsigned char i;
> +#pragma GCC novector
> for (i = 0; i < sz; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> index a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5996a5f1a8ede9ead9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> index f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3bc1ecc36d463f2ee7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> index ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188ede7482ad01bb99 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
> if (sum != val)
> abort ();
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> index 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689d2594b240fec4e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
> if (sum != sval)
> abort ();
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> index 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde408903c4f2674acc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> index 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393dc15f2f5a3e5f55e5 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> index 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f59ad2a10aec7b0088 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> index 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00be18fc8054644655 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> index ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7d88a5714770f147f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> index c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659ddf748a16904b5d9e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> index 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d52df89a3da2dd3e4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> index 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4eb10aa33cd0af2 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
> {
> int i;
>
> +#pragma GCC novector
> for (i = 0; i < len; i++)
> if (a[i] != res[i])
> abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -55,7 +55,9 @@ int main()
> }
> }
> rephase ();
> +#pragma GCC novector
> for (i = 0; i < 32; ++i)
> +#pragma GCC novector
> for (j = 0; j < 3; ++j)
> #pragma GCC novector
> for (k = 0; k < 3; ++k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba39aabc0468737c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -5,7 +5,8 @@
> /* { dg-additional-options "-O3" } */
> /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* Arm and -m32 create a group size of 3 here, which we can't support yet. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-* } || { { x86_64-*-* i?86-*-* } && lp64 } } } } } */
>
> typedef struct filter_list_entry {
> const char *name;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, b, c, d, e, f;
> +short g[1];
> +int main() {
> + int h;
> + while (a) {
> + while (h)
> + ;
> + for (b = 2; b; b--) {
> + while (c)
> + ;
> + f = g[a];
> + if (d)
> + break;
> + }
> + while (e)
> + ;
> + }
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..82d473a279ce060c550289c61729d9f9b56f0d2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> @@ -0,0 +1,24 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! vect_load_lanes } } } } */
> +
> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> + if (vect[i] > x)
> + return 1;
> +
> + vect[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < n; i++)
> + {
> + if (vect_a[i] > x || vect_b[i] > x)
> + return 1;
> +
> + vect_a[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* This should be vectorizable on both load_lanes and linear targets. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56cecc60750394e10f2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> @@ -0,0 +1,25 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..31e2096209253539483253efc17499a53d112894
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> @@ -0,0 +1,28 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i-1] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> +/* Cannot safely vectorize this due to the group misalignment. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c b/gcc/testsuite/gcc.target/i386/pr90178.c
> index 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed73e85ba982b42 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> @@ -4,6 +4,7 @@
> int*
> find_ptr (int* mem, int sz, int val)
> {
> +#pragma GCC novector
> for (int i = 0; i < sz; i++)
> if (mem[i] == val)
> return &mem[i];
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4fdbcc5973f2b513 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> if (is_gimple_debug (stmt))
> continue;
>
> - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> + stmt_vec_info stmt_vinfo
> + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> if (!dr_ref)
> continue;
> @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> bounded by VF so accesses are within range. We only need to check
> the reads since writes are moved to a safe place where if we get
> there we know they are safe to perform. */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (DR_IS_READ (dr_ref))
> {
> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> - {
> - const char *msg
> - = "early break not supported: cannot peel "
> - "for alignment, vectorization would read out of "
> - "bounds at %G";
> - return opt_result::failure_at (stmt, msg, stmt);
> - }
> -
> - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> - dr_info->need_peeling_for_alignment = true;
> + dr_set_peeling_alignment (stmt_vinfo, true);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> - "marking DR (read) as needing peeling for "
> - "alignment at %G", stmt);
> + "marking DR (read) as possibly needing peeling "
> + "for alignment at %G", stmt);
> }
>
> if (DR_IS_READ (dr_ref))
> @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
> Compute the misalignment of the data reference DR_INFO when vectorizing
> with VECTYPE.
>
> - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> - be set appropriately on failure (but is otherwise left unchanged).
> -
> Output:
> 1. initialized misalignment info for DR_INFO
>
> @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo)
>
> static void
> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> - tree vectype, opt_result *result = nullptr)
> + tree vectype)
> {
> stmt_vec_info stmt_info = dr_info->stmt;
> vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> BITS_PER_UNIT);
>
> - /* If this DR needs peeling for alignment for correctness, we must
> - ensure the target alignment is a constant power-of-two multiple of the
> - amount read per vector iteration (overriding the above hook where
> - necessary). */
> - if (dr_info->need_peeling_for_alignment)
> + /* If we have a grouped access we require that the alignment be VF * elem. */
> + if (loop_vinfo
> + && dr_peeling_alignment (stmt_info)
> + && STMT_VINFO_GROUPED_ACCESS (stmt_info))
> {
> - /* Vector size in bytes. */
> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> -
> - /* We can only peel for loops, of course. */
> - gcc_checking_assert (loop_vinfo);
> -
> - /* Calculate the number of vectors read per vector iteration. If
> - it is a power of two, multiply through to get the required
> - alignment in bytes. Otherwise, fail analysis since alignment
> - peeling wouldn't work in such a case. */
> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> - num_scalars *= DR_GROUP_SIZE (stmt_info);
> -
> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> - if (!pow2p_hwi (num_vectors))
> - {
> - *result = opt_result::failure_at (vect_location,
> - "non-power-of-two num vectors %u "
> - "for DR needing peeling for "
> - "alignment at %G",
> - num_vectors, stmt_info->stmt);
> - return;
> - }
> -
> - safe_align *= num_vectors;
> - if (maybe_gt (safe_align, 4096U))
> - {
> - pretty_printer pp;
> - pp_wide_integer (&pp, safe_align);
> - *result = opt_result::failure_at (vect_location,
> - "alignment required for correctness"
> - " (%s) may exceed page size",
> - pp_formatted_text (&pp));
> - return;
> - }
> -
> - unsigned HOST_WIDE_INT multiple;
> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> - || !pow2p_hwi (multiple))
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + vector_alignment
> + = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
I think we discussed this before, also when introducing peeling
for alignment support. This is incorrect for grouped accesses where
the number of scalar elements accessed is GROUP_SIZE * vf, so you
miss a multiplication by GROUP_SIZE here.
Note that this (and also your VF * element_size) can result in a
non-power-of-two value.
That said, I'm quite sure we don't want to have a dr->target_alignment
that isn't power-of-two, so if the computation doesn't end up with a
power-of-two value we should leave it as the target prefers and
fix up (or fail) during vectorizable_load.
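For concreteness, a minimal standalone sketch of the accounting described above (illustrative code, not the patch's; required_alignment is a made-up name): the bytes accessed per vector iteration are GROUP_SIZE * VF * element_size, and only a power-of-two result is usable as a target alignment:

#include <cstdint>
#include <cstdio>

/* Hypothetical illustration: compute the alignment that would keep every
   vector iteration of a grouped access within one aligned block.  Returns
   false when the value is not a power of two, in which case the target's
   preferred alignment should be kept and the case fixed up (or rejected)
   later.  */
static bool
required_alignment (uint64_t vf, uint64_t group_size, uint64_t elem_size,
                    uint64_t *align_out)
{
  uint64_t bytes = vf * group_size * elem_size;
  if (bytes == 0 || (bytes & (bytes - 1)) != 0)
    return false;
  *align_out = bytes;
  return true;
}

int
main ()
{
  uint64_t a;
  /* VF = 4, GROUP_SIZE = 4, 4-byte elements: 64 bytes, a power of two.  */
  printf ("%d\n", required_alignment (4, 4, 4, &a));  /* 1, a = 64 */
  /* VF = 4, GROUP_SIZE = 6, 1-byte elements: 24 bytes, not a power of two.  */
  printf ("%d\n", required_alignment (4, 6, 1, &a));  /* 0 */
  return 0;
}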
> + if (dump_enabled_p ())
> {
> - if (dump_enabled_p ())
> - {
> - dump_printf_loc (MSG_NOTE, vect_location,
> - "forcing alignment for DR from preferred (");
> - dump_dec (MSG_NOTE, vector_alignment);
> - dump_printf (MSG_NOTE, ") to safe align (");
> - dump_dec (MSG_NOTE, safe_align);
> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> - }
> - vector_alignment = safe_align;
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "alignment increased due to early break to ");
> + dump_dec (MSG_NOTE, vector_alignment);
> + dump_printf (MSG_NOTE, " bytes.\n");
> }
> }
>
> @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> loop_preheader_edge (loop))
> || loop->inner
> + /* We don't currently maintain the LCSSA form for prologue-peeled inverted
> + loops. */
> || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> do_peeling = false;
>
> @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> continue;
> - opt_result res = opt_result::success ();
> +
> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> - STMT_VINFO_VECTYPE (dr_info->stmt),
> - &res);
> - if (!res)
> - return res;
> + STMT_VINFO_VECTYPE (dr_info->stmt));
> }
> }
>
> @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>
> if (misalignment == 0)
> return dr_aligned;
> - else if (dr_info->need_peeling_for_alignment)
> + else if (dr_peeling_alignment (stmt_info))
> return dr_unaligned_unsupported;
>
> /* For now assume all conditional loads/stores support unaligned
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee37b3fa1dc95079b 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> + /* If this DR needs peeling for alignment for correctness, we must
> + ensure the target alignment is a constant power-of-two multiple of the
> + amount read per vector iteration (overriding the above hook where
> + necessary). */
> + if (dr_peeling_alignment (stmt_info))
> + {
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + /* Check if we support the operation if early breaks are needed. */
> + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && (*memory_access_type == VMAT_GATHER_SCATTER
> + || *memory_access_type == VMAT_STRIDED_SLP))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "early break not supported: cannot peel for "
> + "alignment. With non-contiguous memory vectorization"
> + " could read out of bounds at %G ",
> + STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + /* Even if uneven group sizes are aligned on the first load, the second
> + iteration won't be. As such reject uneven group sizes. */
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> + && (DR_GROUP_SIZE (stmt_info) % 2) == 1)
Hmm, a group size of 6 is even, yet a vector size of four doesn't
make the 2nd iteration aligned. So we need a power-of-two GROUP_SIZE * VF
and a byte alignment according to that.
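To make that concrete, a small illustration (the numbers are assumed for the example, not taken from the patch): with GROUP_SIZE = 6, VF = 4 and 1-byte elements each vector iteration covers 24 bytes, so even when the first iteration starts 32-byte aligned the later ones do not:

#include <cstdio>

int
main ()
{
  /* GROUP_SIZE * VF * elem_size = 6 * 4 * 1 = 24 bytes per vector
     iteration.  */
  const unsigned bytes_per_iter = 6 * 4 * 1;
  for (unsigned i = 0; i < 4; i++)
    printf ("iter %u starts at byte %u (mod 32 = %u)\n",
            i, i * bytes_per_iter, (i * bytes_per_iter) % 32);
  /* Prints start offsets 0, 24, 48, 72 with residues 0, 24, 16, 8: only
     the first iteration lands on a 32-byte boundary, so the group drifts
     across any power-of-two boundary on later iterations.  */
  return 0;
}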
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "early break not supported: uneven group size, "
> + "vectorization could read out of bounds at %G ",
> + STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + /* Vector size in bytes. */
> + poly_uint64 safe_align;
> + if (nunits.is_constant ())
> + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> + else
> + safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> + POLY_VALUE_MAX);
> +
> + auto num_vectors = ncopies;
> + if (!pow2p_hwi (num_vectors))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "non-power-of-two num vectors %u "
> + "for DR needing peeling for "
> + "alignment at %G",
> + num_vectors, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + safe_align *= num_vectors;
> + bool inbounds
> + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
I'm again confused why you think ref_within_array_bound can be used to
validate anything?
> + /* For VLA we have to insert a runtime check that the vector loads
> + per iteration don't exceed a page size. For now we can use
> + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)
> + /* We don't support PFA for VLA at the moment. Some targets like SVE
> + return a target alignment requirement of a single element. For
> + early break this is potentially unsafe so we can't count on
> + alignment rejecting such loops later as it thinks loads are never
> + misaligned. */
> + || (!nunits.is_constant () && !inbounds))
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "alignment required for correctness (");
> + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> + dump_printf (MSG_NOTE, ") may exceed page size\n");
> + }
> + return false;
> + }
> + *alignment_support_scheme = dr_unaligned_supported;
and the only acceptable thing should be *alignment_support_scheme == dr_aligned,
and with a possibly too low target_alignment even that's not enough.
Can you split out the testsuite part that just adds #pragma GCC novector?
That part is OK.
Thanks,
Richard.
> + }
> +
> if (*alignment_support_scheme == dr_unaligned_unsupported)
> {
> if (dump_enabled_p ())
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba0dbfbcca027441 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info)
> }
> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
>
> +/* Return whether the stmt_vec_info requires peeling for alignment. */
> +inline bool
> +dr_peeling_alignment (stmt_vec_info stmt_info)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + return dr_info->need_peeling_for_alignment;
> +}
> +
> +/* Set need_peeling_for_alignment for the stmt_vec_info; for a grouped
> + access set it on the first element, otherwise set it on the DR directly. */
> +inline void
> +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + dr_info->need_peeling_for_alignment = requires_alignment;
> +}
> +
> inline void
> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> {
>
>
>
>
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, February 12, 2025 2:58 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
>
> On Tue, 11 Feb 2025, Tamar Christina wrote:
>
> > Hi All,
> >
> > This fixes two PRs on Early break vectorization by delaying the safety checks to
> > vectorizable_load when the VF, VMAT and vectype are all known.
> >
> > This patch does add two new restrictions:
> >
> > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > a page unwittingly.
> >
> > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> > we cannot peel for alignment, as the alignment requirement is quite large at
> > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > don't support it for now.
> >
> > There are other steps documented inside the code itself so that the reasoning
> > is next to the code.
> >
> > Note that for VLA I have still left this fully disabled when not working on a
> > fixed buffer.
> >
> > For VLA targets like SVE return element alignment as the desired vector
> > alignment. This means that the loads are never misaligned and so, annoyingly,
> > it won't ever need to peel.
> >
> > So what I think needs to happen in GCC 16 is that.
> >
> > 1. during vect_compute_data_ref_alignment we need to take the max of
> > POLY_VALUE_MIN and vector_alignment.
> >
> > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
> > check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use
> > as a proxy for pagesize.
> >
> > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > vect_determine_partial_vectors_and_peeling since the first iteration has to
> > be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> > vectorize.
> >
> > 4. Create a default mask to be used, so that
> > vect_use_loop_mask_for_alignment_p
> > becomes true and we generate the peeled check through loop control for
> > partial loops. From what I can tell this won't work for
> > LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling
> > support at
> > all in the compiler. That would need to be done independently from the
> > above.
>
> We basically need to implement peeling/versioning for alignment based
> on the actual POLY value with the fallback being first-fault loads.
>
> > In any case, not GCC 15 material so I've kept the WIP patches I have
> downstream.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > checks.
> > (vect_compute_data_ref_alignment): Remove alignment checks and move
> to
> > get_load_store_type, increase group access alignment.
> > (vect_enhance_data_refs_alignment): Add note to comment needing
> > investigating.
> > (vect_analyze_data_refs_alignment): Likewise.
> > (vect_supportable_dr_alignment): For group loads look at first DR.
> > * tree-vect-stmts.cc (get_load_store_type):
> > Perform safety checks for early break pfa.
> > * tree-vectorizer.h (dr_peeling_alignment,
> > dr_set_peeling_alignment): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > load type is relaxed later.
> > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> > * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
> > * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> > * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
> > * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
> > * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
> > * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
> > * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
> > * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
> > * gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
> > * gcc.target/i386/pr90178.c: Likewise.
> > * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> >
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index
> 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d
> 8917863dc06bbf3e2 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will
> > register in a basic block.
> > Work bound when discovering transitive relations from existing relations.
> >
> > @item min-pagesize
> > -Minimum page size for warning purposes.
> > +Minimum page size for warning and early break vectorization purposes.
> >
> > @item openacc-kernels
> > Specify mode of OpenACC `kernels' constructs handling.
> > diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > index
> 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078d
> eee32911f31fafd94 100644
> > --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > @@ -10,6 +10,7 @@ inline Iter
> > my_find(Iter first, Iter last, Pred pred)
> > {
> > #pragma GCC unroll 4
> > +#pragma GCC novector
> > while (first != last && !pred(*first))
> > ++first;
> > return first;
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> 92263af120c3ab2c21
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include <cstddef>
> > +
> > +struct ts1 {
> > + int spans[6][2];
> > +};
> > +struct gg {
> > + int t[6];
> > +};
> > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > + ts1 ret;
> > + for (size_t i = 0; i != t; i++) {
> > + if (!(i < t)) __builtin_abort();
> > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > index
> a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc20
> 88e72d85f249bf341 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > @@ -29,6 +29,7 @@ int main ()
> > }
> >
> > /* check results: */
> > +#pragma GCC novector
> > for (i = 0; i < N; i++)
> > {
> > if (ca[i] != cb[i])
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > index
> 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52d
> c6ce9b9ccb8c64a0 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
> > }
> >
> > /* check results: */
> > +#pragma GCC novector
> > for (i = 0; i < N; i++)
> > {
> > if (ia[i] != n)
> > @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
> > }
> >
> > /* check results: */
> > +#pragma GCC novector
> > for (i = 0; i < N; i++)
> > {
> > if (ib[i] != k)
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > index
> 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9
> 544cdbe2b1da4914 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > @@ -23,6 +23,7 @@ int main ()
> > }
> >
> > /* check results: */
> > +#pragma GCC novector
> > for (i = 0; i < N; i++)
> > {
> > if (s.ca[i] != 5)
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > index
> dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648
> ea1a1f2d2699c07c1 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > @@ -1,5 +1,5 @@
> > /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-
> vectorize" } */
> >
> > /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> > * two ivs i and p2 can be eliminate. */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-
> ssa/ivopts-5.c
> > index
> a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d01
> 83c1a51de9fbc42 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > @@ -5,6 +5,7 @@ int*
> > foo (int* mem, int sz, int val)
> > {
> > int i;
> > +#pragma GCC novector
> > for (i = 0; i < sz; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-
> ssa/ivopts-6.c
> > index
> 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399
> b4374a474c66b34b 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > @@ -5,6 +5,7 @@ int*
> > foo (int* mem, int sz, int val)
> > {
> > int i;
> > +#pragma GCC novector
> > for (i = 0; i != sz; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-
> ssa/ivopts-7.c
> > index
> 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450
> 788da871ea0cea74 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > @@ -5,6 +5,7 @@ int*
> > foo (int* mem, int beg, int end, int val)
> > {
> > int i;
> > +#pragma GCC novector
> > for (i = beg; i < end; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c b/gcc/testsuite/gcc.dg/tree-
> ssa/ivopts-8.c
> > index
> b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419
> 805fd7ce6e69ffa 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > @@ -5,6 +5,7 @@ int*
> > foo (int* mem, char sz, int val)
> > {
> > char i;
> > +#pragma GCC novector
> > for (i = 0; i < sz; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c b/gcc/testsuite/gcc.dg/tree-
> ssa/ivopts-9.c
> > index
> d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2c
> a6ff877a2aac7e05b 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > @@ -5,6 +5,7 @@ int*
> > foo (int* mem, unsigned char sz, int val)
> > {
> > unsigned char i;
> > +#pragma GCC novector
> > for (i = 0; i < sz; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > index
> a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5
> 996a5f1a8ede9ead9 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > index
> f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3b
> c1ecc36d463f2ee7 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > index
> ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188
> ede7482ad01bb99 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
> > if (sum != val)
> > abort ();
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > index
> 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689
> d2594b240fec4e 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
> > if (sum != sval)
> > abort ();
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > index
> 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde
> 408903c4f2674acc 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > index
> 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393d
> c15f2f5a3e5f55e5 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > index
> 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f5
> 9ad2a10aec7b0088 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > index
> 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00
> be18fc8054644655 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > index
> ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7
> d88a5714770f147f 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > index
> c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659dd
> f748a16904b5d9e 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > index
> 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d
> 52df89a3da2dd3e4 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > index
> 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4
> eb10aa33cd0af2 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
> > {
> > int i;
> >
> > +#pragma GCC novector
> > for (i = 0; i < len; i++)
> > if (a[i] != res[i])
> > abort ();
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > index
> 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> 93c950629f3231554 100644
> > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > @@ -55,7 +55,9 @@ int main()
> > }
> > }
> > rephase ();
> > +#pragma GCC novector
> > for (i = 0; i < 32; ++i)
> > +#pragma GCC novector
> > for (j = 0; j < 3; ++j)
> > #pragma GCC novector
> > for (k = 0; k < 3; ++k)
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > index
> 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba
> 39aabc0468737c 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > @@ -5,7 +5,8 @@
> > /* { dg-additional-options "-O3" } */
> > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> >
> > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* Arm and -m32 create a group size of 3 here, which we can't support yet. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-*
> } || { { x86_64-*-* i?86-*-* } && ilp64 } } } } } */
> >
> > typedef struct filter_list_entry {
> > const char *name;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> c83ab569fc9fbde126
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +int a, b, c, d, e, f;
> > +short g[1];
> > +int main() {
> > + int h;
> > + while (a) {
> > + while (h)
> > + ;
> > + for (b = 2; b; b--) {
> > + while (c)
> > + ;
> > + f = g[a];
> > + if (d)
> > + break;
> > + }
> > + while (e)
> > + ;
> > + }
> > + return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> da7a383ad7f6ceb0a7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 1; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect"
> } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..82d473a279ce060c55028
> 9c61729d9f9b56f0d2a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Alignment requirement too big, load lanes targets can't safely vectorize this.
> */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { !
> vect_load_lanes } } } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> "vect" { target { ! vect_load_lanes } } } } */
> > +
> > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+2] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> ca29a5f3f9d3f6e0d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 0; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> 9da423cf54b5e564d6f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 1; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect"
> } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> 7ceea1c065fcc6ae9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > + for (int i = 0; i < n; i++) {
> > + if (string[i] == c)
> > + return &string[i];
> > + }
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> "vect" } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..ca95be44e92e32769da1d
> 1e9b740ae54682a3d55
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i++)
> > + {
> > + if (vect[i] > x)
> > + return 1;
> > +
> > + vect[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect"
> } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> 64a61c97b1b6268743
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < n; i++)
> > + {
> > + if (vect_a[i] > x || vect_b[i] > x)
> > + return 1;
> > +
> > + vect_a[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" }
> } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> 50776299531824ce9c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* This should be vectorizable on both load_lanes and linear targets. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+1] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56
> cecc60750394e10f2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i] > x || vect_a[i+1] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..31e209620925353948325
> 3efc17499a53d112894
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling"
> "vect" } } */
> > +
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > + return 1;
> > +
> > + vect_b[i] = x;
> > + vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > index
> 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46
> a462238c3b5825ef 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> > return ret;
> > }
> >
> > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > +/* Cannot safely vectorize this due to the group misalignment. */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } }
> */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c
> b/gcc/testsuite/gcc.target/i386/pr90178.c
> > index
> 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed7
> 3e85ba982b42 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> > @@ -4,6 +4,7 @@
> > int*
> > find_ptr (int* mem, int sz, int val)
> > {
> > +#pragma GCC novector
> > for (int i = 0; i < sz; i++)
> > if (mem[i] == val)
> > return &mem[i];
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index
> 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4f
> dbcc5973f2b513 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info
> loop_vinfo)
> > if (is_gimple_debug (stmt))
> > continue;
> >
> > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > + stmt_vec_info stmt_vinfo
> > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > if (!dr_ref)
> > continue;
> > @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences
> (loop_vec_info loop_vinfo)
> > bounded by VF so accesses are within range. We only need to check
> > the reads since writes are moved to a safe place where if we get
> > there we know they are safe to perform. */
> > - if (DR_IS_READ (dr_ref)
> > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > + if (DR_IS_READ (dr_ref))
> > {
> > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > - {
> > - const char *msg
> > - = "early break not supported: cannot peel "
> > - "for alignment, vectorization would read out of "
> > - "bounds at %G";
> > - return opt_result::failure_at (stmt, msg, stmt);
> > - }
> > -
> > - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > - dr_info->need_peeling_for_alignment = true;
> > + dr_set_peeling_alignment (stmt_vinfo, true);
> >
> > if (dump_enabled_p ())
> > dump_printf_loc (MSG_NOTE, vect_location,
> > - "marking DR (read) as needing peeling for "
> > - "alignment at %G", stmt);
> > + "marking DR (read) as possibly needing peeling "
> > + "for alignment at %G", stmt);
> > }
> >
> > if (DR_IS_READ (dr_ref))
> > @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > Compute the misalignment of the data reference DR_INFO when vectorizing
> > with VECTYPE.
> >
> > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > - be set appropriately on failure (but is otherwise left unchanged).
> > -
> > Output:
> > 1. initialized misalignment info for DR_INFO
> >
> > @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo)
> >
> > static void
> > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > - tree vectype, opt_result *result = nullptr)
> > + tree vectype)
> > {
> > stmt_vec_info stmt_info = dr_info->stmt;
> > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info
> *vinfo, dr_vec_info *dr_info,
> > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > BITS_PER_UNIT);
> >
> > - /* If this DR needs peeling for alignment for correctness, we must
> > - ensure the target alignment is a constant power-of-two multiple of the
> > - amount read per vector iteration (overriding the above hook where
> > - necessary). */
> > - if (dr_info->need_peeling_for_alignment)
> > + /* If we have a grouped access we require that the alignment be VF * elem. */
> > + if (loop_vinfo
> > + && dr_peeling_alignment (stmt_info)
> > + && STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > {
> > - /* Vector size in bytes. */
> > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > -
> > - /* We can only peel for loops, of course. */
> > - gcc_checking_assert (loop_vinfo);
> > -
> > - /* Calculate the number of vectors read per vector iteration. If
> > - it is a power of two, multiply through to get the required
> > - alignment in bytes. Otherwise, fail analysis since alignment
> > - peeling wouldn't work in such a case. */
> > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > -
> > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > - if (!pow2p_hwi (num_vectors))
> > - {
> > - *result = opt_result::failure_at (vect_location,
> > - "non-power-of-two num vectors %u "
> > - "for DR needing peeling for "
> > - "alignment at %G",
> > - num_vectors, stmt_info->stmt);
> > - return;
> > - }
> > -
> > - safe_align *= num_vectors;
> > - if (maybe_gt (safe_align, 4096U))
> > - {
> > - pretty_printer pp;
> > - pp_wide_integer (&pp, safe_align);
> > - *result = opt_result::failure_at (vect_location,
> > - "alignment required for correctness"
> > - " (%s) may exceed page size",
> > - pp_formatted_text (&pp));
> > - return;
> > - }
> > -
> > - unsigned HOST_WIDE_INT multiple;
> > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > - || !pow2p_hwi (multiple))
> > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > + vector_alignment
> > + = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>
> I think we discussed this before, also when introducing peeling
> for alignment support. This is incorrect for grouped accesses where
> the number of scalar elements accessed is GROUP_SIZE * vf, so you
> miss a multiplication by GROUP_SIZE here.
Huh, but doesn't your VF already contain the group size? If I have an LD4 on V4SI,
my VF is 16, isn't it? Because I still handle 16 elements per iteration.
So why would I need 4 * 16?
>
> Note that this (and also your VF * element_size) can result in a
> non-power-of-two value.
>
> That said, I'm quite sure we don't want to have a dr->target_alignment
> that isn't power-of-two, so if the computation doesn't end up with a
> power-of-two value we should leave it as the target prefers and
> fixup (or fail) during vectorizable_load.
Ack, I'll round up to a power of 2.
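(A minimal sketch of that rounding in plain C — round_up_pow2 is a made-up
stand-in here, not one of GCC's own bit utilities:)

#include <stdint.h>

/* Round X up to the next power of two.  E.g. round_up_pow2 (96) == 128.  */
static uint64_t
round_up_pow2 (uint64_t x)
{
  uint64_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}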
>
> > + if (dump_enabled_p ())
> > {
> > - if (dump_enabled_p ())
> > - {
> > - dump_printf_loc (MSG_NOTE, vect_location,
> > - "forcing alignment for DR from preferred (");
> > - dump_dec (MSG_NOTE, vector_alignment);
> > - dump_printf (MSG_NOTE, ") to safe align (");
> > - dump_dec (MSG_NOTE, safe_align);
> > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > - }
> > - vector_alignment = safe_align;
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "alignment increased due to early break to ");
> > + dump_dec (MSG_NOTE, vector_alignment);
> > + dump_printf (MSG_NOTE, " bytes.\n");
> > }
> > }
> >
> > @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
> > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > loop_preheader_edge (loop))
> > || loop->inner
> > +	 /* We don't currently maintain the LCSSA for prologue-peeled inverted
> > +	    loops.  */
> > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > do_peeling = false;
> >
> > @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info
> loop_vinfo)
> > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > continue;
> > - opt_result res = opt_result::success ();
> > +
> > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > - &res);
> > - if (!res)
> > - return res;
> > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > }
> > }
> >
> > @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> dr_vec_info *dr_info,
> >
> > if (misalignment == 0)
> > return dr_aligned;
> > - else if (dr_info->need_peeling_for_alignment)
> > + else if (dr_peeling_alignment (stmt_info))
> > return dr_unaligned_unsupported;
> >
> > /* For now assume all conditional loads/stores support unaligned
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee3
> 7b3fa1dc95079b 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info *vinfo,
> stmt_vec_info stmt_info,
> > return false;
> > }
> >
> > + /* If this DR needs peeling for alignment for correctness, we must
> > + ensure the target alignment is a constant power-of-two multiple of the
> > + amount read per vector iteration (overriding the above hook where
> > + necessary). */
> > + if (dr_peeling_alignment (stmt_info))
> > + {
> > + /* We can only peel for loops, of course. */
> > + gcc_checking_assert (loop_vinfo);
> > +
> > + /* Check if we support the operation if early breaks are needed. */
> > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > + || *memory_access_type == VMAT_STRIDED_SLP))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "early break not supported: cannot peel for "
> > 			     "alignment. With non-contiguous memory vectorization"
> > + " could read out of bounds at %G ",
> > + STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + /* Even if uneven group sizes are aligned on the first load, the second
> > + iteration won't be. As such reject uneven group sizes. */
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > + && (DR_GROUP_SIZE (stmt_info) % 2) == 1)
>
> Hmm, but a group size of 6 is even, but a vector size of four doesn't
> make the 2nd aligned. So we need a power-of-two GROUP_SIZE * VF
> and a byte alignment according to that.
>
Argh, true.
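To make that concrete, a worked example (illustrative numbers only, not from
the patch): with int elements, GROUP_SIZE = 6 and a vector of four elements,
one vector iteration touches 6 * 4 * 4 = 96 bytes, which is not a power of two:

#include <assert.h>

int main (void)
{
  /* Illustrative: GROUP_SIZE * VF * elem_size = 6 * 4 * 4.  */
  unsigned bytes_per_iter = 6 * 4 * 4;

  /* 96 is not a power of two...  */
  assert ((bytes_per_iter & (bytes_per_iter - 1)) != 0);

  /* ...so even if iteration 0 is aligned to 128 bytes and stays on one
     page, iteration 1 starts at offset 96 and its 96-byte footprint can
     cross a 128-byte (or page) boundary.  Only a power-of-two
     GROUP_SIZE * VF footprint, with a matching byte alignment, keeps
     every iteration page-safe.  */
  assert (96 % 128 + bytes_per_iter > 128);
  return 0;
}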
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "early break not supported: uneven group size, "
> > + "vectorization could read out of bounds at %G ",
> > + STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + /* Vector size in bytes. */
> > + poly_uint64 safe_align;
> > + if (nunits.is_constant ())
> > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > + else
> > +	safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> > + POLY_VALUE_MAX);
> > +
> > + auto num_vectors = ncopies;
> > + if (!pow2p_hwi (num_vectors))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "non-power-of-two num vectors %u "
> > + "for DR needing peeling for "
> > + "alignment at %G",
> > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + safe_align *= num_vectors;
> > + bool inbounds
> > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
>
> I'm again confused why you think ref_within_array_bound can be used to
> validate anything?
The goal here is that if we have a value that is aligned, we don't exceed the
page, and we're able to increase the target alignment, then VLA is safe to use.
Since we can't peel for SVE, any unaligned value can't be handled later. But as
I mentioned, to SVE nothing is unaligned, as the target alignment is the
element size.
For non-SVE, unaligned accesses get processed later on; for SVE you just
generate wrong code. Since we know we can realign known-sized buffers (since,
well, they're .data), we know we can let them through.
I don't really know of a better way to do this, since, again, for VLA the
backend never reports that the load is misaligned, so it doesn't stop
vectorization.
The inbounds check is used as a proxy for that, which is what the comment
below its use was trying to explain.
>
> > + /* For VLA we have to insert a runtime check that the vector loads
> > +	 per iteration don't exceed a page size.  For now we can use
> > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)
> > + /* We don't support PFA for VLA at the moment. Some targets like SVE
> > + return a target alignment requirement of a single element. For
> > + early break this is potentially unsafe so we can't count on
> > + alignment rejecting such loops later as it thinks loads are never
> > + misaligned. */
> > + || (!nunits.is_constant () && !inbounds))
> > + {
> > + if (dump_enabled_p ())
> > + {
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "alignment required for correctness (");
> > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > + }
> > + return false;
> > + }
> > + *alignment_support_scheme = dr_unaligned_supported;
>
> and the only thing should be *alignment_support_scheme == dr_aligned,
> and with a possibly too low taget_alignment even that's not enough.
>
Fair, but you shouldn't be able to get there with a too-low target alignment, though.
> Can you split out the testsuite part that just adds #pragma GCC novector?
> That part is OK.
>
Ok,
Tamar
> Thanks,
> Richard.
>
> > + }
> > +
> > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > {
> > if (dump_enabled_p ())
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index
> b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba
> 0dbfbcca027441 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info)
> > }
> > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> >
> > +/* Return whether the stmt_vec_info requires peeling for alignment.  */
> > +inline bool
> > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > +{
> > + dr_vec_info *dr_info;
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + else
> > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > +
> > + return dr_info->need_peeling_for_alignment;
> > +}
> > +
> > +/* Set need_peeling_for_alignment for the stmt_vec_info; for a group
> > +   access, set it on the first element, otherwise set it on the DR directly.  */
> > +inline void
> > +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment)
> > +{
> > + dr_vec_info *dr_info;
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + else
> > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > +
> > + dr_info->need_peeling_for_alignment = requires_alignment;
> > +}
> > +
> > inline void
> > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > {
> >
> >
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, February 12, 2025 3:20 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, February 12, 2025 2:58 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> > [PR118464]
> >
> > On Tue, 11 Feb 2025, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This fixes two PRs on Early break vectorization by delaying the safety checks to
> > > vectorizable_load when the VF, VMAT and vectype are all known.
> > >
> > > This patch does add two new restrictions:
> > >
> > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > > a page unwittingly.
> > >
> > > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
> > if
> > > we cannot peel for alignment, as the alignment requirement is quite large at
> > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > > don't support it for now.
> > >
> > > There are other steps documented inside the code itself so that the reasoning
> > > is next to the code.
> > >
> > > Note that for VLA I have still left this fully disabled when not working on a
> > > fixed buffer.
> > >
> > > For VLA targets like SVE we return element alignment as the desired vector
> > > alignment. This means that the loads are never misaligned and so, annoyingly,
> > > it won't ever need to peel.
> > >
> > > So what I think needs to happen in GCC 16 is:
> > >
> > > 1. during vect_compute_data_ref_alignment we need to take the max of
> > > POLY_VALUE_MIN and vector_alignment.
> > >
> > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
> > >    check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as
> > >    a proxy for pagesize.
> > >
> > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > > vect_determine_partial_vectors_and_peeling since the first iteration has to
> > > be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> > > vectorize.
> > >
> > > 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
> > >    becomes true and we generate the peeled check through loop control for
> > >    partial loops. From what I can tell this won't work for
> > >    LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
> > >    all in the compiler. That would need to be done independently from the
> > >    above.
> >
> > We basically need to implement peeling/versioning for alignment based
> > on the actual POLY value with the fallback being first-fault loads.
> >
> > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > downstream.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > PR tree-optimization/118464
> > > PR tree-optimization/116855
> > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > > checks.
> > > (vect_compute_data_ref_alignment): Remove alignment checks and move
> > to
> > > get_load_store_type, increase group access alignment.
> > > (vect_enhance_data_refs_alignment): Add note to comment needing
> > > investigating.
> > > (vect_analyze_data_refs_alignment): Likewise.
> > > (vect_supportable_dr_alignment): For group loads look at first DR.
> > > * tree-vect-stmts.cc (get_load_store_type):
> > > Perform safety checks for early break pfa.
> > > * tree-vectorizer.h (dr_peeling_alignment,
> > > dr_set_peeling_alignment): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR tree-optimization/118464
> > > PR tree-optimization/116855
> > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > > load type is relaxed later.
> > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > > * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> > > * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > > * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
> > > * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
> > > * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
> > > * gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
> > > * gcc.target/i386/pr90178.c: Likewise.
> > > * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> > >
> > > ---
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index
> >
> 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d
> > 8917863dc06bbf3e2 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will
> > register in a basic block.
> > > Work bound when discovering transitive relations from existing relations.
> > >
> > > @item min-pagesize
> > > -Minimum page size for warning purposes.
> > > +Minimum page size for warning and early break vectorization purposes.
> > >
> > > @item openacc-kernels
> > > Specify mode of OpenACC `kernels' constructs handling.
> > > diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > index
> >
> 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078d
> > eee32911f31fafd94 100644
> > > --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > @@ -10,6 +10,7 @@ inline Iter
> > > my_find(Iter first, Iter last, Pred pred)
> > > {
> > > #pragma GCC unroll 4
> > > +#pragma GCC novector
> > > while (first != last && !pred(*first))
> > > ++first;
> > > return first;
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> > 92263af120c3ab2c21
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include <cstddef>
> > > +
> > > +struct ts1 {
> > > + int spans[6][2];
> > > +};
> > > +struct gg {
> > > + int t[6];
> > > +};
> > > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > > + ts1 ret;
> > > + for (size_t i = 0; i != t; i++) {
> > > + if (!(i < t)) __builtin_abort();
> > > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > index
> >
> a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc20
> > 88e72d85f249bf341 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > @@ -29,6 +29,7 @@ int main ()
> > > }
> > >
> > > /* check results: */
> > > +#pragma GCC novector
> > > for (i = 0; i < N; i++)
> > > {
> > > if (ca[i] != cb[i])
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > index
> >
> 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52d
> > c6ce9b9ccb8c64a0 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
> > > }
> > >
> > > /* check results: */
> > > +#pragma GCC novector
> > > for (i = 0; i < N; i++)
> > > {
> > > if (ia[i] != n)
> > > @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
> > > }
> > >
> > > /* check results: */
> > > +#pragma GCC novector
> > > for (i = 0; i < N; i++)
> > > {
> > > if (ib[i] != k)
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > index
> >
> 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9
> > 544cdbe2b1da4914 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > @@ -23,6 +23,7 @@ int main ()
> > > }
> > >
> > > /* check results: */
> > > +#pragma GCC novector
> > > for (i = 0; i < N; i++)
> > > {
> > > if (s.ca[i] != 5)
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > index
> >
> dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648
> > ea1a1f2d2699c07c1 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > @@ -1,5 +1,5 @@
> > > /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > > -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-
> > vectorize" } */
> > >
> > > /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> > > * two ivs i and p2 can be eliminate. */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> b/gcc/testsuite/gcc.dg/tree-
> > ssa/ivopts-5.c
> > > index
> >
> a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d01
> > 83c1a51de9fbc42 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > > @@ -5,6 +5,7 @@ int*
> > > foo (int* mem, int sz, int val)
> > > {
> > > int i;
> > > +#pragma GCC novector
> > > for (i = 0; i < sz; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> b/gcc/testsuite/gcc.dg/tree-
> > ssa/ivopts-6.c
> > > index
> >
> 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399
> > b4374a474c66b34b 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > > @@ -5,6 +5,7 @@ int*
> > > foo (int* mem, int sz, int val)
> > > {
> > > int i;
> > > +#pragma GCC novector
> > > for (i = 0; i != sz; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> b/gcc/testsuite/gcc.dg/tree-
> > ssa/ivopts-7.c
> > > index
> >
> 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450
> > 788da871ea0cea74 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > > @@ -5,6 +5,7 @@ int*
> > > foo (int* mem, int beg, int end, int val)
> > > {
> > > int i;
> > > +#pragma GCC novector
> > > for (i = beg; i < end; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> b/gcc/testsuite/gcc.dg/tree-
> > ssa/ivopts-8.c
> > > index
> >
> b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419
> > 805fd7ce6e69ffa 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > > @@ -5,6 +5,7 @@ int*
> > > foo (int* mem, char sz, int val)
> > > {
> > > char i;
> > > +#pragma GCC novector
> > > for (i = 0; i < sz; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> b/gcc/testsuite/gcc.dg/tree-
> > ssa/ivopts-9.c
> > > index
> >
> d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2c
> > a6ff877a2aac7e05b 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > > @@ -5,6 +5,7 @@ int*
> > > foo (int* mem, unsigned char sz, int val)
> > > {
> > > unsigned char i;
> > > +#pragma GCC novector
> > > for (i = 0; i < sz; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > index
> >
> a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5
> > 996a5f1a8ede9ead9 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > index
> >
> f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3b
> > c1ecc36d463f2ee7 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > index
> >
> ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188
> > ede7482ad01bb99 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
> > > if (sum != val)
> > > abort ();
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > index
> >
> 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689
> > d2594b240fec4e 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
> > > if (sum != sval)
> > > abort ();
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > index
> >
> 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde
> > 408903c4f2674acc 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > index
> >
> 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393d
> > c15f2f5a3e5f55e5 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > index
> >
> 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f5
> > 9ad2a10aec7b0088 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > index
> >
> 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00
> > be18fc8054644655 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > index
> >
> ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7
> > d88a5714770f147f 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > index
> >
> c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659dd
> > f748a16904b5d9e 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > index
> >
> 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d
> > 52df89a3da2dd3e4 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > index
> >
> 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4
> > eb10aa33cd0af2 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
> > > {
> > > int i;
> > >
> > > +#pragma GCC novector
> > > for (i = 0; i < len; i++)
> > > if (a[i] != res[i])
> > > abort ();
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > index
> >
> 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > 93c950629f3231554 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > @@ -55,7 +55,9 @@ int main()
> > > }
> > > }
> > > rephase ();
> > > +#pragma GCC novector
> > > for (i = 0; i < 32; ++i)
> > > +#pragma GCC novector
> > > for (j = 0; j < 3; ++j)
> > > #pragma GCC novector
> > > for (k = 0; k < 3; ++k)
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > index
> >
> 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba
> > 39aabc0468737c 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > @@ -5,7 +5,8 @@
> > > /* { dg-additional-options "-O3" } */
> > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> > >
> > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* Arm and -m32 create a group size of 3 here, which we can't support yet. */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-* } || { { x86_64-*-* i?86-*-* } && ilp64 } } } } } */
> > >
> > > typedef struct filter_list_entry {
> > > const char *name;
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > c83ab569fc9fbde126
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +int a, b, c, d, e, f;
> > > +short g[1];
> > > +int main() {
> > > + int h;
> > > + while (a) {
> > > + while (h)
> > > + ;
> > > + for (b = 2; b; b--) {
> > > + while (c)
> > > + ;
> > > + f = g[a];
> > > + if (d)
> > > + break;
> > > + }
> > > + while (e)
> > > + ;
> > > + }
> > > + return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > da7a383ad7f6ceb0a7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 1; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..82d473a279ce060c55028
> > 9c61729d9f9b56f0d2a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Alignment requirement too big, load lanes targets can't safely vectorize this.  */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! vect_load_lanes } } } } */
> > > +
> > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+2] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > ca29a5f3f9d3f6e0d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 0; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > 9da423cf54b5e564d6f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 1; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > 7ceea1c065fcc6ae9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > + for (int i = 0; i < n; i++) {
> > > + if (string[i] == c)
> > > + return &string[i];
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..ca95be44e92e32769da1d
> > 1e9b740ae54682a3d55
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i++)
> > > + {
> > > + if (vect[i] > x)
> > > + return 1;
> > > +
> > > + vect[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > 64a61c97b1b6268743
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < n; i++)
> > > + {
> > > + if (vect_a[i] > x || vect_b[i] > x)
> > > + return 1;
> > > +
> > > + vect_a[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > 50776299531824ce9c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* This should be vectorizable through load_lanes and linear targets. */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56
> > cecc60750394e10f2b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > new file mode 100644
> > > index
> >
> 0000000000000000000000000000000000000000..31e209620925353948325
> > 3efc17499a53d112894
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > +
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > + return 1;
> > > +
> > > + vect_b[i] = x;
> > > + vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > index
> >
> 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46
> > a462238c3b5825ef 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> > > return ret;
> > > }
> > >
> > > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > > +/* Cannot safely vectorize this due to the group misalignment.  */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c
> > b/gcc/testsuite/gcc.target/i386/pr90178.c
> > > index
> >
> 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed7
> > 3e85ba982b42 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> > > @@ -4,6 +4,7 @@
> > > int*
> > > find_ptr (int* mem, int sz, int val)
> > > {
> > > +#pragma GCC novector
> > > for (int i = 0; i < sz; i++)
> > > if (mem[i] == val)
> > > return &mem[i];
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> >
> 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4f
> > dbcc5973f2b513 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences
> (loop_vec_info
> > loop_vinfo)
> > > if (is_gimple_debug (stmt))
> > > continue;
> > >
> > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > + stmt_vec_info stmt_vinfo
> > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > if (!dr_ref)
> > > continue;
> > > @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences
> > (loop_vec_info loop_vinfo)
> > > bounded by VF so accesses are within range. We only need to check
> > > the reads since writes are moved to a safe place where if we get
> > > there we know they are safe to perform. */
> > > - if (DR_IS_READ (dr_ref)
> > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > + if (DR_IS_READ (dr_ref))
> > > {
> > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > - {
> > > - const char *msg
> > > - = "early break not supported: cannot peel "
> > > - "for alignment, vectorization would read out of "
> > > - "bounds at %G";
> > > - return opt_result::failure_at (stmt, msg, stmt);
> > > - }
> > > -
> > > - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > - dr_info->need_peeling_for_alignment = true;
> > > + dr_set_peeling_alignment (stmt_vinfo, true);
> > >
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > - "marking DR (read) as needing peeling for "
> > > - "alignment at %G", stmt);
> > > + "marking DR (read) as possibly needing peeling "
> > > + "for alignment at %G", stmt);
> > > }
> > >
> > > if (DR_IS_READ (dr_ref))
> > > @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > with VECTYPE.
> > >
> > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > - be set appropriately on failure (but is otherwise left unchanged).
> > > -
> > > Output:
> > > 1. initialized misalignment info for DR_INFO
> > >
> > > @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > >
> > > static void
> > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > - tree vectype, opt_result *result = nullptr)
> > > + tree vectype)
> > > {
> > > stmt_vec_info stmt_info = dr_info->stmt;
> > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info
> > *vinfo, dr_vec_info *dr_info,
> > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > BITS_PER_UNIT);
> > >
> > > - /* If this DR needs peeling for alignment for correctness, we must
> > > - ensure the target alignment is a constant power-of-two multiple of the
> > > - amount read per vector iteration (overriding the above hook where
> > > - necessary). */
> > > - if (dr_info->need_peeling_for_alignment)
> > > +  /* If we have a grouped access we require that the alignment be VF * elem.  */
> > > + if (loop_vinfo
> > > + && dr_peeling_alignment (stmt_info)
> > > + && STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > {
> > > - /* Vector size in bytes. */
> > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > -
> > > - /* We can only peel for loops, of course. */
> > > - gcc_checking_assert (loop_vinfo);
> > > -
> > > - /* Calculate the number of vectors read per vector iteration. If
> > > - it is a power of two, multiply through to get the required
> > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > - peeling wouldn't work in such a case. */
> > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > -
> > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > - if (!pow2p_hwi (num_vectors))
> > > - {
> > > - *result = opt_result::failure_at (vect_location,
> > > - "non-power-of-two num vectors %u "
> > > - "for DR needing peeling for "
> > > - "alignment at %G",
> > > - num_vectors, stmt_info->stmt);
> > > - return;
> > > - }
> > > -
> > > - safe_align *= num_vectors;
> > > - if (maybe_gt (safe_align, 4096U))
> > > - {
> > > - pretty_printer pp;
> > > - pp_wide_integer (&pp, safe_align);
> > > - *result = opt_result::failure_at (vect_location,
> > > - "alignment required for correctness"
> > > - " (%s) may exceed page size",
> > > - pp_formatted_text (&pp));
> > > - return;
> > > - }
> > > -
> > > - unsigned HOST_WIDE_INT multiple;
> > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > - || !pow2p_hwi (multiple))
> > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > + vector_alignment
> > > + = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> >
> > I think we discussed this before, also when introducing peeling
> > for alignment support. This is incorrect for grouped accesses where
> > the number of scalar elements accessed is GROUP_SIZE * vf, so you
> > miss a multiplication by GROUP_SIZE here.
>
> Huh, but doesn't your VF already contain the group size? If I have an LD4 on V4SI,
> my VF is 16, isn't it? Because I still handle 16 elements per iteration.
>
> So why would I need 4 * 16?
>
*sigh* never mind. I was using this loop to check:

char vect_a[1025];
char vect_b[1025];
unsigned test4(char x, int n)
{
  unsigned ret = 0;
  for (int i = 1; i < (n - 2); i+=2)
    {
      if (vect_a[i] > x || vect_a[i+1] > x)
        return 1;
      vect_b[i] = x;
      vect_b[i+1] = x+1;
    }
  return ret;
}

And I was expecting 32-byte alignment, but then misread:

        .align 4
        .set .LANCHOR0,. + 0
        .type vect_b, %object
        .size vect_b, 1025
vect_b:

so yes, I need the GROUP_SIZE multiplication.
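So the per-iteration footprint the target alignment must cover looks like this
(a sketch in plain C; pfa_required_align is a made-up helper for illustration,
not the vectorizer's actual code):

#include <stdint.h>

/* Bytes touched per vector iteration by a grouped access; this is what
   the target alignment must cover for the early-break guarantee.
   Returns 0 when the footprint is not a power of two, in which case
   peeling for alignment cannot provide that guarantee.  */
static uint64_t
pfa_required_align (uint64_t group_size, uint64_t vf, uint64_t elem_size)
{
  uint64_t bytes = group_size * vf * elem_size;
  return (bytes != 0 && (bytes & (bytes - 1)) == 0) ? bytes : 0;
}

/* For the vect_b loop above: chars (elem_size 1), GROUP_SIZE 2 and a VF
   of 16 give 2 * 16 * 1 = 32, hence the expected 32-byte alignment.  */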
> >
> > Note that this (and also your VF * element_size) can result in a
> > non-power-of-two value.
> >
> > That said, I'm quite sure we don't want to have a dr->target_alignment
> > that isn't power-of-two, so if the computation doesn't end up with a
> > power-of-two value we should leave it as the target prefers and
> > fixup (or fail) during vectorizable_load.
>
> Ack, I'll round up to a power of 2.
>
> >
> > > + if (dump_enabled_p ())
> > > {
> > > - if (dump_enabled_p ())
> > > - {
> > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > - "forcing alignment for DR from preferred (");
> > > - dump_dec (MSG_NOTE, vector_alignment);
> > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > - dump_dec (MSG_NOTE, safe_align);
> > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > - }
> > > - vector_alignment = safe_align;
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "alignment increased due to early break to ");
> > > + dump_dec (MSG_NOTE, vector_alignment);
> > > + dump_printf (MSG_NOTE, " bytes.\n");
> > > }
> > > }
> > >
> > > @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > loop_preheader_edge (loop))
> > > || loop->inner
> > > +	 /* We don't currently maintain the LCSSA for prologue-peeled inverted
> > > +	    loops.  */
> > > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > do_peeling = false;
> > >
> > > @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > continue;
> > > - opt_result res = opt_result::success ();
> > > +
> > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > - &res);
> > > - if (!res)
> > > - return res;
> > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > }
> > > }
> > >
> > > @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > dr_vec_info *dr_info,
> > >
> > > if (misalignment == 0)
> > > return dr_aligned;
> > > - else if (dr_info->need_peeling_for_alignment)
> > > + else if (dr_peeling_alignment (stmt_info))
> > > return dr_unaligned_unsupported;
> > >
> > > /* For now assume all conditional loads/stores support unaligned
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index
> >
> 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee3
> > 7b3fa1dc95079b 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > > return false;
> > > }
> > >
> > > + /* If this DR needs peeling for alignment for correctness, we must
> > > + ensure the target alignment is a constant power-of-two multiple of the
> > > + amount read per vector iteration (overriding the above hook where
> > > + necessary). */
> > > + if (dr_peeling_alignment (stmt_info))
> > > + {
> > > + /* We can only peel for loops, of course. */
> > > + gcc_checking_assert (loop_vinfo);
> > > +
> > > + /* Check if we support the operation if early breaks are needed. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > > + || *memory_access_type == VMAT_STRIDED_SLP))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early break not supported: cannot peel for "
> > > 			   "alignment. With non-contiguous memory vectorization"
> > > + " could read out of bounds at %G ",
> > > + STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + /* Even if uneven group sizes are aligned on the first load, the second
> > > + iteration won't be. As such reject uneven group sizes. */
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > > + && (DR_GROUP_SIZE (stmt_info) % 2) == 1)
> >
> > Hmm, but a group size of 6 is even, but a vector size of four doesn't
> > make the 2nd aligned. So we need a power-of-two GROUP_SIZE * VF
> > and a byte alignment according to that.
> >
>
> Argh, true.
>
>
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "early break not supported: uneven group size, "
> > > + "vectorization could read out of bounds at %G ",
> > > + STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + /* Vector size in bytes. */
> > > + poly_uint64 safe_align;
> > > + if (nunits.is_constant ())
> > > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > + else
> > > +	safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> > > + POLY_VALUE_MAX);
> > > +
> > > + auto num_vectors = ncopies;
> > > + if (!pow2p_hwi (num_vectors))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "non-power-of-two num vectors %u "
> > > + "for DR needing peeling for "
> > > + "alignment at %G",
> > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + safe_align *= num_vectors;
> > > + bool inbounds
> > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> >
> > I'm again confused why you think ref_within_array_bound can be used to
> > validize anything?
>
> The goal here is that if we have a value that is aligned, we don't exceed the
> page, and we're able to increase the target alignment, then VLA is safe to use.
>
> Since we can't peel for SVE, any unaligned value can't be handled later. But as I
> mentioned, to SVE nothing is unaligned, as the target alignment is the element
> size.
>
> For non-SVE, unaligned accesses get processed later on. For SVE you just
> generate wrong code. Since we know we can realign known-size buffers (since,
> well, they're .data) we know we can let those through.
>
> I don't really know of a better way to do this. Since again, for VLA the backend
> never reports that the load is misaligned, so it doesn't stop vectorization.
>
> The inbounds check is used as a proxy for that, which is what the comment below
> its use was trying to explain.
To explain this some more.
This loop is clearly safe to vectorize:

char vect_a[1025];
char vect_b[1025];

unsigned test4(char x, int n)
{
  unsigned ret = 0;
  for (int i = 0; i < 1023; i+=2)
    {
      if (vect_a[i] > x || vect_a[i+1] > x)
        return 1;

      vect_b[i] = x;
      vect_b[i+1] = x+1;
    }
  return ret;
}

But I cannot distinguish between this case and cases where we've established it's
unsafe because you need to peel for alignment or increase the data alignment.

SVE will always return an alignment of the element type. This means that after
vect_enhance_data_refs_alignment the calculated misalignment is always zero, which
means that dr_peeling_alignment is always ignored in vect_supportable_dr_alignment,
which means the code is never marked as needing peeling for alignment, *nor* as
needing versioning.

This did not seem right to me. Since the known-size cases are always ok (you can
never overread the buffer, as the predicate would just make the last iteration
partial), the known inbounds cases are safe for VLA due to predication.
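
To make that predication argument concrete, here is a minimal hand-written sketch
using the ACLE intrinsics (my own illustration of the semantics, not the
vectorizer's actual output; the lane handling on a hit is simplified):

#include <arm_sve.h>

/* Only lanes with i + lane < n are active in pg, so the masked load never
   touches bytes the scalar loop would not have read -- the last iteration
   is simply partial.  */
long first_gt (signed char *a, long n, signed char x)
{
  for (long i = 0; i < n; i += svcntb ())
    {
      svbool_t pg = svwhilelt_b8 (i, n);  /* deactivate lanes >= n */
      svint8_t v = svld1_s8 (pg, a + i);  /* inactive lanes are not loaded */
      if (svptest_any (pg, svcmpgt_n_s8 (pg, v, x)))
        return i;                         /* block containing the first hit */
    }
  return -1;
}
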
Increasing alignment also doesn't really work for VLA because you'd have to way
overalign to get a safe value (i.e. align to the maximum vector size), which e.g.
for an LD2 already doesn't make much sense.
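
For instance (numbers assumed purely for illustration): for an LD2 of chars with
SVE's architectural maximum of 2048-bit (256-byte) vectors, the safe alignment
would be GROUP_SIZE * max_vector_bytes:

/* 2 * 256 == 512 bytes of over-alignment just to make the group loads safe.  */
char buf[1024] __attribute__ ((aligned (512)));
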
At least, this is what I was trying to explain in the comment below its use. As
far as I can tell, the only case where an inbounds VLA access would fault is an
incorrect program where the scalar loop just happened to exit before reading
memory we were told was safe to read.
Thanks,
Tamar
>
> >
> > > + /* For VLA we have to insert a runtime check that the vector loads
> > > > + per iteration don't exceed a page size. For now we can use
> > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> > > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)
> > > + /* We don't support PFA for VLA at the moment. Some targets like SVE
> > > + return a target alignment requirement of a single element. For
> > > + early break this is potentially unsafe so we can't count on
> > > + alignment rejecting such loops later as it thinks loads are never
> > > + misaligned. */
> > > + || (!nunits.is_constant () && !inbounds))
> > > + {
> > > + if (dump_enabled_p ())
> > > + {
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "alignment required for correctness (");
> > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > + }
> > > + return false;
> > > + }
> > > + *alignment_support_scheme = dr_unaligned_supported;
> >
> > and the only thing should be *alignment_support_scheme == dr_aligned,
> > and with a possibly too low target_alignment even that's not enough.
> >
>
> Fair, but you shouldn't be able to get there with a too low target alignment though.
>
> > Can you split out the testsuite part that just adds #pragma GCC novector?
> > That part is OK.
> >
>
> Ok,
> Tamar
>
> > Thanks,
> > Richard.
> >
> > > + }
> > > +
> > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > {
> > > if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba0dbfbcca027441 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > }
> > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > >
> > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > +inline bool
> > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > +{
> > > + dr_vec_info *dr_info;
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > + else
> > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > + return dr_info->need_peeling_for_alignment;
> > > +}
> > > +
> > > +/* Set the need_peeling_for_alignment for the stmt_vec_info; if it is a group
> > > + access then set it on the first element, otherwise set it on the DR directly. */
> > > +inline void
> > > +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment)
> > > +{
> > > + dr_vec_info *dr_info;
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > + else
> > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > + dr_info->need_peeling_for_alignment = requires_alignment;
> > > +}
> > > +
> > > inline void
> > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > {
> > >
> > >
> > >
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Wed, 12 Feb 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Wednesday, February 12, 2025 3:20 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load
> > [PR118464]
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, February 12, 2025 2:58 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> > > [PR118464]
> > >
> > > On Tue, 11 Feb 2025, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This fixes two PRs on Early break vectorization by delaying the safety checks to
> > > > vectorizable_load when the VF, VMAT and vectype are all known.
> > > >
> > > > This patch does add two new restrictions:
> > > >
> > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > > > group sizes, as they are unaligned every n % 2 iterations and so may cross
> > > > a page unwittingly.
> > > >
> > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
> > > if
> > > > we cannot peel for alignment, as the alignment requirement is quite large at
> > > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> > > > don't support it for now.
> > > >
> > > > There are other steps documented inside the code itself so that the reasoning
> > > > is next to the code.
> > > >
> > > > Note that for VLA I have still left this fully disabled when not working on a
> > > > fixed buffer.
> > > >
> > > > For VLA targets like SVE, the backend returns element alignment as the desired
> > > > vector alignment. This means that the loads are never misaligned and so,
> > > > annoyingly, it won't ever need to peel.
> > > >
> > > > So what I think needs to happen in GCC 16 is that.
> > > >
> > > > 1. during vect_compute_data_ref_alignment we need to take the max of
> > > > POLY_VALUE_MIN and vector_alignment.
> > > >
> > > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
> > > > check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use
> > > > as a proxy for pagesize.
> > > >
> > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > > > vect_determine_partial_vectors_and_peeling since the first iteration has to
> > > > be partial. If !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> > > > vectorize.
> > > >
> > > > 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
> > > > becomes true and we generate the peeled check through loop control for
> > > > partial loops. From what I can tell this won't work for
> > > > LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
> > > > all in the compiler. That would need to be done independently from the
> > > > above.
> > >
> > > We basically need to implement peeling/versioning for alignment based
> > > on the actual POLY value with the fallback being first-fault loads.
> > >
> > > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > > downstream.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > > > checks.
> > > > (vect_compute_data_ref_alignment): Remove alignment checks and move
> > > to
> > > > get_load_store_type, increase group access alignment.
> > > > (vect_enhance_data_refs_alignment): Add note to comment needing
> > > > investigating.
> > > > (vect_analyze_data_refs_alignment): Likewise.
> > > > (vect_supportable_dr_alignment): For group loads look at first DR.
> > > > * tree-vect-stmts.cc (get_load_store_type):
> > > > Perform safety checks for early break pfa.
> > > > * tree-vectorizer.h (dr_peeling_alignment,
> > > > dr_set_peeling_alignment): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > > > load type is relaxed later.
> > > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > > > * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> > > > * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > > > * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
> > > > * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
> > > > * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
> > > > * gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
> > > > * gcc.target/i386/pr90178.c: Likewise.
> > > > * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> > > >
> > > > ---
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index
> > >
> > 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d
> > > 8917863dc06bbf3e2 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will register in a basic block.
> > > > Work bound when discovering transitive relations from existing relations.
> > > >
> > > > @item min-pagesize
> > > > -Minimum page size for warning purposes.
> > > > +Minimum page size for warning and early break vectorization purposes.
> > > >
> > > > @item openacc-kernels
> > > > Specify mode of OpenACC `kernels' constructs handling.
> > > > diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > > index
> > >
> > 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078d
> > > eee32911f31fafd94 100644
> > > > --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > > +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > > @@ -10,6 +10,7 @@ inline Iter
> > > > my_find(Iter first, Iter last, Pred pred)
> > > > {
> > > > #pragma GCC unroll 4
> > > > +#pragma GCC novector
> > > > while (first != last && !pred(*first))
> > > > ++first;
> > > > return first;
> > > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> > > 92263af120c3ab2c21
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +#include <cstddef>
> > > > +
> > > > +struct ts1 {
> > > > + int spans[6][2];
> > > > +};
> > > > +struct gg {
> > > > + int t[6];
> > > > +};
> > > > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > > > + ts1 ret;
> > > > + for (size_t i = 0; i != t; i++) {
> > > > + if (!(i < t)) __builtin_abort();
> > > > + ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > > > + ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > > index
> > >
> > a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc20
> > > 88e72d85f249bf341 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > > @@ -29,6 +29,7 @@ int main ()
> > > > }
> > > >
> > > > /* check results: */
> > > > +#pragma GCC novector
> > > > for (i = 0; i < N; i++)
> > > > {
> > > > if (ca[i] != cb[i])
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > > index
> > >
> > 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52d
> > > c6ce9b9ccb8c64a0 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > > > @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
> > > > }
> > > >
> > > > /* check results: */
> > > > +#pragma GCC novector
> > > > for (i = 0; i < N; i++)
> > > > {
> > > > if (ia[i] != n)
> > > > @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
> > > > }
> > > >
> > > > /* check results: */
> > > > +#pragma GCC novector
> > > > for (i = 0; i < N; i++)
> > > > {
> > > > if (ib[i] != k)
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > > index
> > >
> > 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9
> > > 544cdbe2b1da4914 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > > @@ -23,6 +23,7 @@ int main ()
> > > > }
> > > >
> > > > /* check results: */
> > > > +#pragma GCC novector
> > > > for (i = 0; i < N; i++)
> > > > {
> > > > if (s.ca[i] != 5)
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > > index
> > >
> > dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648
> > > ea1a1f2d2699c07c1 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > > > @@ -1,5 +1,5 @@
> > > > /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > > > -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > > > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-vectorize" } */
> > > >
> > > > /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> > > > * two ivs i and p2 can be eliminate. */
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > b/gcc/testsuite/gcc.dg/tree-
> > > ssa/ivopts-5.c
> > > > index
> > >
> > a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d01
> > > 83c1a51de9fbc42 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > > > @@ -5,6 +5,7 @@ int*
> > > > foo (int* mem, int sz, int val)
> > > > {
> > > > int i;
> > > > +#pragma GCC novector
> > > > for (i = 0; i < sz; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > b/gcc/testsuite/gcc.dg/tree-
> > > ssa/ivopts-6.c
> > > > index
> > >
> > 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399
> > > b4374a474c66b34b 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > > > @@ -5,6 +5,7 @@ int*
> > > > foo (int* mem, int sz, int val)
> > > > {
> > > > int i;
> > > > +#pragma GCC novector
> > > > for (i = 0; i != sz; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > b/gcc/testsuite/gcc.dg/tree-
> > > ssa/ivopts-7.c
> > > > index
> > >
> > 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450
> > > 788da871ea0cea74 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > > > @@ -5,6 +5,7 @@ int*
> > > > foo (int* mem, int beg, int end, int val)
> > > > {
> > > > int i;
> > > > +#pragma GCC novector
> > > > for (i = beg; i < end; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > b/gcc/testsuite/gcc.dg/tree-
> > > ssa/ivopts-8.c
> > > > index
> > >
> > b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419
> > > 805fd7ce6e69ffa 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > > > @@ -5,6 +5,7 @@ int*
> > > > foo (int* mem, char sz, int val)
> > > > {
> > > > char i;
> > > > +#pragma GCC novector
> > > > for (i = 0; i < sz; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > b/gcc/testsuite/gcc.dg/tree-
> > > ssa/ivopts-9.c
> > > > index
> > >
> > d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2c
> > > a6ff877a2aac7e05b 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > > > @@ -5,6 +5,7 @@ int*
> > > > foo (int* mem, unsigned char sz, int val)
> > > > {
> > > > unsigned char i;
> > > > +#pragma GCC novector
> > > > for (i = 0; i < sz; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > > index
> > >
> > a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5
> > > 996a5f1a8ede9ead9 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > > index
> > >
> > f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3b
> > > c1ecc36d463f2ee7 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > > > @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > > index
> > >
> > ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188
> > > ede7482ad01bb99 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > > > @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
> > > > if (sum != val)
> > > > abort ();
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > > index
> > >
> > 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689
> > > d2594b240fec4e 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > > > @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
> > > > if (sum != sval)
> > > > abort ();
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > > index
> > >
> > 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde
> > > 408903c4f2674acc 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > > index
> > >
> > 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393d
> > > c15f2f5a3e5f55e5 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > > @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > > index
> > >
> > 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f5
> > > 9ad2a10aec7b0088 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > > index
> > >
> > 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00
> > > be18fc8054644655 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > > > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > > index
> > >
> > ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7
> > > d88a5714770f147f 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > > > @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > > index
> > >
> > c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659dd
> > > f748a16904b5d9e 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > > > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > > index
> > >
> > 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d
> > > 52df89a3da2dd3e4 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > > > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > > index
> > >
> > 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4
> > > eb10aa33cd0af2 100644
> > > > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > > > @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
> > > > {
> > > > int i;
> > > >
> > > > +#pragma GCC novector
> > > > for (i = 0; i < len; i++)
> > > > if (a[i] != res[i])
> > > > abort ();
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > index
> > >
> > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > > 93c950629f3231554 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > @@ -55,7 +55,9 @@ int main()
> > > > }
> > > > }
> > > > rephase ();
> > > > +#pragma GCC novector
> > > > for (i = 0; i < 32; ++i)
> > > > +#pragma GCC novector
> > > > for (j = 0; j < 3; ++j)
> > > > #pragma GCC novector
> > > > for (k = 0; k < 3; ++k)
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > index
> > >
> > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba
> > > 39aabc0468737c 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > @@ -5,7 +5,8 @@
> > > > /* { dg-additional-options "-O3" } */
> > > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> > > >
> > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* Arm and -m32 create a group size of 3 here, which we can't support yet. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-* } || { { x86_64-*-* i?86-*-* } && ilp64 } } } } } */
> > > >
> > > > typedef struct filter_list_entry {
> > > > const char *name;
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > > c83ab569fc9fbde126
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > @@ -0,0 +1,25 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +int a, b, c, d, e, f;
> > > > +short g[1];
> > > > +int main() {
> > > > + int h;
> > > > + while (a) {
> > > > + while (h)
> > > > + ;
> > > > + for (b = 2; b; b--) {
> > > > + while (c)
> > > > + ;
> > > > + f = g[a];
> > > > + if (d)
> > > > + break;
> > > > + }
> > > > + while (e)
> > > > + ;
> > > > + }
> > > > + return 0;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > > da7a383ad7f6ceb0a7
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020];
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 1; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..82d473a279ce060c55028
> > > 9c61729d9f9b56f0d2a
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! vect_load_lanes } } } } */
> > > > +
> > > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+2] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > > ca29a5f3f9d3f6e0d1
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020];
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 0; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > > 9da423cf54b5e564d6f
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020] __attribute__((aligned(1)));
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 1; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > > 7ceea1c065fcc6ae9
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +/* { dg-additional-options "-O3" } */
> > > > +
> > > > +char string[1020] __attribute__((aligned(1)));
> > > > +
> > > > +char * find(int n, char c)
> > > > +{
> > > > + for (int i = 0; i < n; i++) {
> > > > + if (string[i] == c)
> > > > + return &string[i];
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..ca95be44e92e32769da1d
> > > 1e9b740ae54682a3d55
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char *vect, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < n; i++)
> > > > + {
> > > > + if (vect[i] > x)
> > > > + return 1;
> > > > +
> > > > + vect[i] = x;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > > 64a61c97b1b6268743
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < n; i++)
> > > > + {
> > > > + if (vect_a[i] > x || vect_b[i] > x)
> > > > + return 1;
> > > > +
> > > > + vect_a[i] = x;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > > 50776299531824ce9c
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* This should be vectorizable through load_lanes and linear targets. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 0; i < n; i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56
> > > cecc60750394e10f2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > @@ -0,0 +1,25 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +
> > > > +char vect_a[1025];
> > > > +char vect_b[1025];
> > > > +
> > > > +unsigned test4(char x, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i] > x || vect_a[i+1] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > new file mode 100644
> > > > index
> > >
> > 0000000000000000000000000000000000000000..31e209620925353948325
> > > 3efc17499a53d112894
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +/* Group size is uneven, load lanes targets can't safely vectorize this. */
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > > > +
> > > > +
> > > > +char vect_a[1025];
> > > > +char vect_b[1025];
> > > > +
> > > > +unsigned test4(char x, int n)
> > > > +{
> > > > + unsigned ret = 0;
> > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > + {
> > > > + if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > > + return 1;
> > > > +
> > > > + vect_b[i] = x;
> > > > + vect_b[i+1] = x+1;
> > > > + }
> > > > + return ret;
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > index
> > >
> > 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46
> > > a462238c3b5825ef 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> > > > return ret;
> > > > }
> > > >
> > > > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > > > +/* Cannot safely vectorize this due to the group misalignment. */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c
> > > b/gcc/testsuite/gcc.target/i386/pr90178.c
> > > > index
> > >
> > 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed7
> > > 3e85ba982b42 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> > > > @@ -4,6 +4,7 @@
> > > > int*
> > > > find_ptr (int* mem, int sz, int val)
> > > > {
> > > > +#pragma GCC novector
> > > > for (int i = 0; i < sz; i++)
> > > > if (mem[i] == val)
> > > > return &mem[i];
> > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > > index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4fdbcc5973f2b513 100644
> > > > --- a/gcc/tree-vect-data-refs.cc
> > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > > > if (is_gimple_debug (stmt))
> > > > continue;
> > > >
> > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > + stmt_vec_info stmt_vinfo
> > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > if (!dr_ref)
> > > > continue;
> > > > @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > > > bounded by VF so accesses are within range. We only need to check
> > > > the reads since writes are moved to a safe place where if we get
> > > > there we know they are safe to perform. */
> > > > - if (DR_IS_READ (dr_ref)
> > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > + if (DR_IS_READ (dr_ref))
> > > > {
> > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > - {
> > > > - const char *msg
> > > > - = "early break not supported: cannot peel "
> > > > - "for alignment, vectorization would read out of "
> > > > - "bounds at %G";
> > > > - return opt_result::failure_at (stmt, msg, stmt);
> > > > - }
> > > > -
> > > > - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > - dr_info->need_peeling_for_alignment = true;
> > > > + dr_set_peeling_alignment (stmt_vinfo, true);
> > > >
> > > > if (dump_enabled_p ())
> > > > dump_printf_loc (MSG_NOTE, vect_location,
> > > > - "marking DR (read) as needing peeling for "
> > > > - "alignment at %G", stmt);
> > > > + "marking DR (read) as possibly needing peeling "
> > > > + "for alignment at %G", stmt);
> > > > }
> > > >
> > > > if (DR_IS_READ (dr_ref))
> > > > @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > Compute the misalignment of the data reference DR_INFO when vectorizing
> > > > with VECTYPE.
> > > >
> > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> > > > - be set appropriately on failure (but is otherwise left unchanged).
> > > > -
> > > > Output:
> > > > 1. initialized misalignment info for DR_INFO
> > > >
> > > > @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > >
> > > > static void
> > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > - tree vectype, opt_result *result = nullptr)
> > > > + tree vectype)
> > > > {
> > > > stmt_vec_info stmt_info = dr_info->stmt;
> > > > vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > > > BITS_PER_UNIT);
> > > >
> > > > - /* If this DR needs peeling for alignment for correctness, we must
> > > > - ensure the target alignment is a constant power-of-two multiple of the
> > > > - amount read per vector iteration (overriding the above hook where
> > > > - necessary). */
> > > > - if (dr_info->need_peeling_for_alignment)
> > > > + /* If we have a grouped access we require that the alignment be VF * elem. */
> > > > + if (loop_vinfo
> > > > + && dr_peeling_alignment (stmt_info)
> > > > + && STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > {
> > > > - /* Vector size in bytes. */
> > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > -
> > > > - /* We can only peel for loops, of course. */
> > > > - gcc_checking_assert (loop_vinfo);
> > > > -
> > > > - /* Calculate the number of vectors read per vector iteration. If
> > > > - it is a power of two, multiply through to get the required
> > > > - alignment in bytes. Otherwise, fail analysis since alignment
> > > > - peeling wouldn't work in such a case. */
> > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > -
> > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > - if (!pow2p_hwi (num_vectors))
> > > > - {
> > > > - *result = opt_result::failure_at (vect_location,
> > > > - "non-power-of-two num vectors %u "
> > > > - "for DR needing peeling for "
> > > > - "alignment at %G",
> > > > - num_vectors, stmt_info->stmt);
> > > > - return;
> > > > - }
> > > > -
> > > > - safe_align *= num_vectors;
> > > > - if (maybe_gt (safe_align, 4096U))
> > > > - {
> > > > - pretty_printer pp;
> > > > - pp_wide_integer (&pp, safe_align);
> > > > - *result = opt_result::failure_at (vect_location,
> > > > - "alignment required for correctness"
> > > > - " (%s) may exceed page size",
> > > > - pp_formatted_text (&pp));
> > > > - return;
> > > > - }
> > > > -
> > > > - unsigned HOST_WIDE_INT multiple;
> > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > > - || !pow2p_hwi (multiple))
> > > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > + vector_alignment
> > > > + = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > >
> > > I think we discussed this before, also when introducing peeling
> > > for alignment support. This is incorrect for grouped accesses where
> > > the number of scalar elements accessed is GROUP_SIZE * vf, so you
> > > miss a multiplication by GROUP_SIZE here.
> >
> > Huh, but doesn't your VF already contain your group size? If I have an LD4 on
> > V4SI my VF is 16, isn't it? Because I still handle 16 elements per iteration.
> >
> > So why would I need 4 * 16?
> >
>
> *sigh* nevermind. I was using this loop to check:
>
> char vect_a[1025];
> char vect_b[1025];
>
> unsigned test4(char x, int n)
> {
>   unsigned ret = 0;
>   for (int i = 1; i < (n - 2); i+=2)
>     {
>       if (vect_a[i] > x || vect_a[i+1] > x)
>         return 1;
>
>       vect_b[i] = x;
>       vect_b[i+1] = x+1;
>     }
>   return ret;
> }
>
> And I was expecting 32-byte alignment, but then misread:
>
>
> .align 4
> .set .LANCHOR0,. + 0
> .type vect_b, %object
> .size vect_b, 1025
> vect_b:
>
> so yes, I need a GROUP_SIZE *.
>
> > >
> > > Note that this (and also your VF * element_size) can result in a
> > > non-power-of-two value.
> > >
> > > That said, I'm quite sure we don't want to have a dr->target_alignment
> > > that isn't power-of-two, so if the computation doesn't end up with a
> > > power-of-two value we should leave it as the target prefers and
> > > fixup (or fail) during vectorizable_load.
> >
> > Ack I'll round up to power of 2.
> >
> > >
> > > > + if (dump_enabled_p ())
> > > > {
> > > > - if (dump_enabled_p ())
> > > > - {
> > > > - dump_printf_loc (MSG_NOTE, vect_location,
> > > > - "forcing alignment for DR from preferred (");
> > > > - dump_dec (MSG_NOTE, vector_alignment);
> > > > - dump_printf (MSG_NOTE, ") to safe align (");
> > > > - dump_dec (MSG_NOTE, safe_align);
> > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > - }
> > > > - vector_alignment = safe_align;
> > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > + "alignment increased due to early break to ");
> > > > + dump_dec (MSG_NOTE, vector_alignment);
> > > > + dump_printf (MSG_NOTE, " bytes.\n");
> > > > }
> > > > }
> > > >
> > > > @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > loop_preheader_edge (loop))
> > > > || loop->inner
> > > > + /* We don't currently maintain the LCSSA form for prologue-peeled inverted
> > > > + loops. */
> > > > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > do_peeling = false;
> > > >
> > > > @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > > > continue;
> > > > - opt_result res = opt_result::success ();
> > > > +
> > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > - STMT_VINFO_VECTYPE (dr_info->stmt),
> > > > - &res);
> > > > - if (!res)
> > > > - return res;
> > > > + STMT_VINFO_VECTYPE (dr_info->stmt));
> > > > }
> > > > }
> > > >
> > > > @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > >
> > > > if (misalignment == 0)
> > > > return dr_aligned;
> > > > - else if (dr_info->need_peeling_for_alignment)
> > > > + else if (dr_peeling_alignment (stmt_info))
> > > > return dr_unaligned_unsupported;
> > > >
> > > > /* For now assume all conditional loads/stores support unaligned
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee37b3fa1dc95079b 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > return false;
> > > > }
> > > >
> > > > + /* If this DR needs peeling for alignment for correctness, we must
> > > > + ensure the target alignment is a constant power-of-two multiple of the
> > > > + amount read per vector iteration (overriding the above hook where
> > > > + necessary). */
> > > > + if (dr_peeling_alignment (stmt_info))
> > > > + {
> > > > + /* We can only peel for loops, of course. */
> > > > + gcc_checking_assert (loop_vinfo);
> > > > +
> > > > + /* Check if we support the operation if early breaks are needed. */
> > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > > > + || *memory_access_type == VMAT_STRIDED_SLP))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "early break not supported: cannot peel for "
> > > > + "alignment. With non-contiguous memory vectorization"
> > > > + " could read out of bounds at %G ",
> > > > + STMT_VINFO_STMT (stmt_info));
> > > > + return false;
> > > > + }
> > > > +
> > > > + /* Even if uneven group sizes are aligned on the first load, the second
> > > > + iteration won't be. As such reject uneven group sizes. */
> > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > > > + && (DR_GROUP_SIZE (stmt_info) % 2) == 1)
> > >
> > > Hmm, but a group size of 6 is even, but a vector size of four doesn't
> > > make the 2nd aligned. So we need a power-of-two GROUP_SIZE * VF
> > > and a byte alignment according to that.
> > >
> >
> > Argg true.
> >
> >
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "early break not supported: uneven group size, "
> > > > + "vectorization could read out of bounds at %G ",
> > > > + STMT_VINFO_STMT (stmt_info));
> > > > + return false;
> > > > + }
> > > > +
> > > > + /* Vector size in bytes. */
> > > > + poly_uint64 safe_align;
> > > > + if (nunits.is_constant ())
> > > > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > + else
> > > > + safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> > > > + POLY_VALUE_MAX);
> > > > +
> > > > + auto num_vectors = ncopies;
> > > > + if (!pow2p_hwi (num_vectors))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "non-power-of-two num vectors %u "
> > > > + "for DR needing peeling for "
> > > > + "alignment at %G",
> > > > + num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > + return false;
> > > > + }
> > > > +
> > > > + safe_align *= num_vectors;
> > > > + bool inbounds
> > > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > >
> > > I'm again confused why you think ref_within_array_bound can be used to
> > > validize anything?
> >
> > The goal here is that if we have a value that is aligned, we don't exceed the
> > page, and we're able to increase the target alignment, then VLA is safe to use.
> >
> > Since we can't peel for SVE, any unaligned value can't be handled later. But as I
> > mentioned, to SVE nothing is unaligned, as the target alignment is the element
> > size.
> >
> > For non-SVE, unaligned accesses get processed later on. For SVE you just
> > generate wrong code. Since we know we can realign known-size buffers (since,
> > well, they're .data) we know we can let those through.
> >
> > I don't really know of a better way to do this. Since again, for VLA the backend
> > never reports that the load is misaligned, so it doesn't stop vectorization.
> >
> > The inbounds check is used as a proxy for that, which is what the comment below
> > its use was trying to explain.
>
> To explain this some more.
>
> This loop is clearly safe to vectorize:
>
> char vect_a[1025];
> char vect_b[1025];
>
> unsigned test4(char x, int n)
> {
>   unsigned ret = 0;
>   for (int i = 0; i < 1023; i+=2)
>     {
>       if (vect_a[i] > x || vect_a[i+1] > x)
>         return 1;
>
>       vect_b[i] = x;
>       vect_b[i+1] = x+1;
>     }
>   return ret;
> }
>
> But I cannot distinguish between this case and cases where we've established it's
> unsafe because you need to peel for alignment or increase the data alignment.
>
> SVE will always return an alignment of the element type. This means that after
> vect_enhance_data_refs_alignment the calculated misalignment is always zero, which
> means that dr_peeling_alignment is always ignored in vect_supportable_dr_alignment,
> which means the code is never marked as needing peeling for alignment, *nor* as
> needing versioning.
>
> This did not seem right to me. As the known-size cases are always OK (you can never
> overread the buffer, since the predicate would just make the last iteration partial),
> the known inbounds cases are safe for VLA due to predication.
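>
> A minimal scalar model of why predication makes this safe (VF and the early
> break condition are placeholders; real codegen uses masked vector loads, this
> just models the lane enable):
>
> int first_match (const char *vect_a, int n, char x)
> {
>   enum { VF = 4 };  /* placeholder vector length */
>   for (int i = 0; i < n; i += VF)        /* one "vector" iteration */
>     for (int lane = 0; lane < VF; lane++)
>       if (i + lane < n                   /* the predicate/mask bit */
>           && vect_a[i + lane] > x)
>         return i + lane;                 /* early break, still in bounds */
>   return -1;
> }
>
> A lane is only enabled when the scalar loop would have reached it, so the
> speculative read never goes past the scalar trip count.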
Ah, so you bring up predication again. So indeed predication should
guarantee that accesses that were in-bound in the scalar code will
stay in-bound with loop-predicated vector code.
I think we can make that argument in general, not just for SVE, and thus
instead of requiring alignment, require predication via
LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P. That said,
we should mark DRs that we might access speculatively due to early-break
vectorization. We now use ->need_peeling_for_alignment - which IMO is
a bad name - in some places you check this in your patch. I'd say
rename that to ->safe_speculative_read_required or so - and make
that an incentive for alignment peeling to try aligning it. And make
get_load_store_type check whether such an access is aligned according
to VF * group_size * element_size and if not, _and_ the scalar access
is statically known to be in-bounds, require
LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P. And if neither, fail.
Note there's the case of a gap at the end - peeling for gaps - which
is not solved by loop masking (I'm not 100% positively sure the current
condition captures the VLA issue fully, but ...).
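As a rough sketch of that ladder (illustrative pseudocode, not the final patch;
the two helper predicates are placeholders for the actual checks):

  /* In get_load_store_type, for a DR marked as needing safe speculative
     reads.  */
  poly_uint64 bytes_per_iter = vf * group_size * element_size;
  if (access_aligned_to_p (dr, bytes_per_iter))
    ;  /* OK - a vector iteration can never cross into a new page.  */
  else if (scalar_access_known_in_bounds_p (dr))
    /* Predication keeps the vector accesses within the scalar bounds.  */
    LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
  else
    return false;  /* Neither aligned nor provably in-bounds: fail.  */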
> Increasing alignment also doesn't really work for VLA because you'd have to way overalign
> to get a safe value (i.e. align to the maximum vector size), which e.g. for an LD2 already doesn't make much sense.
Yes, we know that for VLA we need to read the architectural vector size
to compute the desired alignment. And masking might not do it when
we need multiple vectors for an access(?).
> At least, this is what I was trying to explain in the comment below its use. As far as I can tell, the only case where
> an inbounds VLA access would fault is an incorrect program where the scalar code just happened to exit before reading
> memory we were told was safe to read.
>
>
> Thanks,
> Tamar
> >
> > >
> > > > + /* For VLA we have to insert a runtime check that the vector loads
> > > > + per iteration don't exceed a page size. For now we can use
> > > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> > > > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)
> > > > + /* We don't support PFA for VLA at the moment. Some targets like SVE
> > > > + return a target alignment requirement of a single element. For
> > > > + early break this is potentially unsafe so we can't count on
> > > > + alignment rejecting such loops later as it thinks loads are never
> > > > + misaligned. */
> > > > + || (!nunits.is_constant () && !inbounds))
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + {
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "alignment required for correctness (");
> > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > + }
> > > > + return false;
> > > > + }
> > > > + *alignment_support_scheme = dr_unaligned_supported;
> > >
> > > and the only thing should be *alignment_support_scheme == dr_aligned,
> > > and with a possibly too low taget_alignment even that's not enough.
> > >
> >
> > Fair, but you shouldn't be able to get there with a too low target alignment though.
> >
> > > Can you split out the testsuite part that just adds #pragma GCC novector?
> > > That part is OK.
> > >
> >
> > Ok,
> > Tamar
> >
> > > Thanks,
> > > Richard.
> > >
> > > > + }
> > > > +
> > > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > {
> > > > if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba0dbfbcca027441 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > }
> > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > >
> > > > +/* Return if the stmt_vec_info requires peeling for alignment. */
> > > > +inline bool
> > > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > > +{
> > > > + dr_vec_info *dr_info;
> > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > > + else
> > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > +
> > > > + return dr_info->need_peeling_for_alignment;
> > > > +}
> > > > +
> > > > +/* Set the need_peeling_for_alignment for the stmt_vec_info; if a group
> > > > + access then set it on the first element, otherwise set it on the DR directly. */
> > > > +inline void
> > > > +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment)
> > > > +{
> > > > + dr_vec_info *dr_info;
> > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > > + else
> > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > +
> > > > + dr_info->need_peeling_for_alignment = requires_alignment;
> > > > +}
> > > > +
> > > > inline void
> > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > {
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> That said, I'm quite sure we don't want to have a dr->target_alignment
>> that isn't power-of-two, so if the computation doesn't end up with a
>> power-of-two value we should leave it as the target prefers and
>> fixup (or fail) during vectorizable_load.
>
> Ack I'll round up to power of 2.
I don't think that's enough. Rounding up 3 would give 4, but a group
size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X
for some X. [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.
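To make the straddling concrete, a small self-contained check (plain C, taking
X as 1 so the rounded-up boundary is 4 bytes):

#include <stdio.h>

int main (void)
{
  /* Group size 3: vector iterations read [0,3), [3,6), [6,9), [9,12).  */
  for (int i = 0; i < 4; i++)
    {
      int lo = 3 * i, hi = 3 * (i + 1);
      /* An access straddles when its first and last byte fall in
         different 4-byte blocks.  */
      if (lo / 4 != (hi - 1) / 4)
        printf ("[%d,%d) straddles\n", lo, hi);
    }
  return 0;
}

This prints [3,6) and [6,9), matching the two straddling iterations above.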
Thanks,
Richard
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, February 13, 2025 4:55 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
>
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> That said, I'm quite sure we don't want to have a dr->target_alignment
> >> that isn't power-of-two, so if the computation doesn't end up with a
> >> power-of-two value we should leave it as the target prefers and
> >> fixup (or fail) during vectorizable_load.
> >
> > Ack I'll round up to power of 2.
>
> I don't think that's enough. Rounding up 3 would give 4, but a group
> size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X
> for some X. [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.
>
Indeed, instead of rounding up I just reject the non-power of 2 alignment
requests in vectorizable_load as Richi originally requested. I thought I could
get it to work better by rounding up but it doesn't seem worth it.
Cheers,
Tamar
> Thanks,
> Richard
On Fri, 28 Feb 2025, Tamar Christina wrote:
> Hi All,
>
> This fixes two PRs on Early break vectorization by delaying the safety checks to
> vectorizable_load when the VF, VMAT and vectype are all known.
>
> This patch does add two new restrictions:
>
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> group sizes, as they are unaligned every n % 2 iterations and so may cross
> a page unwittingly.
>
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> we cannot peel for alignment, as the alignment requirement is quite large at
> GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
> don't support it for now.
>
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
>
> As a fall-back, when the alignment fails we require partial vector support.
>
> For VLA targets like SVE we return element alignment as the desired vector
> alignment. This means that the loads are never misaligned and so, annoyingly,
> they won't ever need to peel.
>
> So what I think needs to happen in GCC 16 is the following:
>
> 1. during vect_compute_data_ref_alignment we need to take the max of
> POLY_VALUE_MIN and vector_alignment.
>
> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
> check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
> proxy for pagesize.
>
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> vect_determine_partial_vectors_and_peeling since the first iteration has to
> be partial. If !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> vectorize.
>
> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
> becomes true and we generate the peeled check through loop control for
> partial loops. From what I can tell this won't work for
> LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
> all in the compiler. That would need to be done independently from the
> above.
>
> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> checks.
> (vect_compute_data_ref_alignment): Remove alignment checks and move to
> get_load_store_type, increase group access alignment.
> (vect_enhance_data_refs_alignment): Add note to comment needing
> investigating.
> (vect_analyze_data_refs_alignment): Likewise.
> (vect_supportable_dr_alignment): For group loads look at first DR.
> * tree-vect-stmts.cc (get_load_store_type):
> Perform safety checks for early break pfa.
> * tree-vectorizer.h (dr_set_safe_speculative_read_required,
> dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
> (need_peeling_for_alignment): Renamed to...
> (safe_speculative_read_required): .. This
> (class dr_vec_info): Add scalar_access_known_in_bounds.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> load type is relaxed later.
> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
> * gcc.dg/vect/vect-early-break_128.c: Likewise.
> * gcc.dg/vect/vect-early-break_26.c: Likewise.
> * gcc.dg/vect/vect-early-break_43.c: Likewise.
> * gcc.dg/vect/vect-early-break_44.c: Likewise.
> * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
> * gcc.dg/vect/vect-early-break_7.c: Likewise.
> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> * gcc.dg/vect/vect-early-break_53.c: Likewise.
> * gcc.dg/vect/vect-early-break_56.c: Likewise.
> * gcc.dg/vect/vect-early-break_57.c: Likewise.
> * gcc.dg/vect/vect-early-break_81.c: Likewise.
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 6f8bf3923863dee9ed35b0497f1ef58a65726701..a4c62e50785362c93de31ac44f4fb5cbf4d1e1ee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17260,7 +17260,7 @@ Maximum number of relations the oracle will register in a basic block.
> Work bound when discovering transitive relations from existing relations.
>
> @item min-pagesize
> -Minimum page size for warning purposes.
> +Minimum page size for warning and early break vectorization purposes.
>
> @item openacc-kernels
> Specify mode of OpenACC `kernels' constructs handling.
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -55,7 +55,9 @@ int main()
> }
> }
> rephase ();
> +#pragma GCC novector
> for (i = 0; i < 32; ++i)
> +#pragma GCC novector
> for (j = 0; j < 3; ++j)
> #pragma GCC novector
> for (k = 0; k < 3; ++k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c72d4a5b04cc05c7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -5,7 +5,8 @@
> /* { dg-additional-options "-O3" } */
> /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* Arm and -m32 create a group size of 3 here, which we can't support yet. AArch64 makes elementwise accesses here. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { aarch64*-*-* } } } } */
>
> typedef struct filter_list_entry {
> const char *name;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> index 6d7fb920ec2de529a4aa1de2c4a04286989204fd..ed6baf2d451f3887076a1e9143035363128efe70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> @@ -3,7 +3,8 @@
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_int } */
>
> -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" { target { ! vect_partial_vectors } } } } */
> /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
>
> #ifndef N
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, b, c, d, e, f;
> +short g[1];
> +int main() {
> + int h;
> + while (a) {
> + while (h)
> + ;
> + for (b = 2; b; b--) {
> + while (c)
> + ;
> + f = g[a];
> + if (d)
> + break;
> + }
> + while (e)
> + ;
> + }
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dd05046982524f15662be8df517716b581b8a2d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> @@ -0,0 +1,25 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_partial_vectors || vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
> +
> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> + if (vect[i] > x)
> + return 1;
> +
> + vect[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < n; i++)
> + {
> + if (vect_a[i] > x || vect_b[i] > x)
> + return 1;
> +
> + vect_a[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cf76c7109edce15f860cdc27e10850ef5a31fc9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* This should be vectorizable through load_lanes and linear targets. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
> +
> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..25d3a62356baf127c89187b150810e4d31567c6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> @@ -0,0 +1,26 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..10eb98b726acb32a0d1de4daf202724995bfa1a6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> @@ -0,0 +1,29 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Group size is uneven and second group is misaligned. Needs partial vectors. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i-1] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> index dec0b492ab883de6e02944a95fd554a109a68a39..8f5ccc45ce06ed36627107e080d633e55e254fa0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> @@ -5,7 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>
> #include <complex.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> index b3f5984f682f30f79331d48a264c2cc4af3e2503..f8f84fab97ab586847000af8b89448b0885ef5fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> @@ -42,4 +42,6 @@ main ()
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> index 47d2a50218bd1b32fe43edcaaabb1079d0b26223..643016b2ccfea29ba36d65c8070f255cb8179481 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> @@ -41,4 +41,6 @@ main ()
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> +/* Cannot safely vectorize this due to the group misalignment. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> index a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452b79f3e0fabba8b3c0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> @@ -2,6 +2,7 @@
> /* { dg-do compile } */
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_partial_vectors } */
>
> void abort ();
> int a[64], b[64];
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> index 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f7ad915a3f81c38c4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> @@ -4,6 +4,7 @@
> /* { dg-require-effective-target vect_int } */
> /* { dg-add-options bind_pic_locally } */
> /* { dg-require-effective-target vect_early_break_hw } */
> +/* { dg-require-effective-target vect_partial_vectors } */
>
> #include <stdarg.h>
> #include "tree-vect.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286febea2b31cb7be8499 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,8 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +/* Multiple loads of different alignments, we can't peel this. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>
> void abort ();
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> index 7b870e9c60dcac6164d879dd70c1fc07ec0221fe..c7cce81f52c80d83bd2c1face8cbd13f93834531 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> @@ -5,7 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>
> #define N 1024
> unsigned vect_a[N];
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> index d218a0686719fee4c167684dcf26402851b53260..34d187483320b9cc215304b73e28d45d7031516e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> @@ -5,7 +5,10 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
> +
>
> #include <complex.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> index 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea44ab4750473589a9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -5,8 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +/* Multiple loads with different misalignments. Can't peel; need partial loop support. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
> void abort ();
>
> unsigned short sa[32];
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..9949fc3d98852399242a96095f4dae5ffe7613b3 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> if (is_gimple_debug (stmt))
> continue;
>
> - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> + stmt_vec_info stmt_vinfo
> + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> + stmt = STMT_VINFO_STMT (stmt_vinfo);
> auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> if (!dr_ref)
> continue;
> @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> bounded by VF so accesses are within range. We only need to check
> the reads since writes are moved to a safe place where if we get
> there we know they are safe to perform. */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (DR_IS_READ (dr_ref))
> {
> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> - {
> - const char *msg
> - = "early break not supported: cannot peel "
> - "for alignment, vectorization would read out of "
> - "bounds at %G";
> - return opt_result::failure_at (stmt, msg, stmt);
> - }
> -
> - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> - dr_info->need_peeling_for_alignment = true;
> + dr_set_safe_speculative_read_required (stmt_vinfo, true);
> + bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref));
> + DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) = inbounds;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> - "marking DR (read) as needing peeling for "
> - "alignment at %G", stmt);
> + "marking DR (read) as possibly needing peeling "
> + "for alignment at %G", stmt);
> }
>
> if (DR_IS_READ (dr_ref))
> @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
> Compute the misalignment of the data reference DR_INFO when vectorizing
> with VECTYPE.
>
> - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> - be set appropriately on failure (but is otherwise left unchanged).
> -
> Output:
> 1. initialized misalignment info for DR_INFO
>
> @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo)
>
> static void
> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> - tree vectype, opt_result *result = nullptr)
> + tree vectype)
> {
> stmt_vec_info stmt_info = dr_info->stmt;
> vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> BITS_PER_UNIT);
>
> - /* If this DR needs peeling for alignment for correctness, we must
> - ensure the target alignment is a constant power-of-two multiple of the
> - amount read per vector iteration (overriding the above hook where
> - necessary). */
> - if (dr_info->need_peeling_for_alignment)
> + if (loop_vinfo
> + && dr_safe_speculative_read_required (stmt_info))
> {
> - /* Vector size in bytes. */
> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> -
> - /* We can only peel for loops, of course. */
> - gcc_checking_assert (loop_vinfo);
> -
> - /* Calculate the number of vectors read per vector iteration. If
> - it is a power of two, multiply through to get the required
> - alignment in bytes. Otherwise, fail analysis since alignment
> - peeling wouldn't work in such a case. */
> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + auto vectype_size
> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> + poly_uint64 new_alignment = vf * vectype_size;
> + /* If we have a grouped access we require that the alignment be N * elem. */
> if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> - num_scalars *= DR_GROUP_SIZE (stmt_info);
> + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>
> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> - if (!pow2p_hwi (num_vectors))
> - {
> - *result = opt_result::failure_at (vect_location,
> - "non-power-of-two num vectors %u "
> - "for DR needing peeling for "
> - "alignment at %G",
> - num_vectors, stmt_info->stmt);
> - return;
> - }
> -
> - safe_align *= num_vectors;
> - if (maybe_gt (safe_align, 4096U))
> - {
> - pretty_printer pp;
> - pp_wide_integer (&pp, safe_align);
> - *result = opt_result::failure_at (vect_location,
> - "alignment required for correctness"
> - " (%s) may exceed page size",
> - pp_formatted_text (&pp));
> - return;
> - }
> -
> - unsigned HOST_WIDE_INT multiple;
> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> - || !pow2p_hwi (multiple))
> + unsigned HOST_WIDE_INT target_alignment;
> + if (new_alignment.is_constant (&target_alignment)
> + && pow2p_hwi (target_alignment))
> {
> if (dump_enabled_p ())
> {
> dump_printf_loc (MSG_NOTE, vect_location,
> - "forcing alignment for DR from preferred (");
> - dump_dec (MSG_NOTE, vector_alignment);
> - dump_printf (MSG_NOTE, ") to safe align (");
> - dump_dec (MSG_NOTE, safe_align);
> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> + "alignment increased due to early break to ");
> + dump_dec (MSG_NOTE, new_alignment);
> + dump_printf (MSG_NOTE, " bytes.\n");
> }
> - vector_alignment = safe_align;
> + vector_alignment = target_alignment;
> }
> }
>
> @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> loop_preheader_edge (loop))
> || loop->inner
> + /* We don't currently maintain the LCSSA for prologue-peeled inverted
> + loops. */
> || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> do_peeling = false;
>
> @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> continue;
> - opt_result res = opt_result::success ();
> +
> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> - STMT_VINFO_VECTYPE (dr_info->stmt),
> - &res);
> - if (!res)
> - return res;
> + STMT_VINFO_VECTYPE (dr_info->stmt));
> }
> }
>
> @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>
> if (misalignment == 0)
> return dr_aligned;
> - else if (dr_info->need_peeling_for_alignment)
> + else if (dr_safe_speculative_read_required (stmt_info))
> return dr_unaligned_unsupported;
>
> /* For now assume all conditional loads/stores support unaligned
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661deeeed400e5826fc1c4f70957b335d1741fa 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> + /* If this DR needs alignment for correctness, we must ensure the target
> + alignment is a constant power-of-two multiple of the amount read per
> + vector iteration or force masking. */
> + if (dr_safe_speculative_read_required (stmt_info))
> + {
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + /* Check if we support the operation if early breaks are needed. Here we
> + must ensure that we don't access any more than the scalar code would
> + have. A masked operation would ensure this, so for these load types
> + force masking. */
> + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && (*memory_access_type == VMAT_GATHER_SCATTER
> + || *memory_access_type == VMAT_STRIDED_SLP))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "early break not supported: cannot peel for "
> + "alignment. With non-contiguous memory vectorization"
> + " could read out of bounds at %G ",
> + STMT_VINFO_STMT (stmt_info));
> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> + }
> +
> + auto target_alignment
> + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> + unsigned HOST_WIDE_INT target_align;
> + bool inbounds
> + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> +
> + /* If the scalar loop is known to be in bounds, and we're using scalar
> + accesses then there's no need to check further. */
> + if (inbounds
> + && *memory_access_type == VMAT_ELEMENTWISE)
> + {
> + *alignment_support_scheme = dr_aligned;
Nothing should look at *alignment_support_scheme for VMAT_ELEMENTWISE.
Did you actually need this adjustment?
> + return true;
> + }
> +
> + bool group_aligned = false;
> + if (*alignment_support_scheme == dr_aligned
> + && target_alignment.is_constant (&target_align)
> + && nunits.is_constant ())
> + {
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + auto vectype_size
> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> + poly_uint64 required_alignment = vf * vectype_size;
> + /* If we have a grouped access we require that the alignment be N * elem. */
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + required_alignment *=
> + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + if (!multiple_p (target_alignment, required_alignment))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "desired alignment %wu not met. Instead got %wu "
> + "for DR alignment at %G",
> + required_alignment.to_constant (),
> + target_align, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + if (!pow2p_hwi (target_align))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "non-power-of-two vector alignment %wd "
> + "for DR alignment at %G",
> + target_align, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + /* For VLA we have to insert a runtime check that the vector loads
> > + per iteration don't exceed a page size. For now we can use
> + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> + if (known_gt (required_alignment, (unsigned)param_min_pagesize))
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "alignment required for correctness (");
> + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
> + dump_printf (MSG_NOTE, ") may exceed page size\n");
> + }
> + return false;
> + }
> +
> + group_aligned = true;
> + }
> +
> + /* There are multiple loads that have a misalignment that we couldn't
> + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
> + vectorize. */
> + if (!group_aligned)
> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
I think we need to fail here unless scalar-access-in-bounds.
> +
> + /* When using a group access the first element may be aligned but the
> + subsequent loads may not be. For LOAD_LANES since the loads are based
> + on the first DR then all loads in the group are aligned. For
> + non-LOAD_LANES this is not the case. In particular a load + blend when
> + there are gaps can have the non first loads issued unaligned, even
> + partially overlapping the memory of the first load in order to simplify
> + the blend. This is what the x86_64 backend does for instance. As
> + such only the first load in the group is aligned, the rest are not.
> + Because of this the permutes may break the alignment requirements that
> + have been set, and as such we should for now, reject them. */
> + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "loads with load permutations not supported for "
> + "speculative early break loads without partial "
> + "vectors for %G",
> + STMT_VINFO_STMT (stmt_info));
> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
again, I think this doesn't save us. Specifically ...
> + }
> +
> + *alignment_support_scheme = dr_aligned;
... we must not simply claim the access is aligned when it wasn't
analyzed as such. If we committed to try peeling for a high
target alignment we can't simply walk back here either.
Richard.
> + }
> +
> if (*alignment_support_scheme == dr_unaligned_unsupported)
> {
> if (dump_enabled_p ())
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1281,7 +1281,11 @@ public:
>
> /* Set by early break vectorization when this DR needs peeling for alignment
> for correctness. */
> - bool need_peeling_for_alignment;
> + bool safe_speculative_read_required;
> +
> + /* Set by early break vectorization when this DR's scalar accesses are known
> + to be inbounds of a known bounds loop. */
> + bool scalar_access_known_in_bounds;
>
> tree base_decl;
>
> @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
> return dr_info->target_alignment;
> }
> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
> +
> +/* Return if the stmt_vec_info requires peeling for alignment. */
> +inline bool
> +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + return dr_info->safe_speculative_read_required;
> +}
> +
> +/* Set the safe_speculative_read_required for the stmt_vec_info; if a group
> + access then set it on the first element, otherwise set it on the DR directly. */
> +inline void
> +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
> + bool requires_alignment)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + dr_info->safe_speculative_read_required = requires_alignment;
> +}
>
> inline void
> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
>
>
>
> > /* For now assume all conditional loads/stores support unaligned
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661deeeed400e5826fc1c4f70
> 957b335d1741fa 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> > return false;
> > }
> >
> > + /* If this DR needs alignment for correctness, we must ensure the target
> > + alignment is a constant power-of-two multiple of the amount read per
> > + vector iteration or force masking. */
> > + if (dr_safe_speculative_read_required (stmt_info))
> > + {
> > + /* We can only peel for loops, of course. */
> > + gcc_checking_assert (loop_vinfo);
> > +
> > + /* Check if we support the operation if early breaks are needed. Here we
> > + must ensure that we don't access any more than the scalar code would
> > + have. A masked operation would ensure this, so for these load types
> > + force masking. */
> > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > + || *memory_access_type == VMAT_STRIDED_SLP))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "early break not supported: cannot peel for "
> > + "alignment. With non-contiguous memory
> vectorization"
> > + " could read out of bounds at %G ",
> > + STMT_VINFO_STMT (stmt_info));
> > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > + }
> > +
> > + auto target_alignment
> > + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> > + unsigned HOST_WIDE_INT target_align;
> > + bool inbounds
> > + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> > +
> > + /* If the scalar loop is known to be in bounds, and we're using scalar
> > + accesses then there's no need to check further. */
> > + if (inbounds
> > + && *memory_access_type == VMAT_ELEMENTWISE)
> > + {
> > + *alignment_support_scheme = dr_aligned;
>
> Nothing should look at *alignment_support_scheme for VMAT_ELEMENTWISE.
> Did you actually need this adjustment?
>
Yes, bitfield accesses are relaxed a few lines up from contiguous to elementwise:
if (SLP_TREE_LANES (slp_node) == 1)
{
*memory_access_type = VMAT_ELEMENTWISE;
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"single-element interleaving not supported "
"for not adjacent vector loads, using "
"elementwise access\n");
}
This means we then reach:
if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
we bail out because the permutes still exist. The code relaxed the load to elements but
never removed the permutes or any associated information.
If the permutes are removed, or with some other workaround, you then hit
if (!group_aligned && inbounds)
LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
because these aren't group loads. Since the original loads didn't have any misalignment they
never needed peeling and as such are dr_unaligned_supported.
So the only way to avoid checking elementwise is by guarding the top level with
if (dr_safe_speculative_read_required (stmt_info)
&& *alignment_support_scheme == dr_aligned)
{
Instead of just
if (dr_safe_speculative_read_required (stmt_info))
{
Which I wasn't sure if it was the right thing to do... Anyway if I do that I can remove...
> > + return true;
> > + }
> > +
> > + bool group_aligned = false;
> > + if (*alignment_support_scheme == dr_aligned
> > + && target_alignment.is_constant (&target_align)
> > + && nunits.is_constant ())
> > + {
> > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > + auto vectype_size
> > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > + poly_uint64 required_alignment = vf * vectype_size;
> > + /* If we have a grouped access we require that the alignment be N * elem. */
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + required_alignment *=
> > + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + if (!multiple_p (target_alignment, required_alignment))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "desired alignment %wu not met. Instead got %wu "
> > + "for DR alignment at %G",
> > + required_alignment.to_constant (),
> > + target_align, STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + if (!pow2p_hwi (target_align))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "non-power-of-two vector alignment %wd "
> > + "for DR alignment at %G",
> > + target_align, STMT_VINFO_STMT (stmt_info));
> > + return false;
> > + }
> > +
> > + /* For VLA we have to insert a runtime check that the vector loads
> > + per iteration don't exceed a page size. For now we can use
> > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> > + if (known_gt (required_alignment, (unsigned)param_min_pagesize))
> > + {
> > + if (dump_enabled_p ())
> > + {
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "alignment required for correctness (");
> > + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
> > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > + }
> > + return false;
> > + }
> > +
> > + group_aligned = true;
> > + }
> > +
> > + /* There are multiple loads that have a misalignment that we couldn't
> > + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
> > + vectorize. */
> > + if (!group_aligned)
> > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>
> I think we need to fail here unless scalar-access-in-bounds.
>
> > +
> > + /* When using a group access the first element may be aligned but the
> > + subsequent loads may not be. For LOAD_LANES since the loads are based
> > + on the first DR then all loads in the group are aligned. For
> > + non-LOAD_LANES this is not the case. In particular a load + blend when
> > + there are gaps can have the non first loads issued unaligned, even
> > + partially overlapping the memory of the first load in order to simplify
> > + the blend. This is what the x86_64 backend does for instance. As
> > + such only the first load in the group is aligned, the rest are not.
> > + Because of this the permutes may break the alignment requirements that
> > + have been set, and as such we should for now, reject them. */
> > + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "loads with load permutations not supported for "
> > + "speculative early break loads without partial "
> > + "vectors for %G",
> > + STMT_VINFO_STMT (stmt_info));
> > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>
> again, I think this doesn't save us. Specifically ...
>
> > + }
> > +
> > + *alignment_support_scheme = dr_aligned;
>
> ... we must not simply claim the access is aligned when it wasn't
> analyzed as such. If we committed to try peeling for a high
> target alignment we can't simply walk back here either.
>
...This.
That also solves your other comment about once we commit to peeling we can't back out.
Are those changes ok?
Thanks,
Tamar
> Richard.
>
> > + }
> > +
> > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > {
> > if (dump_enabled_p ())
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1281,7 +1281,11 @@ public:
> >
> > /* Set by early break vectorization when this DR needs peeling for alignment
> > for correctness. */
> > - bool need_peeling_for_alignment;
> > + bool safe_speculative_read_required;
> > +
> > + /* Set by early break vectorization when this DR's scalar accesses are known
> > + to be in bounds of a loop with known bounds. */
> > + bool scalar_access_known_in_bounds;
> >
> > tree base_decl;
> >
> > @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
> > return dr_info->target_alignment;
> > }
> > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
> > +
> > +/* Return true if the stmt_vec_info requires peeling for alignment. */
> > +inline bool
> > +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
> > +{
> > + dr_vec_info *dr_info;
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + else
> > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > +
> > + return dr_info->safe_speculative_read_required;
> > +}
> > +
> > +/* Set the safe_speculative_read_required for the stmt_vec_info; if a group
> > + access then set it on the first element, otherwise set it on the DR directly. */
> > +inline void
> > +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
> > + bool requires_alignment)
> > +{
> > + dr_vec_info *dr_info;
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + else
> > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > +
> > + dr_info->safe_speculative_read_required = requires_alignment;
> > +}
> >
> > inline void
> > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Mon, 3 Mar 2025, Tamar Christina wrote:
> > > /* For now assume all conditional loads/stores support unaligned
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661deeeed400e5826fc1c4f70957b335d1741fa 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > > return false;
> > > }
> > >
> > > + /* If this DR needs alignment for correctness, we must ensure the target
> > > + alignment is a constant power-of-two multiple of the amount read per
> > > + vector iteration or force masking. */
> > > + if (dr_safe_speculative_read_required (stmt_info))
> > > + {
> > > + /* We can only peel for loops, of course. */
> > > + gcc_checking_assert (loop_vinfo);
> > > +
> > > + /* Check if we support the operation if early breaks are needed. Here we
> > > + must ensure that we don't access any more than the scalar code would
> > > + have. A masked operation would ensure this, so for these load types
> > > + force masking. */
> > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > > + || *memory_access_type == VMAT_STRIDED_SLP))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > + "early break not supported: cannot peel for "
> > > + "alignment. With non-contiguous memory vectorization"
> > > + " could read out of bounds at %G ",
> > > + STMT_VINFO_STMT (stmt_info));
> > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > > + }
> > > +
> > > + auto target_alignment
> > > + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> > > + unsigned HOST_WIDE_INT target_align;
> > > + bool inbounds
> > > + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> > > +
> > > + /* If the scalar loop is known to be in bounds and we're using scalar
> > > + accesses, then there's no need to check further. */
> > > + if (inbounds
> > > + && *memory_access_type == VMAT_ELEMENTWISE)
> > > + {
> > > + *alignment_support_scheme = dr_aligned;
> >
> > Nothing should look at *alignment_support_scheme for VMAT_ELEMENTWISE.
> > Did you actually need this adjustment?
> >
>
> Yes, bitfields are relaxed a few lines up from contiguous to this:
>
> if (SLP_TREE_LANES (slp_node) == 1)
> {
> *memory_access_type = VMAT_ELEMENTWISE;
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "single-element interleaving not supported "
> "for not adjacent vector loads, using "
> "elementwise access\n");
> }
>
> This means we then reach:
> if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
>
> we bail out because the permutes still exist. The code relaxed the load to elements but
> never removed the permutes or any associated information.
>
> If the permutes are removed or some other workaround, you then hit
>
> if (!group_aligned && inbounds)
> LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>
> Because these aren't group loads, and because the original load didn't have any misalignment,
> they never needed peeling and as such are dr_unaligned_supported.
>
> So the only way to avoid checking elementwise if by guarding the top level with
>
> if (dr_safe_speculative_read_required (stmt_info)
> && *alignment_support_scheme == dr_aligned)
> {
>
> Instead of just
>
> if (dr_safe_speculative_read_required (stmt_info))
> {
>
> Which I wasn't sure if it was the right thing to do... Anyway if I do that I can remove...
>
> > > + return true;
> > > + }
> > > +
> > > + bool group_aligned = false;
> > > + if (*alignment_support_scheme == dr_aligned
> > > + && target_alignment.is_constant (&target_align)
> > > + && nunits.is_constant ())
> > > + {
> > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > + auto vectype_size
> > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > > + poly_uint64 required_alignment = vf * vectype_size;
> > > + /* If we have a grouped access we require that the alignment be N * elem. */
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + required_alignment *=
> > > + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > + if (!multiple_p (target_alignment, required_alignment))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "desired alignment %wu not met. Instead got %wu "
> > > + "for DR alignment at %G",
> > > + required_alignment.to_constant (),
> > > + target_align, STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + if (!pow2p_hwi (target_align))
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "non-power-of-two vector alignment %wd "
> > > + "for DR alignment at %G",
> > > + target_align, STMT_VINFO_STMT (stmt_info));
> > > + return false;
> > > + }
> > > +
> > > + /* For VLA we have to insert a runtime check that the vector loads
> > > + per iteration don't exceed a page size. For now we can use
> > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> > > + if (known_gt (required_alignment, (unsigned)param_min_pagesize))
> > > + {
> > > + if (dump_enabled_p ())
> > > + {
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "alignment required for correctness (");
> > > + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
> > > + dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > + }
> > > + return false;
> > > + }
> > > +
> > > + group_aligned = true;
> > > + }
> > > +
> > > + /* There are multiple loads that have a misalignment that we couldn't
> > > + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
> > > + vectorize. */
> > > + if (!group_aligned)
> > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> >
> > I think we need to fail here unless scalar-access-in-bounds.
> >
> > > +
> > > + /* When using a group access the first element may be aligned but the
> > > + subsequent loads may not be. For LOAD_LANES since the loads are based
> > > + on the first DR then all loads in the group are aligned. For
> > > + non-LOAD_LANES this is not the case. In particular a load + blend when
> > > + there are gaps can have the non-first loads issued unaligned, even
> > > + partially overlapping the memory of the first load in order to simplify
> > > + the blend. This is what the x86_64 backend does for instance. As
> > > + such only the first load in the group is aligned, the rest are not.
> > > + Because of this the permutes may break the alignment requirements that
> > > + have been set, and as such we should, for now, reject them. */
> > > + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> > > + {
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + "loads with load permutations not supported for "
> > > + "speculative early break loads without partial "
> > > + "vectors for %G",
> > > + STMT_VINFO_STMT (stmt_info));
> > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> >
> > again, I think this doesn't save us. Specifically ...
> >
> > > + }
> > > +
> > > + *alignment_support_scheme = dr_aligned;
> >
> > ... we must not simply claim the access is aligned when it wasn't
> > analyzed as such. If we committed to try peeling for a high
> > target alignment we can't simply walk back here either.
> >
>
> ...This.
>
> That also solves your other comment about how, once we commit to peeling, we can't back out.
>
> Are those changes ok?
Yes, that sounds good to me.
Richard.
> Thanks,
> Tamar
>
> > Richard.
> >
> > > + }
> > > +
> > > if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > {
> > > if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -1281,7 +1281,11 @@ public:
> > >
> > > /* Set by early break vectorization when this DR needs peeling for alignment
> > > for correctness. */
> > > - bool need_peeling_for_alignment;
> > > + bool safe_speculative_read_required;
> > > +
> > > + /* Set by early break vectorization when this DR's scalar accesses are known
> > > + to be in bounds of a loop with known bounds. */
> > > + bool scalar_access_known_in_bounds;
> > >
> > > tree base_decl;
> > >
> > > @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > return dr_info->target_alignment;
> > > }
> > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
> > > +
> > > +/* Return true if the stmt_vec_info requires peeling for alignment. */
> > > +inline bool
> > > +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
> > > +{
> > > + dr_vec_info *dr_info;
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > + else
> > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > + return dr_info->safe_speculative_read_required;
> > > +}
> > > +
> > > +/* Set the safe_speculative_read_required for the stmt_vec_info; if a group
> > > + access then set it on the first element, otherwise set it on the DR directly. */
> > > +inline void
> > > +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
> > > + bool requires_alignment)
> > > +{
> > > + dr_vec_info *dr_info;
> > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > + else
> > > + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > + dr_info->safe_speculative_read_required = requires_alignment;
> > > +}
> > >
> > > inline void
> > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > >
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
On Tue, 4 Mar 2025, Tamar Christina wrote:
> Hi All,
>
> This fixes two PRs on Early break vectorization by delaying the safety checks to
> vectorizable_load when the VF, VMAT and vectype are all known.
>
> This patch does add two new restrictions:
>
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> group sizes, as they are unaligned every n % 2 iterations and so may cross
> a page unwittingly.
>
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
> we cannot peel for alignment, as the alignment requirement is quite large at
>    GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
>    don't support it for now (a worked example of the arithmetic follows below).
>
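> As a rough illustration of the arithmetic behind both restrictions — the
> concrete numbers below are invented for the example; only the
> required_alignment = VF * element_size * GROUP_SIZE formula comes from the
> patch:
>
>   /* Illustrative only, not part of the patch.  Computes the alignment the
>      patch requires for a grouped early-break load, and shows why a group
>      size with an odd factor can never be satisfied: target alignments are
>      powers of two, and multiple_p (target_alignment, required_alignment)
>      needs the target alignment to be a multiple of the requirement.  */
>   #include <stdio.h>
>
>   int main (void)
>   {
>     unsigned vf = 16;         /* vectorization factor */
>     unsigned elem_size = 1;   /* char elements, in bytes */
>     unsigned group_size = 3;  /* DR_GROUP_SIZE, deliberately not 2^n */
>     unsigned required = vf * elem_size * group_size;  /* 48 bytes */
>
>     printf ("required alignment: %u bytes\n", required);
>
>     for (unsigned ta = 16; ta <= 4096; ta *= 2)
>       printf ("target alignment %4u a multiple of %u? %s\n", ta, required,
>               ta % required == 0 ? "yes" : "no");
>     return 0;
>   }
>
> With group_size = 4 instead, required is 64 and any power-of-two target
> alignment of 64 or more passes, which is the case peeling for alignment
> can handle.
>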
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
>
> As a fall-back, when the alignment fails we require partial vector support.
>
> For VLA targets like SVE, return element alignment as the desired vector
> alignment. This means that the loads are never misaligned and so, annoyingly,
> it won't ever need to peel.
>
> So what I think needs to happen in GCC 16 is the following:
>
> 1. during vect_compute_data_ref_alignment we need to take the max of
> POLY_VALUE_MIN and vector_alignment.
>
> 2. vect_do_peeling should define skip_vector when PFA for VLA, and the guard
>    should add a check that ncopies * vectype does not exceed POLY_VALUE_MAX,
>    which we use as a proxy for pagesize (a rough sketch of this guard follows
>    after this list).
>
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> vect_determine_partial_vectors_and_peeling since the first iteration has to
>    be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P is false we have to
>    fail to vectorize.
>
> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
> becomes true and we generate the peeled check through loop control for
> partial loops. From what I can tell this won't work for
> LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
> all in the compiler. That would need to be done independently from the
> above.
>
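> As a rough sketch of the guard in item 2 — illustrative only; the function
> and parameter names are placeholders, not existing GCC interfaces:
>
>   /* Sketch: whether one vector iteration's speculative reads fit within
>      the page-size proxy.  The skip_vector guard would then branch to the
>      non-vector path when this returns 0.  All names are hypothetical.  */
>   int
>   vector_iteration_fits_page_p (unsigned long ncopies,
>                                 unsigned long vectype_bytes,
>                                 unsigned long poly_value_max)
>   {
>     /* Bytes read per vector iteration must not exceed the page-size
>        proxy, otherwise a speculative read could cross into an unmapped
>        page.  */
>     return ncopies * vectype_bytes <= poly_value_max;
>   }
>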
> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
OK.
Thanks,
Richard.
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> checks.
> (vect_compute_data_ref_alignment): Remove alignment checks and move to
> get_load_store_type, increase group access alignment.
> (vect_enhance_data_refs_alignment): Add note to comment needing
> investigating.
> (vect_analyze_data_refs_alignment): Likewise.
> (vect_supportable_dr_alignment): For group loads look at first DR.
> * tree-vect-stmts.cc (get_load_store_type):
> Perform safety checks for early break pfa.
> * tree-vectorizer.h (dr_set_safe_speculative_read_required,
> dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
> (need_peeling_for_alignment): Renamed to...
> (safe_speculative_read_required): .. This
> (class dr_vec_info): Add scalar_access_known_in_bounds.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/118464
> PR tree-optimization/116855
> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> load type is relaxed later.
> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
> * gcc.dg/vect/vect-early-break_128.c: Likewise.
> * gcc.dg/vect/vect-early-break_26.c: Likewise.
> * gcc.dg/vect/vect-early-break_43.c: Likewise.
> * gcc.dg/vect/vect-early-break_44.c: Likewise.
> * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
> * gcc.dg/vect/vect-early-break_7.c: Likewise.
> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> * gcc.dg/vect/vect-early-break_53.c: Likewise.
> * gcc.dg/vect/vect-early-break_56.c: Likewise.
> * gcc.dg/vect/vect-early-break_57.c: Likewise.
> * gcc.dg/vect/vect-early-break_81.c: Likewise.
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 6f8bf3923863dee9ed35b0497f1ef58a65726701..a4c62e50785362c93de31ac44f4fb5cbf4d1e1ee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17260,7 +17260,7 @@ Maximum number of relations the oracle will register in a basic block.
> Work bound when discovering transitive relations from existing relations.
>
> @item min-pagesize
> -Minimum page size for warning purposes.
> +Minimum page size for warning and early break vectorization purposes.
>
> @item openacc-kernels
> Specify mode of OpenACC `kernels' constructs handling.
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -55,7 +55,9 @@ int main()
> }
> }
> rephase ();
> +#pragma GCC novector
> for (i = 0; i < 32; ++i)
> +#pragma GCC novector
> for (j = 0; j < 3; ++j)
> #pragma GCC novector
> for (k = 0; k < 3; ++k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c72d4a5b04cc05c7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -5,7 +5,8 @@
> /* { dg-additional-options "-O3" } */
> /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* Arm and -m32 create a group size of 3 here, which we can't support yet. AArch64 uses elementwise accesses here. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { aarch64*-*-* } } } } */
>
> typedef struct filter_list_entry {
> const char *name;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> index 6d7fb920ec2de529a4aa1de2c4a04286989204fd..ed6baf2d451f3887076a1e9143035363128efe70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> @@ -3,7 +3,8 @@
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_int } */
>
> -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" { target { ! vect_partial_vectors } } } } */
> /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
>
> #ifndef N
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, b, c, d, e, f;
> +short g[1];
> +int main() {
> + int h;
> + while (a) {
> + while (h)
> + ;
> + for (b = 2; b; b--) {
> + while (c)
> + ;
> + f = g[a];
> + if (d)
> + break;
> + }
> + while (e)
> + ;
> + }
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dd05046982524f15662be8df517716b581b8a2d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> @@ -0,0 +1,25 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_partial_vectors || vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
> +
> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..085dd9b81bb6943440f34d044cbd24ee2121657c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
> @@ -0,0 +1,26 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* Gathers and scatters are not safe to speculate across early breaks. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +
> +#define N 1024
> +int vect_a[N];
> +int vect_b[N];
> +
> +int test4(int x, int stride)
> +{
> + int ret = 0;
> + for (int i = 0; i < (N / stride); i++)
> + {
> + vect_b[i] += x + i;
> + if (vect_a[i*stride] == x)
> + return i;
> + vect_a[i] += x * vect_b[i];
> +
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 1; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> + for (int i = 0; i < n; i++) {
> + if (string[i] == c)
> + return &string[i];
> + }
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> + if (vect[i] > x)
> + return 1;
> +
> + vect[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < n; i++)
> + {
> + if (vect_a[i] > x || vect_b[i] > x)
> + return 1;
> +
> + vect_a[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cf76c7109edce15f860cdc27e10850ef5a31fc9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* This should be vectorizable through load_lanes and linear targets. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
> +
> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < n; i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..25d3a62356baf127c89187b150810e4d31567c6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> @@ -0,0 +1,26 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i] > x || vect_a[i+1] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..10eb98b726acb32a0d1de4daf202724995bfa1a6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> @@ -0,0 +1,29 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Group size is uneven and the second group is misaligned. Needs partial vectors. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> + if (vect_a[i-1] > x || vect_a[i+2] > x)
> + return 1;
> +
> + vect_b[i] = x;
> + vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> index babc79c74c39b5beedd293f2138f0c46846543b0..edddb44bad66aa419d097f69ca850e5eaa66e014 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> @@ -5,7 +5,8 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
>
> #ifndef N
> #define N 803
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> index dec0b492ab883de6e02944a95fd554a109a68a39..8f5ccc45ce06ed36627107e080d633e55e254fa0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> @@ -5,7 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>
> #include <complex.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> index 039aac7fd84cf6131e1ea401b87385a32b545e67..7ac1e76f0aca37aa04a767b6034000f09aaf98b8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> @@ -5,7 +5,7 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>
> #include <stdbool.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> index f73f3c2eb86e804803a969dab983dc9e39eed66a..483ea5f243c825d6a6c4f5aa7f86c3f9eb8b2e10 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> @@ -5,7 +5,7 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>
> #include <stdbool.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> index b3f5984f682f30f79331d48a264c2cc4af3e2503..f8f84fab97ab586847000af8b89448b0885ef5fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> @@ -42,4 +42,6 @@ main ()
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> index 47d2a50218bd1b32fe43edcaaabb1079d0b26223..643016b2ccfea29ba36d65c8070f255cb8179481 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> @@ -41,4 +41,6 @@ main ()
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> index 8062fbbf6422af6a2e42de9574e88d411a8fb917..36fc6a6eb60fae70f8f05a3d9435f5adce025847 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> @@ -23,4 +23,5 @@ unsigned test4(unsigned x)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_load_lanes } } } */
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_load_lanes } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> +/* Cannot safely vectorize this due to the group misalignment. */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
> return ret;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> index a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452b79f3e0fabba8b3c0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> @@ -2,6 +2,7 @@
> /* { dg-do compile } */
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_partial_vectors } */
>
> void abort ();
> int a[64], b[64];
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> index 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f7ad915a3f81c38c4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> @@ -4,6 +4,7 @@
> /* { dg-require-effective-target vect_int } */
> /* { dg-add-options bind_pic_locally } */
> /* { dg-require-effective-target vect_early_break_hw } */
> +/* { dg-require-effective-target vect_partial_vectors } */
>
> #include <stdarg.h>
> #include "tree-vect.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286febea2b31cb7be8499 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,8 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +/* Multiple loads with different alignments; we can't peel this. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>
> void abort ();
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> index 7b870e9c60dcac6164d879dd70c1fc07ec0221fe..c7cce81f52c80d83bd2c1face8cbd13f93834531 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> @@ -5,7 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* This will fail because we cannot SLP the load groups yet. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>
> #define N 1024
> unsigned vect_a[N];
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> index d218a0686719fee4c167684dcf26402851b53260..34d187483320b9cc215304b73e28d45d7031516e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> @@ -5,7 +5,10 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
> +
>
> #include <complex.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> index 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea44ab4750473589a9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -5,8 +5,9 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +/* Multiple loads with different misalignments. Can't peel; needs partial loop support. */
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
> void abort ();
>
> unsigned short sa[32];
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..9949fc3d98852399242a96095f4dae5ffe7613b3 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> if (is_gimple_debug (stmt))
> continue;
>
> - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> + stmt_vec_info stmt_vinfo
> + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> + stmt = STMT_VINFO_STMT (stmt_vinfo);
> auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> if (!dr_ref)
> continue;
> @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> bounded by VF so accesses are within range. We only need to check
> the reads since writes are moved to a safe place where if we get
> there we know they are safe to perform. */
> - if (DR_IS_READ (dr_ref)
> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> + if (DR_IS_READ (dr_ref))
> {
> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> - || STMT_VINFO_STRIDED_P (stmt_vinfo))
> - {
> - const char *msg
> - = "early break not supported: cannot peel "
> - "for alignment, vectorization would read out of "
> - "bounds at %G";
> - return opt_result::failure_at (stmt, msg, stmt);
> - }
> -
> - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> - dr_info->need_peeling_for_alignment = true;
> + dr_set_safe_speculative_read_required (stmt_vinfo, true);
> + bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref));
> + DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) = inbounds;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> - "marking DR (read) as needing peeling for "
> - "alignment at %G", stmt);
> + "marking DR (read) as possibly needing peeling "
> + "for alignment at %G", stmt);
> }
>
> if (DR_IS_READ (dr_ref))
> @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
> Compute the misalignment of the data reference DR_INFO when vectorizing
> with VECTYPE.
>
> - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
> - be set appropriately on failure (but is otherwise left unchanged).
> -
> Output:
> 1. initialized misalignment info for DR_INFO
>
> @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo)
>
> static void
> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> - tree vectype, opt_result *result = nullptr)
> + tree vectype)
> {
> stmt_vec_info stmt_info = dr_info->stmt;
> vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> BITS_PER_UNIT);
>
> - /* If this DR needs peeling for alignment for correctness, we must
> - ensure the target alignment is a constant power-of-two multiple of the
> - amount read per vector iteration (overriding the above hook where
> - necessary). */
> - if (dr_info->need_peeling_for_alignment)
> + if (loop_vinfo
> + && dr_safe_speculative_read_required (stmt_info))
> {
> - /* Vector size in bytes. */
> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> -
> - /* We can only peel for loops, of course. */
> - gcc_checking_assert (loop_vinfo);
> -
> - /* Calculate the number of vectors read per vector iteration. If
> - it is a power of two, multiply through to get the required
> - alignment in bytes. Otherwise, fail analysis since alignment
> - peeling wouldn't work in such a case. */
> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + auto vectype_size
> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> + poly_uint64 new_alignment = vf * vectype_size;
> + /* If we have a grouped access we require that the alignment be N * elem. */
> if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> - num_scalars *= DR_GROUP_SIZE (stmt_info);
> + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>
> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> - if (!pow2p_hwi (num_vectors))
> - {
> - *result = opt_result::failure_at (vect_location,
> - "non-power-of-two num vectors %u "
> - "for DR needing peeling for "
> - "alignment at %G",
> - num_vectors, stmt_info->stmt);
> - return;
> - }
> -
> - safe_align *= num_vectors;
> - if (maybe_gt (safe_align, 4096U))
> - {
> - pretty_printer pp;
> - pp_wide_integer (&pp, safe_align);
> - *result = opt_result::failure_at (vect_location,
> - "alignment required for correctness"
> - " (%s) may exceed page size",
> - pp_formatted_text (&pp));
> - return;
> - }
> -
> - unsigned HOST_WIDE_INT multiple;
> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> - || !pow2p_hwi (multiple))
> + unsigned HOST_WIDE_INT target_alignment;
> + if (new_alignment.is_constant (&target_alignment)
> + && pow2p_hwi (target_alignment))
> {
> if (dump_enabled_p ())
> {
> dump_printf_loc (MSG_NOTE, vect_location,
> - "forcing alignment for DR from preferred (");
> - dump_dec (MSG_NOTE, vector_alignment);
> - dump_printf (MSG_NOTE, ") to safe align (");
> - dump_dec (MSG_NOTE, safe_align);
> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> + "alignment increased due to early break to ");
> + dump_dec (MSG_NOTE, new_alignment);
> + dump_printf (MSG_NOTE, " bytes.\n");
> }
> - vector_alignment = safe_align;
> + vector_alignment = target_alignment;
> }
> }
>
> @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> loop_preheader_edge (loop))
> || loop->inner
> + /* We don't currently maintain the LCSSA form for prologue-peeled inverted
> + loops. */
> || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> do_peeling = false;
>
> @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> continue;
> - opt_result res = opt_result::success ();
> +
> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> - STMT_VINFO_VECTYPE (dr_info->stmt),
> - &res);
> - if (!res)
> - return res;
> + STMT_VINFO_VECTYPE (dr_info->stmt));
> }
> }
>
> @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>
> if (misalignment == 0)
> return dr_aligned;
> - else if (dr_info->need_peeling_for_alignment)
> + else if (dr_safe_speculative_read_required (stmt_info))
> return dr_unaligned_unsupported;
>
> /* For now assume all conditional loads/stores support unaligned
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..743631f944884a31505a95f7a188fd4e4ca3797d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> +
> + /* Check whether all scalar iterations are known to be in bounds. */
> + bool inbounds = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> +
> + /* Check if we support the operation if early breaks are needed. Here we
> + must ensure that we don't access any more than the scalar code would
> + have. A masked operation would ensure this, so for these load types
> + force masking. */
> + if (loop_vinfo
> + && dr_safe_speculative_read_required (stmt_info)
> + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && (*memory_access_type == VMAT_GATHER_SCATTER
> + || *memory_access_type == VMAT_STRIDED_SLP))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "early break not supported: cannot peel for "
> + "alignment. With non-contiguous memory vectorization"
> + " could read out of bounds at %G ",
> + STMT_VINFO_STMT (stmt_info));
> + if (inbounds)
> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> + else
> + return false;
> + }
> +
> + /* If this DR needs alignment for correctness, we must ensure the target
> + alignment is a constant power-of-two multiple of the amount read per
> + vector iteration or force masking. */
> + if (dr_safe_speculative_read_required (stmt_info)
> + && *alignment_support_scheme == dr_aligned)
> + {
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + auto target_alignment
> + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> + unsigned HOST_WIDE_INT target_align;
> +
> + bool group_aligned = false;
> + if (target_alignment.is_constant (&target_align)
> + && nunits.is_constant ())
> + {
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + auto vectype_size
> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> + poly_uint64 required_alignment = vf * vectype_size;
> + /* If we have a grouped access we require that the alignment be N * elem. */
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + required_alignment *=
> + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + if (!multiple_p (target_alignment, required_alignment))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "desired alignment %wu not met. Instead got %wu "
> + "for DR alignment at %G",
> + required_alignment.to_constant (),
> + target_align, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + if (!pow2p_hwi (target_align))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "non-power-of-two vector alignment %wd "
> + "for DR alignment at %G",
> + target_align, STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> +
> + /* For VLA we have to insert a runtime check that the vector loads
> + per iteration don't exceed a page size. For now we can use
> + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
> + if (known_gt (required_alignment, (unsigned)param_min_pagesize))
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "alignment required for correctness (");
> + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
> + dump_printf (MSG_NOTE, ") may exceed page size\n");
> + }
> + return false;
> + }
> +
> + group_aligned = true;
> + }
> +
> + /* There are multiple loads that have a misalignment that we couldn't
> + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
> + vectorize. */
> + if (!group_aligned)
> + {
> + if (inbounds)
> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> + else
> + return false;
> + }
> +
> + /* When using a group access the first element may be aligned but the
> + subsequent loads may not be. For LOAD_LANES since the loads are based
> + on the first DR then all loads in the group are aligned. For
> + non-LOAD_LANES this is not the case. In particular a load + blend when
> + there are gaps can have the non-first loads issued unaligned, even
> + partially overlapping the memory of the first load in order to simplify
> + the blend. This is what the x86_64 backend does for instance. As
> + such only the first load in the group is aligned, the rest are not.
> + Because of this the permutes may break the alignment requirements that
> + have been set, and as such we should, for now, reject them. */
> + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "loads with load permutations not supported for "
> + "speculative early break loads for %G",
> + STMT_VINFO_STMT (stmt_info));
> + return false;
> + }
> + }
> +
> if (*alignment_support_scheme == dr_unaligned_unsupported)
> {
> if (dump_enabled_p ())
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1281,7 +1281,11 @@ public:
>
> /* Set by early break vectorization when this DR needs peeling for alignment
> for correctness. */
> - bool need_peeling_for_alignment;
> + bool safe_speculative_read_required;
> +
> + /* Set by early break vectorization when this DR's scalar accesses are known
> + to be in bounds of a loop with known bounds. */
> + bool scalar_access_known_in_bounds;
>
> tree base_decl;
>
> @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
> return dr_info->target_alignment;
> }
> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
> +
> +/* Return true if the stmt_vec_info requires peeling for alignment. */
> +inline bool
> +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + return dr_info->safe_speculative_read_required;
> +}
> +
> +/* Set the safe_speculative_read_required for the stmt_vec_info; if a group
> + access then set it on the first element, otherwise set it on the DR
> + directly. */
> +inline void
> +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
> + bool requires_alignment)
> +{
> + dr_vec_info *dr_info;
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> + else
> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> + dr_info->safe_speculative_read_required = requires_alignment;
> +}
>
> inline void
> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
>
>
>
Richard Biener <rguenther@suse.de> writes:
> On Tue, 4 Mar 2025, Tamar Christina wrote:
>
>> Hi All,
>>
>> This fixes two PRs on Early break vectorization by delaying the safety checks to
>> vectorizable_load when the VF, VMAT and vectype are all known.
>>
>> This patch does add two new restrictions:
>>
>> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>> group sizes, as they are unaligned every n % 2 iterations and so may cross
>> a page unwittingly.
Sorry for the drive-by comment, but: it might be worth updating the
commit message to say non-power-of-2, rather than uneven. The patch
uses the right check, but the message made it sound like it didn't.
Thanks,
Richard
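
To make the distinction concrete (a hypothetical testcase, not one from the
patch): a group size of 6 is even, yet still non-power-of-2, so no amount of
peeling keeps every vector iteration at the same alignment:

  /* Loads a[i] and a[i+5] form a load group of size 6 (with gaps).
     6 * VF * sizeof (char) is never a power of two, so aligning the
     first iteration does not keep later iterations page-aligned.  */
  unsigned f (char x, char *restrict a, char *restrict b, int n)
  {
    for (int i = 0; i < n - 6; i += 6)
      {
        if (a[i] > x || a[i + 5] > x)
          return 1;
        b[i] = x;
      }
    return 0;
  }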
>> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
>> we cannot peel for alignment, as the alignment requirement is quite large at
>> GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
>> don't support it for now.
>>
>> There are other steps documented inside the code itself so that the reasoning
>> is next to the code.
>>
>> As a fall-back, when the alignment fails we require partial vector support.
>>
>> For VLA targets like SVE we return element alignment as the desired vector
>> alignment. This means that the loads are never misaligned and so, annoyingly,
>> it won't ever need to peel.
>>
>> So what I think needs to happen in GCC 16 is that.
>>
>> 1. during vect_compute_data_ref_alignment we need to take the max of
>> POLY_VALUE_MIN and vector_alignment.
>>
>> 2. Have vect_do_peeling define skip_vector when doing PFA for VLA, and in the
>> guard add a check that ncopies * vectype does not exceed POLY_VALUE_MAX, which
>> we use as a proxy for the page size.
>>
>> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>> vect_determine_partial_vectors_and_peeling since the first iteration has to
>> be partial. If !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>> vectorize.
>>
>> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
>> becomes true and we generate the peeled check through loop control for
>> partial loops. From what I can tell this won't work for
>> LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>> all in the compiler. That would need to be done independently from the
>> above.
>>
>> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
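>>
>> Purely as a sketch of step 2 (illustrative names, not actual vectorizer
>> API), the runtime guard has to establish something like:
>>
>>   /* Illustrative only: the bytes speculatively read per vector
>>      iteration must fit in the page-size proxy before we may enter
>>      the vector loop.  */
>>   static inline int
>>   vla_pfa_guard_ok (unsigned long vl_bytes, unsigned long ncopies,
>>                     unsigned long min_pagesize)
>>   {
>>     return ncopies * vl_bytes <= min_pagesize;
>>   }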
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
>> -m32, -m64 and no issues.
>>
>> Ok for master?
>
> OK.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/118464
>> PR tree-optimization/116855
>> * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>> checks.
>> (vect_compute_data_ref_alignment): Remove alignment checks and move to
>> get_load_store_type, increase group access alignment.
>> (vect_enhance_data_refs_alignment): Add note to comment needing
>> investigating.
>> (vect_analyze_data_refs_alignment): Likewise.
>> (vect_supportable_dr_alignment): For group loads look at first DR.
>> * tree-vect-stmts.cc (get_load_store_type):
>> Perform safety checks for early break pfa.
>> * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>> dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>> (need_peeling_for_alignment): Renamed to...
>> (safe_speculative_read_required): .. This
>> (class dr_vec_info): Add scalar_access_known_in_bounds.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR tree-optimization/118464
>> PR tree-optimization/116855
>> * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>> load type is relaxed later.
>> * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>> * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>> * gcc.dg/vect/vect-early-break_128.c: Likewise.
>> * gcc.dg/vect/vect-early-break_26.c: Likewise.
>> * gcc.dg/vect/vect-early-break_43.c: Likewise.
>> * gcc.dg/vect/vect-early-break_44.c: Likewise.
>> * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>> * gcc.dg/vect/vect-early-break_7.c: Likewise.
>> * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>> * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>> * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>> * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>> * gcc.dg/vect/vect-early-break_53.c: Likewise.
>> * gcc.dg/vect/vect-early-break_56.c: Likewise.
>> * gcc.dg/vect/vect-early-break_57.c: Likewise.
>> * gcc.dg/vect/vect-early-break_81.c: Likewise.
>>
>> ---
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 6f8bf3923863dee9ed35b0497f1ef58a65726701..a4c62e50785362c93de31ac44f4fb5cbf4d1e1ee 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -17260,7 +17260,7 @@ Maximum number of relations the oracle will register in a basic block.
>> Work bound when discovering transitive relations from existing relations.
>>
>> @item min-pagesize
>> -Minimum page size for warning purposes.
>> +Minimum page size for warning and early break vectorization purposes.
>>
>> @item openacc-kernels
>> Specify mode of OpenACC `kernels' constructs handling.
>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
>> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> @@ -55,7 +55,9 @@ int main()
>> }
>> }
>> rephase ();
>> +#pragma GCC novector
>> for (i = 0; i < 32; ++i)
>> +#pragma GCC novector
>> for (j = 0; j < 3; ++j)
>> #pragma GCC novector
>> for (k = 0; k < 3; ++k)
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c72d4a5b04cc05c7 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> @@ -5,7 +5,8 @@
>> /* { dg-additional-options "-O3" } */
>> /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* Arm and -m32 create a group size of 3 here, which we can't support yet. AArch64 makes elementwise accesses here. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { aarch64*-*-* } } } } */
>>
>> typedef struct filter_list_entry {
>> const char *name;
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> index 6d7fb920ec2de529a4aa1de2c4a04286989204fd..ed6baf2d451f3887076a1e9143035363128efe70 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> @@ -3,7 +3,8 @@
>> /* { dg-require-effective-target vect_early_break } */
>> /* { dg-require-effective-target vect_int } */
>>
>> -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
>> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" { target { ! vect_partial_vectors } } } } */
>> /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
>>
>> #ifndef N
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +int a, b, c, d, e, f;
>> +short g[1];
>> +int main() {
>> + int h;
>> + while (a) {
>> + while (h)
>> + ;
>> + for (b = 2; b; b--) {
>> + while (c)
>> + ;
>> + f = g[a];
>> + if (d)
>> + break;
>> + }
>> + while (e)
>> + ;
>> + }
>> + return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> + for (int i = 1; i < n; i++) {
>> + if (string[i] == c)
>> + return &string[i];
>> + }
>> + return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dd05046982524f15662be8df517716b581b8a2d9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Alignment requirement too big; load lanes targets can't safely vectorize this. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_partial_vectors || vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +
>> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 0; i < (n - 2); i+=2)
>> + {
>> + if (vect_a[i] > x || vect_a[i+2] > x)
>> + return 1;
>> +
>> + vect_b[i] = x;
>> + vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..085dd9b81bb6943440f34d044cbd24ee2121657c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* Gathers and scatters are not safe to speculate across early breaks. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +
>> +#define N 1024
>> +int vect_a[N];
>> +int vect_b[N];
>> +
>> +int test4(int x, int stride)
>> +{
>> + int ret = 0;
>> + for (int i = 0; i < (N / stride); i++)
>> + {
>> + vect_b[i] += x + i;
>> + if (vect_a[i*stride] == x)
>> + return i;
>> + vect_a[i] += x * vect_b[i];
>> +
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> + for (int i = 0; i < n; i++) {
>> + if (string[i] == c)
>> + return &string[i];
>> + }
>> + return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> + for (int i = 1; i < n; i++) {
>> + if (string[i] == c)
>> + return &string[i];
>> + }
>> + return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> + for (int i = 0; i < n; i++) {
>> + if (string[i] == c)
>> + return &string[i];
>> + }
>> + return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 0; i < n; i++)
>> + {
>> + if (vect[i] > x)
>> + return 1;
>> +
>> + vect[i] = x;
>> + }
>> + return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 1; i < n; i++)
>> + {
>> + if (vect_a[i] > x || vect_b[i] > x)
>> + return 1;
>> +
>> + vect_a[i] = x;
>> + }
>> + return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..cf76c7109edce15f860cdc27e10850ef5a31fc9a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* This should be vectorizable on load_lanes and linear targets. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +
>> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 0; i < n; i+=2)
>> + {
>> + if (vect_a[i] > x || vect_a[i+1] > x)
>> + return 1;
>> +
>> + vect_b[i] = x;
>> + vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..25d3a62356baf127c89187b150810e4d31567c6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 1; i < (n - 2); i+=2)
>> + {
>> + if (vect_a[i] > x || vect_a[i+1] > x)
>> + return 1;
>> +
>> + vect_b[i] = x;
>> + vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..10eb98b726acb32a0d1de4daf202724995bfa1a6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Group size is uneven and the second group is misaligned. Needs partial vectors. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{
>> + unsigned ret = 0;
>> + for (int i = 1; i < (n - 2); i+=2)
>> + {
>> + if (vect_a[i-1] > x || vect_a[i+2] > x)
>> + return 1;
>> +
>> + vect_b[i] = x;
>> + vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> index babc79c74c39b5beedd293f2138f0c46846543b0..edddb44bad66aa419d097f69ca850e5eaa66e014 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> @@ -5,7 +5,8 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
>>
>> #ifndef N
>> #define N 803
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> index dec0b492ab883de6e02944a95fd554a109a68a39..8f5ccc45ce06ed36627107e080d633e55e254fa0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> @@ -5,7 +5,9 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load-lanes targets require partial loops. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>>
>> #include <complex.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> index 039aac7fd84cf6131e1ea401b87385a32b545e67..7ac1e76f0aca37aa04a767b6034000f09aaf98b8 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> @@ -5,7 +5,7 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>
>> #include <stdbool.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> index f73f3c2eb86e804803a969dab983dc9e39eed66a..483ea5f243c825d6a6c4f5aa7f86c3f9eb8b2e10 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> @@ -5,7 +5,7 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>
>> #include <stdbool.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> index b3f5984f682f30f79331d48a264c2cc4af3e2503..f8f84fab97ab586847000af8b89448b0885ef5fc 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> @@ -42,4 +42,6 @@ main ()
>> return 0;
>> }
>>
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> index 47d2a50218bd1b32fe43edcaaabb1079d0b26223..643016b2ccfea29ba36d65c8070f255cb8179481 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> @@ -41,4 +41,6 @@ main ()
>> return 0;
>> }
>>
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> index 8062fbbf6422af6a2e42de9574e88d411a8fb917..36fc6a6eb60fae70f8f05a3d9435f5adce025847 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x)
>> return ret;
>> }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_load_lanes } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
>> return ret;
>> }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> +/* Cannot safely vectorize this due to the group misalignment. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>> return ret;
>> }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>> return ret;
>> }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> index a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452b79f3e0fabba8b3c0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> @@ -2,6 +2,7 @@
>> /* { dg-do compile } */
>> /* { dg-require-effective-target vect_early_break } */
>> /* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>
>> void abort ();
>> int a[64], b[64];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> index 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f7ad915a3f81c38c4 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> @@ -4,6 +4,7 @@
>> /* { dg-require-effective-target vect_int } */
>> /* { dg-add-options bind_pic_locally } */
>> /* { dg-require-effective-target vect_early_break_hw } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>
>> #include <stdarg.h>
>> #include "tree-vect.h"
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> index 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286febea2b31cb7be8499 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> @@ -5,8 +5,9 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads of different alignments; we can't peel this. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>>
>> void abort ();
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> index 7b870e9c60dcac6164d879dd70c1fc07ec0221fe..c7cce81f52c80d83bd2c1face8cbd13f93834531 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> @@ -5,7 +5,9 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>>
>> #define N 1024
>> unsigned vect_a[N];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> index d218a0686719fee4c167684dcf26402851b53260..34d187483320b9cc215304b73e28d45d7031516e 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> @@ -5,7 +5,10 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load-lanes targets require partial loops. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>> +
>>
>> #include <complex.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> index 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea44ab4750473589a9 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> @@ -5,8 +5,9 @@
>>
>> /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads with different misalignments. Can't peel; needs partial loop support. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>> void abort ();
>>
>> unsigned short sa[32];
>> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
>> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..9949fc3d98852399242a96095f4dae5ffe7613b3 100644
>> --- a/gcc/tree-vect-data-refs.cc
>> +++ b/gcc/tree-vect-data-refs.cc
>> @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>> if (is_gimple_debug (stmt))
>> continue;
>>
>> - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
>> + stmt_vec_info stmt_vinfo
>> + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
>> + stmt = STMT_VINFO_STMT (stmt_vinfo);
>> auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
>> if (!dr_ref)
>> continue;
>> @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>> bounded by VF so accesses are within range. We only need to check
>> the reads since writes are moved to a safe place where if we get
>> there we know they are safe to perform. */
>> - if (DR_IS_READ (dr_ref)
>> - && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
>> + if (DR_IS_READ (dr_ref))
>> {
>> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
>> - || STMT_VINFO_STRIDED_P (stmt_vinfo))
>> - {
>> - const char *msg
>> - = "early break not supported: cannot peel "
>> - "for alignment, vectorization would read out of "
>> - "bounds at %G";
>> - return opt_result::failure_at (stmt, msg, stmt);
>> - }
>> -
>> - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
>> - dr_info->need_peeling_for_alignment = true;
>> + dr_set_safe_speculative_read_required (stmt_vinfo, true);
>> + bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref));
>> + DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) = inbounds;
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> - "marking DR (read) as needing peeling for "
>> - "alignment at %G", stmt);
>> + "marking DR (read) as possibly needing peeling "
>> + "for alignment at %G", stmt);
>> }
>>
>> if (DR_IS_READ (dr_ref))
>> @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
>> Compute the misalignment of the data reference DR_INFO when vectorizing
>> with VECTYPE.
>>
>> - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
>> - be set appropriately on failure (but is otherwise left unchanged).
>> -
>> Output:
>> 1. initialized misalignment info for DR_INFO
>>
>> @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo)
>>
>> static void
>> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>> - tree vectype, opt_result *result = nullptr)
>> + tree vectype)
>> {
>> stmt_vec_info stmt_info = dr_info->stmt;
>> vec_base_alignments *base_alignments = &vinfo->base_alignments;
>> @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
>> BITS_PER_UNIT);
>>
>> - /* If this DR needs peeling for alignment for correctness, we must
>> - ensure the target alignment is a constant power-of-two multiple of the
>> - amount read per vector iteration (overriding the above hook where
>> - necessary). */
>> - if (dr_info->need_peeling_for_alignment)
>> + if (loop_vinfo
>> + && dr_safe_speculative_read_required (stmt_info))
>> {
>> - /* Vector size in bytes. */
>> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
>> -
>> - /* We can only peel for loops, of course. */
>> - gcc_checking_assert (loop_vinfo);
>> -
>> - /* Calculate the number of vectors read per vector iteration. If
>> - it is a power of two, multiply through to get the required
>> - alignment in bytes. Otherwise, fail analysis since alignment
>> - peeling wouldn't work in such a case. */
>> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> + auto vectype_size
>> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>> + poly_uint64 new_alignment = vf * vectype_size;
>> + /* If we have a grouped access we require that the alignment be N * elem. */
>> if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> - num_scalars *= DR_GROUP_SIZE (stmt_info);
>> + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>>
>> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
>> - if (!pow2p_hwi (num_vectors))
>> - {
>> - *result = opt_result::failure_at (vect_location,
>> - "non-power-of-two num vectors %u "
>> - "for DR needing peeling for "
>> - "alignment at %G",
>> - num_vectors, stmt_info->stmt);
>> - return;
>> - }
>> -
>> - safe_align *= num_vectors;
>> - if (maybe_gt (safe_align, 4096U))
>> - {
>> - pretty_printer pp;
>> - pp_wide_integer (&pp, safe_align);
>> - *result = opt_result::failure_at (vect_location,
>> - "alignment required for correctness"
>> - " (%s) may exceed page size",
>> - pp_formatted_text (&pp));
>> - return;
>> - }
>> -
>> - unsigned HOST_WIDE_INT multiple;
>> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
>> - || !pow2p_hwi (multiple))
>> + unsigned HOST_WIDE_INT target_alignment;
>> + if (new_alignment.is_constant (&target_alignment)
>> + && pow2p_hwi (target_alignment))
>> {
>> if (dump_enabled_p ())
>> {
>> dump_printf_loc (MSG_NOTE, vect_location,
>> - "forcing alignment for DR from preferred (");
>> - dump_dec (MSG_NOTE, vector_alignment);
>> - dump_printf (MSG_NOTE, ") to safe align (");
>> - dump_dec (MSG_NOTE, safe_align);
>> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
>> + "alignment increased due to early break to ");
>> + dump_dec (MSG_NOTE, new_alignment);
>> + dump_printf (MSG_NOTE, " bytes.\n");
>> }
>> - vector_alignment = safe_align;
>> + vector_alignment = target_alignment;
>> }
>> }
>>
>> @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
>> loop_preheader_edge (loop))
>> || loop->inner
>> + /* We don't currently maintain the LCSSA form for prologue-peeled
>> + inverted loops. */
>> || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
>> do_peeling = false;
>>
>> @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
>> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
>> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
>> continue;
>> - opt_result res = opt_result::success ();
>> +
>> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
>> - STMT_VINFO_VECTYPE (dr_info->stmt),
>> - &res);
>> - if (!res)
>> - return res;
>> + STMT_VINFO_VECTYPE (dr_info->stmt));
>> }
>> }
>>
>> @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>>
>> if (misalignment == 0)
>> return dr_aligned;
>> - else if (dr_info->need_peeling_for_alignment)
>> + else if (dr_safe_speculative_read_required (stmt_info))
>> return dr_unaligned_unsupported;
>>
>> /* For now assume all conditional loads/stores support unaligned
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..743631f944884a31505a95f7a188fd4e4ca3797d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>> return false;
>> }
>>
>> +
>> + /* Check whether all scalar iterations are known to be in bounds. */
>> + bool inbounds = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
>> +
>> + /* Check if we support the operation if early breaks are needed. Here we
>> + must ensure that we don't access any more than the scalar code would
>> + have. A masked operation would ensure this, so for these load types
>> + force masking. */
>> + if (loop_vinfo
>> + && dr_safe_speculative_read_required (stmt_info)
>> + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
>> + && (*memory_access_type == VMAT_GATHER_SCATTER
>> + || *memory_access_type == VMAT_STRIDED_SLP))
>> + {
>> + if (dump_enabled_p ())
>> + dump_printf_loc (MSG_NOTE, vect_location,
>> + "early break not supported: cannot peel for "
>> + "alignment. With non-contiguous memory vectorization"
>> + " could read out of bounds at %G ",
>> + STMT_VINFO_STMT (stmt_info));
>> + if (inbounds)
>> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>> + else
>> + return false;
>> + }
>> +
>> + /* If this DR needs alignment for correctness, we must ensure the target
>> + alignment is a constant power-of-two multiple of the amount read per
>> + vector iteration or force masking. */
>> + if (dr_safe_speculative_read_required (stmt_info)
>> + && *alignment_support_scheme == dr_aligned)
>> + {
>> + /* We can only peel for loops, of course. */
>> + gcc_checking_assert (loop_vinfo);
>> +
>> + auto target_alignment
>> + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
>> + unsigned HOST_WIDE_INT target_align;
>> +
>> + bool group_aligned = false;
>> + if (target_alignment.is_constant (&target_align)
>> + && nunits.is_constant ())
>> + {
>> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> + auto vectype_size
>> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>> + poly_uint64 required_alignment = vf * vectype_size;
>> + /* If we have a grouped access we require that the alignment be N * elem. */
>> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> + required_alignment *=
>> + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> + if (!multiple_p (target_alignment, required_alignment))
>> + {
>> + if (dump_enabled_p ())
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> + "desired alignment %wu not met. Instead got %wu "
>> + "for DR alignment at %G",
>> + required_alignment.to_constant (),
>> + target_align, STMT_VINFO_STMT (stmt_info));
>> + return false;
>> + }
>> +
>> + if (!pow2p_hwi (target_align))
>> + {
>> + if (dump_enabled_p ())
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> + "non-power-of-two vector alignment %wd "
>> + "for DR alignment at %G",
>> + target_align, STMT_VINFO_STMT (stmt_info));
>> + return false;
>> + }
>> +
>> + /* For VLA we have to insert a runtime check that the vector loads
>> + per iteration don't exceed a page size. For now we can use
>> + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */
>> + if (known_gt (required_alignment, (unsigned)param_min_pagesize))
>> + {
>> + if (dump_enabled_p ())
>> + {
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> + "alignment required for correctness (");
>> + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
>> + dump_printf (MSG_NOTE, ") may exceed page size\n");
>> + }
>> + return false;
>> + }
>> +
>> + group_aligned = true;
>> + }
>> +
>> + /* There are multiple loads whose misalignment we couldn't fix by
>> + peeling. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
>> + vectorize. */
>> + if (!group_aligned)
>> + {
>> + if (inbounds)
>> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>> + else
>> + return false;
>> + }
>> +
>> + /* When using a group access the first element may be aligned but the
>> + subsequent loads may not be. For LOAD_LANES, since the loads are based
>> + on the first DR, all loads in the group are aligned. For
>> + non-LOAD_LANES this is not the case. In particular a load + blend when
>> + there are gaps can have the non-first loads issued unaligned, even
>> + partially overlapping the memory of the first load, in order to
>> + simplify the blend. This is what the x86_64 backend does, for
>> + instance. As such only the first load in the group is aligned, the
>> + rest are not. Because of this the permutes may break the alignment
>> + requirements that have been set, and as such we should, for now,
>> + reject them. */
>> + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
>> + {
>> + if (dump_enabled_p ())
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> + "loads with load permutations not supported for "
>> + "speculative early break loads for %G",
>> + STMT_VINFO_STMT (stmt_info));
>> + return false;
>> + }
>> + }
>> +
>> if (*alignment_support_scheme == dr_unaligned_unsupported)
>> {
>> if (dump_enabled_p ())
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -1281,7 +1281,11 @@ public:
>>
>> /* Set by early break vectorization when this DR needs peeling for alignment
>> for correctness. */
>> - bool need_peeling_for_alignment;
>> + bool safe_speculative_read_required;
>> +
>> + /* Set by early break vectorization when this DR's scalar accesses are known
>> + to be in bounds of a loop with known bounds. */
>> + bool scalar_access_known_in_bounds;
>>
>> tree base_decl;
>>
>> @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
>> return dr_info->target_alignment;
>> }
>> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
>> +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
>> +
>> +/* Return if the stmt_vec_info requires peeling for alignment. */
>> +inline bool
>> +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
>> +{
>> + dr_vec_info *dr_info;
>> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> + else
>> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
>> +
>> + return dr_info->safe_speculative_read_required;
>> +}
>> +
>> +/* Set the safe_speculative_read_required for the stmt_vec_info; if a group
>> + access then set it on the first element, otherwise set it on the DR
>> + directly. */
>> +inline void
>> +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
>> + bool requires_alignment)
>> +{
>> + dr_vec_info *dr_info;
>> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> + else
>> + dr_info = STMT_VINFO_DR_INFO (stmt_info);
>> +
>> + dr_info->safe_speculative_read_required = requires_alignment;
>> +}
>>
>> inline void
>> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
>>
>>
>>
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
Work bound when discovering transitive relations from existing relations.
@item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
@item openacc-kernels
Specify mode of OpenACC `kernels' constructs handling.
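
For reference, the proxy used above can be tuned on the command line; an
illustrative invocation:

  gcc -O3 --param min-pagesize=4096 test.c
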
new file mode 100644
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+typedef decltype(sizeof(0)) size_t;
+struct ts1 {
+ int spans[6][2];
+};
+struct gg {
+ int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+ ts1 ret;
+ for (size_t i = 0; i != t; i++) {
+ if (!(i < t)) __builtin_abort();
+ ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+ ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+ }
+ return ret;
+}
@@ -55,7 +55,9 @@ int main()
}
}
rephase ();
+#pragma GCC novector
for (i = 0; i < 32; ++i)
+#pragma GCC novector
for (j = 0; j < 3; ++j)
#pragma GCC novector
for (k = 0; k < 3; ++k)
@@ -5,7 +5,8 @@
/* { dg-additional-options "-O3" } */
/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* Arm creates a group size of 3 here, which we can't support yet. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! arm*-*-* } } } } */
typedef struct filter_list_entry {
const char *name;
new file mode 100644
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+int a, b, c, d, e, f;
+short g[1];
+int main() {
+ int h;
+ while (a) {
+ while (h)
+ ;
+ for (b = 2; b; b--) {
+ while (c)
+ ;
+ f = g[a];
+ if (d)
+ break;
+ }
+ while (e)
+ ;
+ }
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020];
+
+char * find(int n, char c)
+{
+ for (int i = 1; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
new file mode 100644
@@ -0,0 +1,24 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* Alignment requirement too big; load lanes targets can't safely vectorize this. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" { target { ! vect_load_lanes } } } } */
+
+unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (n - 2); i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+2] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020];
+
+char * find(int n, char c)
+{
+ for (int i = 0; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
new file mode 100644
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020] __attribute__((aligned(1)));
+
+char * find(int n, char c)
+{
+ for (int i = 1; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
+/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
new file mode 100644
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+char string[1020] __attribute__((aligned(1)));
+
+char * find(int n, char c)
+{
+ for (int i = 0; i < n; i++) {
+ if (string[i] == c)
+ return &string[i];
+ }
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
+/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+ if (vect[i] > x)
+ return 1;
+
+ vect[i] = x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" } } */
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect_a, char *vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < n; i++)
+ {
+ if (vect_a[i] > x || vect_b[i] > x)
+ return 1;
+
+ vect_a[i] = x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* This should be vectorizable on load_lanes and linear targets. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < n; i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+1] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
new file mode 100644
@@ -0,0 +1,27 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "Access will not read beyond buffer to due known size buffer" "vect" } } */
+
+
+char vect_a[1025];
+char vect_b[1025];
+
+unsigned test4(char x, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < (n - 2); i+=2)
+ {
+ if (vect_a[i] > x || vect_a[i+1] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
new file mode 100644
@@ -0,0 +1,28 @@
+/* { dg-add-options vect_early_break } */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* Group size is uneven; load lanes targets can't safely vectorize this. */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" "vect" { target { ! vect_load_lanes } } } } */
+
+
+char vect_a[1025];
+char vect_b[1025];
+
+unsigned test4(char x, int n)
+{
+ unsigned ret = 0;
+ for (int i = 1; i < (n - 2); i+=2)
+ {
+ if (vect_a[i-1] > x || vect_a[i+2] > x)
+ return 1;
+
+ vect_b[i] = x;
+ vect_b[i+1] = x+1;
+ }
+ return ret;
+}
@@ -42,4 +42,6 @@ main ()
return 0;
}
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* On targets that use LOAD_LANES this fails due to the group misalignment. */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target vect_load_lanes } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target { ! vect_load_lanes } } } } */
@@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
if (is_gimple_debug (stmt))
continue;
- stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+ stmt_vec_info stmt_vinfo
+ = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
if (!dr_ref)
continue;
@@ -748,26 +749,15 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
bounded by VF so accesses are within range. We only need to check
the reads since writes are moved to a safe place where if we get
there we know they are safe to perform. */
- if (DR_IS_READ (dr_ref)
- && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
+ if (DR_IS_READ (dr_ref))
{
- if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
- || STMT_VINFO_STRIDED_P (stmt_vinfo))
- {
- const char *msg
- = "early break not supported: cannot peel "
- "for alignment, vectorization would read out of "
- "bounds at %G";
- return opt_result::failure_at (stmt, msg, stmt);
- }
-
dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
dr_info->need_peeling_for_alignment = true;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
- "marking DR (read) as needing peeling for "
- "alignment at %G", stmt);
+ "marking DR (read) as possibly needing peeling "
+ "for alignment at %G", stmt);
}
if (DR_IS_READ (dr_ref))
@@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
Compute the misalignment of the data reference DR_INFO when vectorizing
with VECTYPE.
- RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT will
- be set appropriately on failure (but is otherwise left unchanged).
-
Output:
1. initialized misalignment info for DR_INFO
@@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
static void
vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
- tree vectype, opt_result *result = nullptr)
+ tree vectype)
{
stmt_vec_info stmt_info = dr_info->stmt;
vec_base_alignments *base_alignments = &vinfo->base_alignments;
@@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
= exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
BITS_PER_UNIT);
- /* If this DR needs peeling for alignment for correctness, we must
- ensure the target alignment is a constant power-of-two multiple of the
- amount read per vector iteration (overriding the above hook where
- necessary). */
- if (dr_info->need_peeling_for_alignment)
- {
- /* Vector size in bytes. */
- poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
-
- /* We can only peel for loops, of course. */
- gcc_checking_assert (loop_vinfo);
-
- /* Calculate the number of vectors read per vector iteration. If
- it is a power of two, multiply through to get the required
- alignment in bytes. Otherwise, fail analysis since alignment
- peeling wouldn't work in such a case. */
- poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
- if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
- num_scalars *= DR_GROUP_SIZE (stmt_info);
-
- auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
- if (!pow2p_hwi (num_vectors))
- {
- *result = opt_result::failure_at (vect_location,
- "non-power-of-two num vectors %u "
- "for DR needing peeling for "
- "alignment at %G",
- num_vectors, stmt_info->stmt);
- return;
- }
-
- safe_align *= num_vectors;
- if (maybe_gt (safe_align, 4096U))
- {
- pretty_printer pp;
- pp_wide_integer (&pp, safe_align);
- *result = opt_result::failure_at (vect_location,
- "alignment required for correctness"
- " (%s) may exceed page size",
- pp_formatted_text (&pp));
- return;
- }
-
- unsigned HOST_WIDE_INT multiple;
- if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
- || !pow2p_hwi (multiple))
- {
- if (dump_enabled_p ())
- {
- dump_printf_loc (MSG_NOTE, vect_location,
- "forcing alignment for DR from preferred (");
- dump_dec (MSG_NOTE, vector_alignment);
- dump_printf (MSG_NOTE, ") to safe align (");
- dump_dec (MSG_NOTE, safe_align);
- dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
- }
- vector_alignment = safe_align;
- }
- }
-
SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
/* If the main loop has peeled for alignment we have no way of knowing
@@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
|| !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
loop_preheader_edge (loop))
|| loop->inner
- || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+ || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) /* ??? Needs investigating.  */
do_peeling = false;
struct _vect_peel_extended_info peel_for_known_alignment;
@@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
&& DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
continue;
- opt_result res = opt_result::success ();
+
vect_compute_data_ref_alignment (loop_vinfo, dr_info,
- STMT_VINFO_VECTYPE (dr_info->stmt),
- &res);
- if (!res)
- return res;
+ STMT_VINFO_VECTYPE (dr_info->stmt));
}
}
@@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
if (misalignment == 0)
return dr_aligned;
- else if (dr_info->need_peeling_for_alignment)
+ else if (dr_peeling_alignment (stmt_info))
return dr_unaligned_unsupported;
/* For now assume all conditional loads/stores support unaligned
@@ -2597,6 +2597,69 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+ auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
+ /* Check if a misalignment with an unsupported peeling for early break is
+ still OK.  First we need to distinguish between reaching here due to
+ dependency analysis and the user having requested -mstrict-align or
+ similar.  In the latter cases we must not override it.  */
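+ /* For example, with V4SI (unit size 4 bytes) a misalignment of 8 bytes
+ leaves every element access element-aligned, whereas a misalignment of
+ 2 bytes straddles elements and peeling or versioning really would
+ have been required.  */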
+ if (dr_peeling_alignment (stmt_info)
+ && *alignment_support_scheme == dr_unaligned_unsupported
+ /* We can only attempt to override if the misalignment is a multiple of
+ the element being loaded, otherwise peeling or versioning would have
+ really been required. */
+ && multiple_p (*misalignment, unit_size))
+ {
+ bool inbounds
+ = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
+ DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
+ /* If we have a known misalignment, and are doing a group load for a DR
+ that requires aligned access, check whether the misalignment is a
+ multiple of the unit size, in which case the group load will be issued
+ aligned as long as the first load in the group is aligned.
+
+ For the non-inbound case we'd need group_size * vectype alignment.  But
+ this is quite large and unlikely to ever be beneficial, so if we can't
+ peel for it, just reject it.  */
+ if (*memory_access_type == VMAT_LOAD_STORE_LANES
+ && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
+ {
+ /* ??? This needs updating whenever we support SLP group sizes > 1.  */
+ auto group_size = DR_GROUP_SIZE (stmt_info);
+ /* For the inbound case it's enough to check for an alignment of
+ GROUP_SIZE * element size. */
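+ /* E.g. for a group of two int loads (unit size 4 bytes) the inbound
+ case needs only 8-byte alignment of the group's first element.  */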
+ if (inbounds
+ && *misalignment % (group_size * unit_size)
+ && group_size % 2 == 0)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Assuming grouped access is aligned due to load "
+ "lanes, overriding alignment scheme\n");
+
+ *alignment_support_scheme = dr_unaligned_supported;
+ }
+ }
+ /* If we have a linear access and know the misalignment, and know we
+ won't read out of bounds, then it's also OK if the misalignment is a
+ multiple of the element size.  We get this when the loop has known
+ misalignments but the misalignments of the DRs can't be peeled to
+ reach mutual alignment.  Because the misalignments are known, we also
+ know that versioning won't work.  If the target supports unaligned
+ accesses and we know we are free to read the entire buffer, then we
+ can allow the unaligned access as long as it's on elements for an
+ early break condition.  */
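+ /* For instance, a load from a fixed-size global array whose constant
+ misalignment is a multiple of the element size: every access stays
+ element-aligned and within the buffer, so it cannot cross into an
+ unmapped page.  */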
+ else if (*memory_access_type != VMAT_GATHER_SCATTER
+ && inbounds)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Access will not read beyond buffer to due known size "
+ "buffer, overriding alignment scheme\n");
+
+ *alignment_support_scheme = dr_unaligned_supported;
+ }
+ }
+
if (*alignment_support_scheme == dr_unaligned_unsupported)
{
if (dump_enabled_p ())
@@ -10520,6 +10583,66 @@ vectorizable_load (vec_info *vinfo,
/* Transform. */
dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL;
+
+ /* Check whether we support this access when early breaks are needed.  */
+ if (loop_vinfo
+ && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+ && (memory_access_type == VMAT_GATHER_SCATTER
+ || memory_access_type == VMAT_STRIDED_SLP))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "early break not supported: cannot peel for "
+ "alignment. With non-contiguous memory vectorization"
+ " could read out of bounds at %G ",
+ STMT_VINFO_STMT (stmt_info));
+ return false;
+ }
+
+ /* If this DR needs peeling for alignment for correctness, we must
+ ensure the target alignment is a constant power-of-two multiple of the
+ amount read per vector iteration (overriding the above hook where
+ necessary).  We don't support group loads, which would have been
+ filtered out by the check above.  For now this means we don't have to
+ look at the group info and can just check that the load is contiguous,
+ so dr_info suffices.  For known size buffers we still need to check if
+ the vector is misaligned, and if so we need to peel.  */
+ if (costing_p && dr_info->need_peeling_for_alignment)
+ {
+ /* Vector size in bytes. */
+ poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
+
+ /* We can only peel for loops, of course. */
+ gcc_checking_assert (loop_vinfo);
+
+ auto num_vectors = ncopies;
+ if (!pow2p_hwi (num_vectors))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "non-power-of-two num vectors %u "
+ "for DR needing peeling for "
+ "alignment at %G",
+ num_vectors, STMT_VINFO_STMT (stmt_info));
+ return false;
+ }
+
+ safe_align *= num_vectors;
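+ /* E.g. for V16QI with ncopies == 2 the loop reads 32 bytes per vector
+ iteration, so the first access must be 32-byte aligned to guarantee
+ no vector iteration crosses a page boundary.  */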
+ /* For VLA we don't support PFA when any unrolling needs to be done.
+ We could, but it's too much work for GCC 15.  */
+ if (maybe_gt (safe_align, (unsigned)param_min_pagesize))
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "alignment required for correctness (");
+ dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
+ dump_printf (MSG_MISSED_OPTIMIZATION, ") may exceed page size\n");
+ }
+ return false;
+ }
+ }
+
ensure_base_align (dr_info);
if (memory_access_type == VMAT_INVARIANT)
@@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
}
#define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
+/* Return true if STMT_INFO's data reference requires peeling for alignment.
+   For grouped accesses the flag is taken from the first element of the
+   group.  */
+inline bool
+dr_peeling_alignment (stmt_vec_info stmt_info)
+{
+ dr_vec_info *dr_info;
+ if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+ dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
+ else
+ dr_info = STMT_VINFO_DR_INFO (stmt_info);
+
+ return dr_info->need_peeling_for_alignment;
+}
+
inline void
set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
{