From patchwork Tue Apr 26 14:34:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
X-Patchwork-Id: 53228
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 69F003857809
	for <patchwork@sourceware.org>; Tue, 26 Apr 2022 14:35:01 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 69F003857809
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1650983701;
	bh=CDcL5OJbrT5L4wGPTCkr6VewpjBqA3KokQ2tkWAxmbo=;
	h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:Cc:From;
	b=LbHt57jSPU7Swf6jhviZcOjAmreFHxXIJn9JbbsMopdScwDSvcnCZCLkHKLo9Wn9v
	 /nuXKQm813XgP4iRdCyzrpOMM7xRSL4mAV7qG7y0BAwvONJOqSzfeWLSWk+Gwv5UP/
	 5gjHwO8c5MnYgMUT/zkA8MKzmu+9Jtz1fI5md+rY=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id E386D3858C56
 for <gcc-patches@gcc.gnu.org>; Tue, 26 Apr 2022 14:34:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E386D3858C56
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7826523A;
 Tue, 26 Apr 2022 07:34:18 -0700 (PDT)
Received: from [10.57.10.193] (unknown [10.57.10.193])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E46633F73B;
 Tue, 26 Apr 2022 07:34:17 -0700 (PDT)
Message-ID: <8462f41b-895f-9aca-499e-7713ec161673@arm.com>
Date: Tue, 26 Apr 2022 15:34:12 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.8.1
Content-Language: en-US
Subject: [PATCH] vect, tree-optimization/105219: Disable epilogue
 vectorization when peeling for alignment
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
 Richard Biener <rguenther@suse.de>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,

This patch disables epilogue vectorization when we are peeling for
alignment in the prologue and cannot guarantee that the main vectorized
loop is entered.  This prevents vectorized code from executing an
unaligned access when the target has indicated that it wants to peel
for alignment.  We take this conservative approach because we currently
do not distinguish between peeling for alignment for correctness and
peeling for alignment for performance.
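
As a rough illustration of the decision described above, here is a
simplified sketch.  It is not the code in the patch: plain integers
stand in for the poly_uint64 values, and the function name and all
parameter names are hypothetical stand-ins for the LOOP_VINFO_*
accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the check added by this patch.
   Plain integers replace GCC's poly_uint64 values and LOOP_VINFO_*
   accessors; none of the names below exist in GCC.  */

static uint64_t
max_u64 (uint64_t a, uint64_t b)
{
  return a > b ? a : b;
}

/* Return true if a vectorized epilogue may be used.
   PEELING > 0: known number of prologue iterations peeled for alignment.
   PEELING == 0: no alignment peeling.
   PEELING < 0: peel count not known at compile time.  */
bool
allow_vector_epilogue (bool masked_alignment, bool niters_known,
                       uint64_t niters, int peeling, uint64_t vf,
                       uint64_t cost_threshold,
                       uint64_t versioning_threshold)
{
  assert (vf > 0);

  /* Alignment handled by loop masking: nothing to guard against.  */
  if (masked_alignment)
    return true;

  if (peeling > 0 && niters_known)
    {
      /* Iterations needed before the main vector loop is entered.  */
      uint64_t niters_for_main
        = max_u64 (max_u64 (vf, cost_threshold), versioning_threshold);
      niters_for_main += (uint64_t) peeling;

      /* Too few iterations to be sure the main loop runs.  */
      if (niters <= niters_for_main)
        return false;
    }
  else if (peeling < 0)
    return false;

  return true;
}
```

With illustrative values of vf = 4, three peeled iterations and both
thresholds at zero, nine iterations allow the vector epilogue (9 > 7)
whereas seven do not.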

Better codegen would skip to the scalar epilogue when the main loop
isn't entered and alignment peeling is required.  However, that would
require a more invasive change to the codebase, which we chose to avoid
at this point in the development cycle.  We can revisit this option
during stage 1.

Bootstrapped on aarch64-none-linux and regression tested on 
aarch64-none-elf.

gcc/ChangeLog:

     PR tree-optimization/105219
     * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New
     function.
     (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment to
     determine whether to vectorize epilogue.

gcc/testsuite/ChangeLog:

     PR tree-optimization/105219
     * gcc.target/aarch64/pr105219.c: New test.
     * gcc.target/aarch64/pr105219-2.c: New test.
     * gcc.target/aarch64/pr105219-3.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c97d1dc100181b77af0766e08407e1e352f604fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* PR 105219.  */
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..444352fc051b787369f6f1be6236d1ff0fc2d392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model -fdump-tree-vect-all" } */
+/* PR 105219.  */
+int data[128];
+
+void foo (void)
+{
+  for (int i = 0; i < 9; ++i)
+    data[i + 1] = i;
+}
+
+/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/gcc.target/aarch64/pr105219.c
new file mode 100644
index 0000000000000000000000000000000000000000..bbdefb549f6a4e803852f69d20ce1ef9152a526c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a+sve" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=*" } { "-msve-vector-bits=128" } } */
+/* { dg-options "-O3 -march=armv8.2-a+sve -msve-vector-bits=128 -mtune=thunderx" } */
+/* PR 105219.  */
+int a;
+char b[60];
+short c[18];
+short d[4][19];
+long long f;
+void e(int g, int h, short k[][19]) {
+  for (signed i = 0; i < 3; i += 2)
+    for (signed j = 1; j < h + 14; j++) {
+      b[i * 14 + j] = 1;
+      c[i + j] = k[2][j];
+      a = g ? k[i][j] : 0;
+    }
+}
+int main() {
+  e(9, 1, d);
+  for (long l = 0; l < 6; ++l)
+    for (long m = 0; m < 4; ++m)
+      f ^= b[l + m * 4];
+  if (f)
+    __builtin_abort ();
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd52b2f67cdecd3dc16fcff684dba07..a23e6181dec8126bcb691ea9474095bf65483863 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2942,6 +2942,38 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
   return opt_loop_vec_info::success (loop_vinfo);
 }
 
+/* Function vect_epilogue_when_peeling_for_alignment
+
+   PR 105219: If we are peeling for alignment in the prologue then we do not
+   vectorize the epilogue unless we are certain we will enter the main
+   vectorized loop.  This is to prevent entering the vectorized epilogue in
+   case there aren't enough iterations to enter the main loop.
+*/
+
+static bool
+vect_epilogue_when_peeling_for_alignment (loop_vec_info loop_vinfo)
+{
+  if (vect_use_loop_mask_for_alignment_p (loop_vinfo))
+    return true;
+
+  int prologue_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (prologue_peeling > 0 && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+    {
+      poly_uint64 niters_for_main
+	= upper_bound (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+		       LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo));
+      niters_for_main
+	= upper_bound (niters_for_main,
+		       LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo));
+      niters_for_main += prologue_peeling;
+      if (maybe_le (LOOP_VINFO_INT_NITERS (loop_vinfo), niters_for_main))
+	return false;
+    }
+  else if (prologue_peeling < 0)
+    return false;
+  return true;
+}
+
 /* Function vect_analyze_loop.
 
    Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -3151,7 +3183,8 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 		}
 	    }
 	  /* For now only allow one epilogue loop.  */
-	  if (first_loop_vinfo->epilogue_vinfos.is_empty ())
+	  if (first_loop_vinfo->epilogue_vinfos.is_empty ()
+	      && vect_epilogue_when_peeling_for_alignment (first_loop_vinfo))
 	    {
 	      first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);
 	      poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);