From patchwork Sun Nov 13 23:05:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Christoph_M=C3=BCllner?=
 <christoph.muellner@vrull.eu>
X-Patchwork-Id: 60561
From: Christoph Muellner <christoph.muellner@vrull.eu>
To: gcc-patches@gcc.gnu.org, Kito Cheng <kito.cheng@sifive.com>,
 Jim Wilson <jim.wilson.gcc@gmail.com>, Palmer Dabbelt <palmer@dabbelt.com>,
 Andrew Waterman <andrew@sifive.com>,
 Philipp Tomsich <philipp.tomsich@vrull.eu>,
 Jeff Law <jeffreyalaw@gmail.com>, Vineet Gupta <vineetg@rivosinc.com>
Cc: =?utf-8?q?Christoph_M=C3=BCllner?= <christoph.muellner@vrull.eu>
Subject: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in
 block_move_straight
Date: Mon, 14 Nov 2022 00:05:19 +0100
Message-Id: <20221113230521.712693-6-christoph.muellner@vrull.eu>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20221113230521.712693-1-christoph.muellner@vrull.eu>
References: <20221113230521.712693-1-christoph.muellner@vrull.eu>
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

From: Christoph Müllner <christoph.muellner@vrull.eu>

The current implementation of riscv_block_move_straight() emits as many
load/store pairs of maximum width (e.g. 8 bytes for RV64) as fit into
the given length.  The remainder is handed over to move_by_pieces(),
which emits code based on target settings like slow_unaligned_access
and overlap_op_by_pieces.

move_by_pieces() will emit overlapping memory accesses with maximum
width only if the given length exceeds the size of one access
(e.g. 15 bytes for 8-byte accesses).

This patch changes riscv_block_move_straight() so that, for lengths of
at least delta, it leaves a remainder in the interval [delta..2*delta)
instead of [0..delta).  This allows move_by_pieces() to emit overlapping
memory accesses (if the target's requirements for them are met).

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight):
	  Adjust range for emitted load/store pairs.
	  (riscv_expand_block_move): Pass hwi_length instead of
	  INTVAL (length).

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
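The effect of the changed loop condition can be sketched in isolation.
This is a minimal model, not GCC code: struct split, split_block(),
split_old() and split_new() are hypothetical names introduced only for
illustration.  The model just counts loop iterations and the bytes that
would be left for move_by_pieces().

```c
#include <assert.h>

/* Hypothetical illustration only; none of these names exist in GCC.  */
struct split
{
  unsigned pairs;      /* Full-width load/store pairs emitted directly.  */
  unsigned remainder;  /* Bytes left over for move_by_pieces().  */
};

/* Count iterations of a loop that runs while offset + KEEP <= LENGTH,
   stepping by DELTA, and track the bytes not yet covered.  */
struct split
split_block (unsigned length, unsigned delta, unsigned keep)
{
  struct split s = { 0, length };
  assert (delta > 0);
  for (unsigned offset = 0; offset + keep <= length; offset += delta)
    {
      s.pairs++;
      s.remainder -= delta;
    }
  return s;
}

/* Old condition: offset + delta <= length.  */
struct split
split_old (unsigned length, unsigned delta)
{
  return split_block (length, delta, delta);
}

/* New condition: offset + 2 * delta <= length.  */
struct split
split_new (unsigned length, unsigned delta)
{
  return split_block (length, delta, 2 * delta);
}
```

With delta = 8, lengths 27, 29 and 31 leave remainders of 3, 5 and 7
bytes under the old condition and 11, 13 and 15 bytes under the new one.
Assuming overlapping accesses are enabled, this is consistent with the
updated expectations in memcpy-overlapping.c, e.g. two directly emitted
pairs plus two overlapping 8-byte accesses for COPY_N(31).
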
 gcc/config/riscv/riscv-string.cc              |  8 ++++----
 .../gcc.target/riscv/memcpy-overlapping.c     | 19 ++++++++-----------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6882f0be269..1137df475be 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -57,18 +57,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
      the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     {
       regs[i] = gen_reg_rtx (mode);
       riscv_emit_move (regs[i], adjust_address (src, mode, offset));
     }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
@@ -166,7 +166,7 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
 
       if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
 	{
-	  riscv_block_move_straight (dest, src, INTVAL (length));
+	  riscv_block_move_straight (dest, src, hwi_length);
 	  return true;
 	}
       else if (optimize && align >= BITS_PER_WORD)
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
index ffb7248bfd1..ef95bfb879b 100644
--- a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -25,26 +25,23 @@ COPY_N(15)
 /* Emits 2x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(19)
 
-/* Emits 3x ld and 3x sd.  */
+/* Emits 3x {ld,sd}.  */
 COPY_N(23)
 
 /* The by-pieces infrastructure handles up to 24 bytes.
    So the code below is emitted via cpymemsi/block_move_straight.  */
 
-/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+/* Emits 3x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(27)
 
-/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(29)
 
-/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(31)
 
-/* { dg-final { scan-assembler-times "ld\t" 21 } } */
-/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+/* { dg-final { scan-assembler-times "ld\t" 23 } } */
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
 
-/* { dg-final { scan-assembler-times "lw\t" 5 } } */
-/* { dg-final { scan-assembler-times "sw\t" 5 } } */
-
-/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
-/* { dg-final { scan-assembler-times "sb\t" 2 } } */
+/* { dg-final { scan-assembler-times "lw\t" 3 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */