From patchwork Sun Nov 13 23:05:17 2022
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 60562
From: Christoph Muellner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner
Subject: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
Date: Mon, 14 Nov 2022 00:05:17 +0100
Message-Id: <20221113230521.712693-4-christoph.muellner@vrull.eu>
In-Reply-To: <20221113230521.712693-1-christoph.muellner@vrull.eu>
References: <20221113230521.712693-1-christoph.muellner@vrull.eu>
From: Christoph Müllner

This patch adds the field overlap_op_by_pieces to struct riscv_tune_param,
which allows enabling the overlap_op_by_pieces infrastructure per tuning.
When the field is set, TARGET_OVERLAP_OP_BY_PIECES_P returns true and the
by-pieces expansion of memcpy/memset may finish with a single access that
overlaps the previous one (e.g. two overlapping 8-byte accesses for a
13-byte copy) instead of a sequence of narrower accesses.  The field is
enabled only for the Ventana VT1 tuning.

gcc/ChangeLog:

	* config/riscv/riscv.cc (struct riscv_tune_param): New field.
	(rocket_tune_info, sifive_7_tune_info, thead_c906_tune_info)
	(optimize_size_tune_info, ventana_vt1_tune_info): Set
	overlap_op_by_pieces.
	(riscv_overlap_op_by_pieces): New function.
	(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
	riscv_overlap_op_by_pieces.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
	* gcc.target/riscv/memcpy-overlapping.c: New test.
	* gcc.target/riscv/memset-nonoverlapping.c: New test.
	* gcc.target/riscv/memset-overlapping.c: New test.

Signed-off-by: Christoph Müllner
---
 gcc/config/riscv/riscv.cc                     | 17 +++++-
 .../gcc.target/riscv/memcpy-nonoverlapping.c  | 54 +++++++++++++++++++
 .../gcc.target/riscv/memcpy-overlapping.c     | 50 +++++++++++++++++
 .../gcc.target/riscv/memset-nonoverlapping.c  | 45 ++++++++++++++++
 .../gcc.target/riscv/memset-overlapping.c     | 43 +++++++++++++++
 5 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-overlapping.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a0c00cfb66f..7357cf51cdf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -243,6 +243,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   unsigned int fusible_ops;
+  bool overlap_op_by_pieces;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -331,6 +332,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,                                            /* fmv_cost */
   true,                                         /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,                                        /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -346,6 +348,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,                                            /* fmv_cost */
   true,                                         /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,                                        /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -361,6 +364,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   8,                                            /* fmv_cost */
   false,                                        /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,                                        /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -376,6 +380,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   8,                                            /* fmv_cost */
   false,                                        /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,                                        /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -393,7 +398,8 @@ static const struct riscv_tune_param ventana_vt1_tune_info = {
   ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |       /* fusible_ops */
     RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
     RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
-    RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
+    RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD ),
+  true,                                         /* overlap_op_by_pieces */
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -6444,6 +6450,12 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
   return riscv_slow_unaligned_access_p;
 }
 
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -6974,6 +6986,9 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
 #undef TARGET_SLOW_UNALIGNED_ACCESS
 #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
+
 #undef TARGET_SECONDARY_MEMORY_NEEDED
 #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
 
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
new file mode 100644
index 00000000000..1c99e13fc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=sifive-u74 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+
+#define COPY_N(N)                                      \
+void copy##N (char *src, char *dst)                    \
+{                                                      \
+  dst = __builtin_assume_aligned (dst, 4096);          \
+  src = __builtin_assume_aligned (src, 4096);          \
+  __builtin_memcpy (dst, src, N);                      \
+}
+
+/* Emits 1x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(11)
+
+/* Emits 1x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(13)
+
+/* Emits 1x {ld,sd} and 1x {lw,lhu,sw,sh}.  */
+COPY_N(14)
+
+/* Emits 1x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(15)
+
+/* Emits 2x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(19)
+
+/* Emits 2x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes, so the code
+   below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(27)
+
+/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(29)
+
+/* Emits 3x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(31)
+
+/* { dg-final { scan-assembler-times "ld\t" 17 } } */
+/* { dg-final { scan-assembler-times "sd\t" 17 } } */
+
+/* { dg-final { scan-assembler-times "lw\t" 6 } } */
+/* { dg-final { scan-assembler-times "sw\t" 6 } } */
+
+/* { dg-final { scan-assembler-times "lhu\t" 7 } } */
+/* { dg-final { scan-assembler-times "sh\t" 7 } } */
+
+/* { dg-final { scan-assembler-times "lbu\t" 8 } } */
+/* { dg-final { scan-assembler-times "sb\t" 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
new file mode 100644
index 00000000000..ffb7248bfd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ventana-vt1 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define COPY_N(N)                                      \
+void copy##N (char *src, char *dst)                    \
+{                                                      \
+  dst = __builtin_assume_aligned (dst, 4096);          \
+  src = __builtin_assume_aligned (src, 4096);          \
+  __builtin_memcpy (dst, src, N);                      \
+}
+
+/* Emits 1x {ld,sd} and 1x {lw,sw}.  */
+COPY_N(11)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(13)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(14)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(15)
+
+/* Emits 2x {ld,sd} and 1x {lw,sw}.  */
+COPY_N(19)
+
+/* Emits 3x {ld,sd}.  */
+COPY_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes, so the code
+   below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(27)
+
+/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(29)
+
+/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+COPY_N(31)
+
+/* { dg-final { scan-assembler-times "ld\t" 21 } } */
+/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+
+/* { dg-final { scan-assembler-times "lw\t" 5 } } */
+/* { dg-final { scan-assembler-times "sw\t" 5 } } */
+
+/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
+/* { dg-final { scan-assembler-times "sb\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c b/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
new file mode 100644
index 00000000000..c4311c7a8d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=sifive-u74 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define ZERO_N(N)                                      \
+void zero##N (char *dst)                               \
+{                                                      \
+  dst = __builtin_assume_aligned (dst, 4096);          \
+  __builtin_memset (dst, 0, N);                        \
+}
+
+/* Emits 1x sd and 1x {sh,sb}.  */
+ZERO_N(11)
+
+/* Emits 1x sd and 1x {sw,sb}.  */
+ZERO_N(13)
+
+/* Emits 1x sd and 1x {sw,sh}.  */
+ZERO_N(14)
+
+/* Emits 1x sd and 1x {sw,sh,sb}.  */
+ZERO_N(15)
+
+/* Emits 2x sd and 1x {sh,sb}.  */
+ZERO_N(19)
+
+/* Emits 2x sd and 1x {sw,sh,sb}.  */
+ZERO_N(23)
+
+/* Unlike memcpy, memset is not expanded via cpymemsi/block_move_straight,
+   so the by-pieces infrastructure also handles the sizes below.  */
+
+/* Emits 3x sd and 1x {sh,sb}.  */
+ZERO_N(27)
+
+/* Emits 3x sd and 1x {sw,sb}.  */
+ZERO_N(29)
+
+/* Emits 3x sd and 1x {sw,sh,sb}.  */
+ZERO_N(31)
+
+/* { dg-final { scan-assembler-times "sd\t" 17 } } */
+/* { dg-final { scan-assembler-times "sw\t" 6 } } */
+/* { dg-final { scan-assembler-times "sh\t" 7 } } */
+/* { dg-final { scan-assembler-times "sb\t" 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memset-overlapping.c b/gcc/testsuite/gcc.target/riscv/memset-overlapping.c
new file mode 100644
index 00000000000..793766b5262
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memset-overlapping.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ventana-vt1 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define ZERO_N(N)                                      \
+void zero##N (char *dst)                               \
+{                                                      \
+  dst = __builtin_assume_aligned (dst, 4096);          \
+  __builtin_memset (dst, 0, N);                        \
+}
+
+/* Emits 1x sd and 1x sw.  */
+ZERO_N(11)
+
+/* Emits 2x sd.  */
+ZERO_N(13)
+
+/* Emits 2x sd.  */
+ZERO_N(14)
+
+/* Emits 2x sd.  */
+ZERO_N(15)
+
+/* Emits 2x sd and 1x sw.  */
+ZERO_N(19)
+
+/* Emits 3x sd.  */
+ZERO_N(23)
+
+/* Unlike memcpy, memset is not expanded via cpymemsi/block_move_straight,
+   so the by-pieces infrastructure also handles the sizes below.  */
+
+/* Emits 3x sd and 1x sw.  */
+ZERO_N(27)
+
+/* Emits 4x sd.  */
+ZERO_N(29)
+
+/* Emits 4x sd.  */
+ZERO_N(31)
+
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */