From patchwork Wed Mar 2 21:18:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 51513 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 59CB3385803D for ; Wed, 2 Mar 2022 21:18:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 59CB3385803D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1646255921; bh=ppciM6VfL49I/shyGeo9SHXMm7nV5VNcqZxLJqSAj3U=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=DGESlhKybWi8xxTV4Gj1W69uo5Yf3rGRREcS3UEzxaVrNpjBnDG3U+rKF4+2hFXXL 7B6gwmjC8QSE7zATqxS02VPNF+qRsD9D6wOcMtMbIvpBB2TPKsK3h91w0Kz5RxLqPS Bfs7JmT3/repa7VQcte759vfl846dqHsaQHRMJI4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by sourceware.org (Postfix) with ESMTPS id 5B2883858D39 for ; Wed, 2 Mar 2022 21:18:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B2883858D39 Received: by mail-pj1-x1036.google.com with SMTP id gb21so2899299pjb.5 for ; Wed, 02 Mar 2022 13:18:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ppciM6VfL49I/shyGeo9SHXMm7nV5VNcqZxLJqSAj3U=; b=Y+vrW2r51En9jkmeMwGhwbaF73NjSyhU6BgE7nyMSdD84/cWQd3XYc+Ah30rq3Jh7A EGInS4Wv9Je0Q39jliMMJMWXTexv9NkfKSDjd77yWzIWqjx8ByjI+hFiNJdhhkFtK3tx EQ28GoHUB6MVlVTAo9JLjD3eM8WnZiHRKle3JqX/g+LDQMsSzwO9/r99TH5PKG+tmEAL vC+BfOE9VYKU/jHvmsnOQKWxPSr2Fu2BBgXGFrLkgCUqBLNEjh6lxMvCpkFU3mX4ybkB pcfzgZUqPc05d7vEBxr+hdfsZluawZpDhSJIzwwmM12XG9/7M1bPNbdKOXhCStoWErf8 4qiA== X-Gm-Message-State: AOAM531JrOwVKCdq+VKhLpAdediPbZtLhCK4jwrqaE0K31+TC6eUpu8O GuOlHDI4rwMHdlkg5SjQyeU= X-Google-Smtp-Source: ABdhPJzXicLhZmJiOOy6zdBa3WTvXl8Vos9XFSz6PwzQzTMqc3JW0K+wwhI/uSuFkovu8lRMziRYkQ== X-Received: by 2002:a17:902:aa86:b0:14d:a4aa:e6f5 with SMTP id d6-20020a170902aa8600b0014da4aae6f5mr32971414plr.8.1646255889234; Wed, 02 Mar 2022 13:18:09 -0800 (PST) Received: from gnu-tgl-3.localdomain ([172.58.35.88]) by smtp.gmail.com with ESMTPSA id h7-20020a056a00170700b004e01663f59bsm95800pfc.117.2022.03.02.13.18.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 02 Mar 2022 13:18:08 -0800 (PST) Received: by gnu-tgl-3.localdomain (Postfix, from userid 1000) id 6E25AC028A; Wed, 2 Mar 2022 13:18:07 -0800 (PST) Date: Wed, 2 Mar 2022 13:18:07 -0800 To: Richard Biener Subject: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P Message-ID: References: <20220301224100.910199-1-hjl.tools@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-3028.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "H.J. Lu via Gcc-patches" From: "H.J. Lu" Reply-To: "H.J. Lu" Cc: GCC Patches , Richard Earnshaw Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote: > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches > wrote: > > > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy. > > The default is > > > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun)) > > > > For x86, it is MOVE_MAX to restore the old behavior before > > I know we've discussed this to death in the PR, I just want to repeat here > that the GIMPLE folding expects to generate a single load and a single > store (that is what it does on the GIMPLE level) which is why MOVE_MAX > was chosen originally (it's documented to what a "single instruction" does). > In practice MOVE_MAX does not seem to cover vector register sizes > so Richard pulled MOVE_RATIO which is really intended to cover > the case of using multiple instructions for moving memory (but then I > don't remember whether for the ARM case the single load/store GIMPLE > will be expanded to multiple load/store instructions). > > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution, > being very specific for memcpy folding (we also fold memmove btw). > > There is also MOVE_MAX_PIECES which _might_ be more appropriate > than MOVE_MAX here and still honor the idea of single instructions. > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX, > not MOVE_MAX * MOVE_RATIO. > > So if we need a new hook then that hook should at least get the > 'speed' argument of MOVE_RATIO and it should get a better name. > > I still think that it should be possible to improve the insn check to > avoid use of "disabled" modes, maybe that's also a point to add > a new hook like .move_with_mode_p or so? To quote, we do Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P. > > scalar_int_mode mode; > if (int_mode_for_size (ilen * 8, 0).exists (&mode) > && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8 > && have_insn_for (SET, mode) > /* If the destination pointer is not aligned we must be able > to emit an unaligned store. */ > && (dest_align >= GET_MODE_ALIGNMENT (mode) > || !targetm.slow_unaligned_access (mode, dest_align) > || (optab_handler (movmisalign_optab, mode) > != CODE_FOR_nothing))) > > where I understand the ISA is enabled and if the user explicitely > uses it that's OK but -mprefer-avx128 should tell GCC to never > generate AVX256 code where the user was not explicitely using it > (still for example glibc might happily use AVX256 code to implement > the memcpy we are folding!) > > Note the BB vectorizer also might end up with using AVX256 because > in places it also relies on optab queries and the vector_mode_supported_p > check (but the memcpy folding uses the fake integer modes). So > x86 might need to implement the related_mode hook to avoid "auto"-using > a larger vector mode which the default implementation would happily do. > > Richard. OK for master? Thanks. H.J. --- Add TARGET_MOVE_WITH_MODE_P to return true if move with mode can be generated implicitly. The default definition returns true. The x86 version returns true if the mode size <= MOVE_MAX, which is the max number of bytes we can move in one reasonably fast instruction. gcc/ PR target/103393 * gimple-fold.cc (gimple_fold_builtin_memory_op): Call targetm.move_with_mode_p to check if move with mode can be generated implicitly. * target.def: Add move_with_mode_p. * targhooks.cc (default_move_with_mode_p): New. * targhooks.h (default_move_with_mode_p): Likewise. * config/i386/i386.cc (ix86_move_with_mode_p): New. (TARGET_MOVE_WITH_MODE_P): Likewise. * doc/tm.texi.in: Add TARGET_MOVE_WITH_MODE_P. * doc/tm.texi: Regenerate. gcc/testsuite/ PR target/103393 * gcc.target/i386/pr103393-1.c: New test. * gcc.target/i386/pr103393-2.c: Likewise. * gcc.target/i386/pr103393-3.c: Likewise. * gcc.target/i386/pr103393-4.c: Likewise. * gcc.target/i386/pr103393-5.c: Likewise. --- gcc/config/i386/i386.cc | 12 ++++++++++++ gcc/doc/tm.texi | 5 +++++ gcc/doc/tm.texi.in | 2 ++ gcc/gimple-fold.cc | 1 + gcc/target.def | 7 +++++++ gcc/testsuite/gcc.target/i386/pr103393-1.c | 16 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr103393-2.c | 16 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr103393-3.c | 16 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr103393-4.c | 16 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr103393-5.c | 16 ++++++++++++++++ 10 files changed, 107 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-5.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index b2bf90576d5..68a2c59220c 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23918,6 +23918,15 @@ ix86_push_rounding (poly_int64 bytes) return ROUND_UP (bytes, UNITS_PER_WORD); } +/* Implement the TARGET_MOVE_WITH_MODE_P hook. Return true if move + with MODE can be generated implicitly. */ + +static bool +ix86_move_with_mode_p (machine_mode mode) +{ + return GET_MODE_SIZE (mode) <= MOVE_MAX; +} + /* Target-specific selftests. */ #if CHECKING_P @@ -24735,6 +24744,9 @@ static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED) #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests #endif /* #if CHECKING_P */ +#undef TARGET_MOVE_WITH_MODE_P +#define TARGET_MOVE_WITH_MODE_P ix86_move_with_mode_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-i386.h" diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 49864dd79f8..9d5058610e1 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11924,6 +11924,11 @@ statement holding the function call. Returns true if any change was made to the GIMPLE stream. @end deftypefn +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode @var{mode}) +This target hook returns true if move with mode @var{mode} can be +generated implicitly. The default definition returns true. +@end deftypefn + @deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2}) This hook is used to compare the target attributes in two functions to determine which function's features get higher priority. This is used diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 95e5e341f07..e9158ab90c6 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7924,6 +7924,8 @@ to by @var{ce_info}. @hook TARGET_GIMPLE_FOLD_BUILTIN +@hook TARGET_MOVE_WITH_MODE_P + @hook TARGET_COMPARE_VERSION_PRIORITY @hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index c9179abb27e..93267eeabb2 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -1007,6 +1007,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, if (int_mode_for_size (ilen * 8, 0).exists (&mode) && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8 && have_insn_for (SET, mode) + && targetm.move_with_mode_p (mode) /* If the destination pointer is not aligned we must be able to emit an unaligned store. */ && (dest_align >= GET_MODE_ALIGNMENT (mode) diff --git a/gcc/target.def b/gcc/target.def index 72c2e1ef756..041d944cb38 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2489,6 +2489,13 @@ was made to the GIMPLE stream.", bool, (gimple_stmt_iterator *gsi), hook_bool_gsiptr_false) +DEFHOOK +(move_with_mode_p, + "This target hook returns true if move with mode @var{mode} can be\n\ +generated implicitly. The default definition returns true.", + bool, (machine_mode mode), + hook_bool_mode_true) + /* Target hook is used to compare the target attributes in two functions to determine which function's features get higher priority. This is used during function multi-versioning to figure out the order in which two diff --git a/gcc/testsuite/gcc.target/i386/pr103393-1.c b/gcc/testsuite/gcc.target/i386/pr103393-1.c new file mode 100644 index 00000000000..2091d33c45f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103393-1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx -mprefer-avx128 -mprefer-vector-width=128" } */ + +struct TestData { + float arr[8]; +}; + +void +cpy (struct TestData *s1, struct TestData *s2 ) +{ + for(int i=0; i<8; ++i) + s1->arr[i] = s2->arr[i]; +} + +/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103393-2.c b/gcc/testsuite/gcc.target/i386/pr103393-2.c new file mode 100644 index 00000000000..4ad8c8bf379 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103393-2.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +struct TestData { + float arr[8]; +}; + +void +cpy (struct TestData *s1, struct TestData *s2 ) +{ + for(int i=0; i<16; ++i) + s1->arr[i] = s2->arr[i]; +} + +/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103393-3.c b/gcc/testsuite/gcc.target/i386/pr103393-3.c new file mode 100644 index 00000000000..7a03160e512 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103393-3.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids" } */ + +struct TestData { + float arr[8]; +}; + +void +cpy (struct TestData *s1, struct TestData *s2 ) +{ + for(int i=0; i<16; ++i) + s1->arr[i] = s2->arr[i]; +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103393-4.c b/gcc/testsuite/gcc.target/i386/pr103393-4.c new file mode 100644 index 00000000000..ae2599f6411 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103393-4.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512 -mstore-max=512" } */ + +struct TestData { + float arr[8]; +}; + +void +cpy (struct TestData *s1, struct TestData *s2 ) +{ + for(int i=0; i<16; ++i) + s1->arr[i] = s2->arr[i]; +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103393-5.c b/gcc/testsuite/gcc.target/i386/pr103393-5.c new file mode 100644 index 00000000000..f9173684212 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103393-5.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512 -mmove-max=512" } */ + +struct TestData { + float arr[8]; +}; + +void +cpy (struct TestData *s1, struct TestData *s2 ) +{ + for(int i=0; i<16; ++i) + s1->arr[i] = s2->arr[i]; +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */