From patchwork Tue Sep 16 08:04:08 2025
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 120338
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, "hongtao.liu"
Subject: [PATCH v2] Remove SPR/GNR/DMR from avx512_move_by_pieces tune.
Date: Tue, 16 Sep 2025 01:04:08 -0700
Message-Id: <20250916080408.2093777-1-hongtao.liu@intel.com>

From: "hongtao.liu"

Update in V2:
Only remove SPR/GNR/DMR (Sapphire Rapids, Granite Rapids/Granite
Rapids-D and Diamond Rapids) from avx512_move_by_pieces.

Align move_max with prefer_vector_width for SPR/GNR/DMR, similar to the
commit below.

commit 6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
Author: liuhongt
Date:   Thu Aug 15 12:54:07 2024 +0800

    Align ix86_{move_max,store_max} with vectorizer.

    When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
    avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
    set ix86_{move_max,store_max} as max available vector length except
    for AVX part.

                  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
                      && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
                    opts->x_ix86_move_max = PVW_AVX512;
                  else
                    opts->x_ix86_move_max = PVW_AVX128;

    So for -mavx2, vectorizer will choose 256-bit for vectorization, but
    128-bit is used for struct copy, there could be a potential STLF issue
    due to this "misalign".

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES):
	Remove SPR/GNR/DMR.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memcpy-18.c: Use -march=znver5 instead
	of -march=sapphirerapids.
	* gcc.target/i386/pieces-memcpy-21.c: Use -mtune=znver5 instead
	of -mtune=sapphirerapids.
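The selection order described above can be summarized in a small model. This is a hypothetical sketch, not GCC's option-handling code: `struct move_max_inputs`, its fields and `model_move_max` are invented for illustration. Letting the move_by_pieces tune flags take precedence over prefer_vector_width is an assumption, inferred from the fact that removing SPR/GNR/DMR from X86_TUNE_AVX512_MOVE_BY_PIECES is what lets move_max follow prefer_vector_width. The AVX branch is likewise assumed to be what the quoted commit changes.

```c
#include <assert.h>

/* Named after GCC's PVW_* values; the ordering here is this sketch's own.  */
enum pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Hypothetical inputs standing in for the option and tuning state.  */
struct move_max_inputs
{
  enum pvw prefer_vector_width;   /* -mprefer-vector-width or tuning default */
  int tune_avx512_move_by_pieces; /* X86_TUNE_AVX512_MOVE_BY_PIECES enabled */
  int tune_avx256_move_by_pieces; /* X86_TUNE_AVX256_MOVE_BY_PIECES enabled */
  int has_avx512;                 /* AVX512F with 512-bit vectors available */
  int has_avx;                    /* AVX available */
};

/* Hypothetical model of how the widest move_by_pieces chunk is chosen:
   an explicit tune flag wins, then the preferred vector width, and only
   when neither applies does the available ISA decide.  */
static enum pvw
model_move_max (const struct move_max_inputs *in)
{
  if (in->tune_avx512_move_by_pieces)
    return PVW_AVX512;
  if (in->tune_avx256_move_by_pieces)
    return PVW_AVX256;
  if (in->prefer_vector_width != PVW_NONE)
    return in->prefer_vector_width;
  if (in->has_avx512)
    return PVW_AVX512;
  if (in->has_avx)
    return PVW_AVX256; /* assumed effect of the quoted commit (was AVX128) */
  return PVW_AVX128;
}
```

Assuming an SPR-like tuning that prefers 256-bit vectors while AVX-512 is available, the model returns PVW_AVX512 while tune_avx512_move_by_pieces is set and PVW_AVX256 once it is cleared, which is the alignment this patch aims for.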
---
 gcc/config/i386/x86-tune.def                     | 5 +++--
 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c | 2 +-
 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index a86cbad281c..1debc824afc 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -612,6 +612,8 @@ DEF_TUNE (X86_TUNE_AVX256_AVOID_VEC_PERM,
 /* X86_TUNE_AVX256_SPLIT_REGS: if true, AVX512 ops are split into two AVX256 ops.  */
 DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)
 
+/* It's better to align MOVE_MAX with prefer_vector_width to reduce
+   the risk of STLF stalls (small store followed by big load).  */
 /* X86_TUNE_AVX256_MOVE_BY_PIECES: Optimize move_by_pieces with 256-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_MOVE_BY_PIECES, "avx256_move_by_pieces",
@@ -625,8 +627,7 @@ DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces",
 /* X86_TUNE_AVX512_MOVE_BY_PIECES: Optimize move_by_pieces with 512-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX512_MOVE_BY_PIECES, "avx512_move_by_pieces",
-	  m_SAPPHIRERAPIDS | m_GRANITERAPIDS | m_GRANITERAPIDS_D
-	  | m_DIAMONDRAPIDS | m_ZNVER4 | m_ZNVER5)
+	  m_ZNVER4 | m_ZNVER5)
 
 /* X86_TUNE_AVX512_STORE_BY_PIECES: Optimize store_by_pieces with 512-bit
    AVX instructions.  */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
index b15a0db9ff0..b4995ac0598 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=sapphirerapids" } */
+/* { dg-options "-O2 -march=znver5" } */
 
 extern char *dst, *src;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
index ef439f20f74..804a2989d64 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=sapphirerapids -march=x86-64 -mavx2" } */
+/* { dg-options "-O2 -mtune=znver5 -march=x86-64 -mavx2" } */
 
 extern char *dst, *src;
 
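The x86-tune.def comment names the hazard this change targets: a small store followed by a bigger load of the same memory. The C sketch below illustrates that pattern under stated assumptions. `struct blob`, `scale` and `scale_and_copy` are hypothetical names, and whether GCC actually emits two 256-bit stores followed by one 512-bit load here depends on -march/-mtune, inlining and the vectorizer's cost model. Inspect the generated assembly (for example with `-O2 -march=sapphirerapids -S`) rather than relying on the comments.

```c
#include <assert.h>

/* Hypothetical 64-byte object: one ZMM register, or two YMM registers.  */
struct blob
{
  float v[16];
};
static_assert (sizeof (struct blob) == 64, "expected a 64-byte object");

/* With a 256-bit preferred vector width, GCC may vectorize this loop
   into two 32-byte loads and two 32-byte stores.  */
void
scale (struct blob *b, float k)
{
  for (int i = 0; i < 16; i++)
    b->v[i] *= k;
}

/* The struct assignment may be expanded by move_by_pieces.  With a
   512-bit MOVE_MAX it could become one 64-byte load spanning both
   32-byte stores issued just before it; store-to-load forwarding
   typically cannot serve such a load, so it waits for the stores to
   commit.  With MOVE_MAX at 256 bits, each load lines up with a single
   earlier store.  */
void
scale_and_copy (struct blob *dst, struct blob *src, float k)
{
  scale (src, k);
  *dst = *src;
}
```

Comparing the assembly of `scale_and_copy` before and after this patch, with SPR tuning, is one way to check whether the struct copy is expanded with ZMM or YMM moves.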