From patchwork Mon Feb 10 05:43:18 2025
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 106221
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 1/3] Use NO_REGS in cost calculation when the preferred register class are not known yet.
Date: Sun, 9 Feb 2025 21:43:18 -0800
Message-Id: <20250210054320.1014838-2-hongtao.liu@intel.com>
In-Reply-To: <20250210054320.1014838-1-hongtao.liu@intel.com>

gcc/ChangeLog:

	PR rtl-optimization/108707
	* ira-costs.cc (scan_one_insn): Use NO_REGS instead of
	GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr108707.c: New test.

(cherry picked from commit 0368d169492017cfab5622d38b15be94154d458c)
---
 gcc/ira-costs.cc                         |  5 ++++-
 gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index bdb1356af91..003963f2a19 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn)
 	  && (! ira_use_lra_p
 	      || ! pic_offset_table_rtx
 	      || ! contains_symbol_ref_p (XEXP (note, 0))))
 	{
-	  enum reg_class cl = GENERAL_REGS;
+	  /* Costs for NO_REGS are used in cost calculation on the
+	     1st pass when the preferred register classes are not
+	     known yet.  In this case we take the best scenario.  */
+	  enum reg_class cl = NO_REGS;
 	  rtx reg = SET_DEST (set);
 	  int num = COST_INDEX (REGNO (reg));

diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c
new file mode 100644
index 00000000000..6405cfe7cdc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108707.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */
+
+#include <immintrin.h>
+
+void
+foo (__m512 pv, __m512 a, __m512 b, __m512 c,
+     __m512* pdest, __m512* p1)
+{
+  __m512 t = *p1;
+  pdest[0] = _mm512_fmadd_ps (t, pv, a);
+  pdest[1] = _mm512_fmadd_ps (t, pv, b);
+  pdest[2] = _mm512_fmadd_ps (t, pv, c);
+}

From patchwork Mon Feb 10 05:43:19 2025
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 106220
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 2/3] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.
Date: Sun, 9 Feb 2025 21:43:19 -0800
Message-Id: <20250210054320.1014838-3-hongtao.liu@intel.com>
In-Reply-To: <20250210054320.1014838-1-hongtao.liu@intel.com>

r14-172-g0368d169492017 replaced GENERAL_REGS with NO_REGS in the cost
calculation when the preferred register classes are not known yet.  That
change regressed powerpc (PR109610 and PR109858): it is too aggressive to
use NO_REGS when the mode can be allocated with GENERAL_REGS.  This patch
takes a step back and still uses GENERAL_REGS when hard_regno_mode_ok
holds for the mode and GENERAL_REGS, falling back to NO_REGS otherwise.

gcc/ChangeLog:

	PR target/109610
	PR target/109858
	* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
	calculation when !hard_regno_mode_ok for GENERAL_REGS and mode,
	otherwise still use GENERAL_REGS.
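The decision rule this patch installs can be sketched outside of GCC as a tiny standalone predicate. This is a hedged illustration only: the two-member enum, the `mode_bits` encoding, and `hard_regno_mode_ok_p` below are simplified stand-ins for GCC's `reg_class` machinery and `targetm.hard_regno_mode_ok`, not the real interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for GCC's register-class enumeration.  */
enum reg_class { NO_REGS, GENERAL_REGS };

/* Pretend GENERAL_REGS can hold modes up to 64 bits but not 512-bit
   vector modes (roughly the x86-64 situation, where ZMM values need
   the vector register file).  This is a toy model, not GCC code.  */
static bool
hard_regno_mode_ok_p (enum reg_class cl, int mode_bits)
{
  return cl == GENERAL_REGS && mode_bits <= 64;
}

/* Class used for the first-pass cost calculation: keep GENERAL_REGS
   when it can hold the mode, and only fall back to NO_REGS (the
   "best scenario" cost) when it cannot.  */
static enum reg_class
first_pass_cost_class (int mode_bits)
{
  enum reg_class cl = GENERAL_REGS;
  if (!hard_regno_mode_ok_p (cl, mode_bits))
    cl = NO_REGS;
  return cl;
}
```

The point of the guard is visible here: scalar-sized modes keep the conservative GENERAL_REGS cost (avoiding the powerpc regressions), while wide vector modes still get the optimistic NO_REGS treatment from patch 1.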
(cherry picked from commit 4fb66b2329319e9b47e89200d613b6f741a114fc)
---
 gcc/ira-costs.cc | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 003963f2a19..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,12 +1572,16 @@ scan_one_insn (rtx_insn *insn)
 	  && (! ira_use_lra_p
 	      || ! pic_offset_table_rtx
 	      || ! contains_symbol_ref_p (XEXP (note, 0))))
 	{
-	  /* Costs for NO_REGS are used in cost calculation on the
-	     1st pass when the preferred register classes are not
-	     known yet.  In this case we take the best scenario.  */
-	  enum reg_class cl = NO_REGS;
+	  enum reg_class cl = GENERAL_REGS;
 	  rtx reg = SET_DEST (set);
 	  int num = COST_INDEX (REGNO (reg));
+	  /* Costs for NO_REGS are used in cost calculation on the
+	     1st pass when the preferred register classes are not
+	     known yet.  In this case we take the best scenario when
+	     mode can't be put into GENERAL_REGS.  */
+	  if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0],
+					   GET_MODE (reg)))
+	    cl = NO_REGS;
 	  COSTS (costs, num)->mem_cost
 	    -= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;

From patchwork Mon Feb 10 05:43:20 2025
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 106222
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 3/3] Adjust testcases after better RA decision.
Date: Sun, 9 Feb 2025 21:43:20 -0800
Message-Id: <20250210054320.1014838-4-hongtao.liu@intel.com>
In-Reply-To: <20250210054320.1014838-1-hongtao.liu@intel.com>

After the RA optimization, a memory operand is no longer propagated into
more than one instruction.  The memory value is now loaded into the
destination register, so the destination is never unused and the
testcases no longer generate vxorps.  Rewrite the testcases to make the
codegen more stable.
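For reference, the lane selection exercised by the rewritten permute tests (for example `_mm256_permute4x64_epi64` with immediate 12) can be modeled in portable scalar C. This is an illustrative sketch only, not testsuite code: the real tests use the AVX2/AVX-512 intrinsics and check the emitted assembler, not computed values.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm256_permute4x64_epi64 (src, imm): bits
   [2*i+1 : 2*i] of the 8-bit immediate select which 64-bit source
   lane lands in dst[i].  Illustration only; not GCC testsuite code.  */
static void
permute4x64 (const int64_t src[4], int imm, int64_t dst[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[(imm >> (2 * i)) & 3];
}
```

With `imm = 12` (binary 00'00'11'00) the selected lanes are 0, 3, 0, 0, which is the shuffle the adjusted avx2 testcase feeds to `vpermq`.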
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite
	testcase to make the codegen more stable.
	* gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.

(cherry picked from commit 525713ed9db904ed2504decc5bde9d58bd5d98b4)
---
 .../i386/avx2-dest-false-dep-for-glc.c        |  28 +-
 .../i386/avx512dq-dest-false-dep-for-glc.c    | 257 ++++++++++---
 .../i386/avx512f-dest-false-dep-for-glc.c     | 348 ++++++++++++++----
 .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 ++++--
 .../i386/avx512vl-dest-false-dep-for-glc.c    | 243 +++++++++---
 5 files changed, 790 insertions(+), 204 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
index fe331fe5e2c..e260888627f 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
@@ -5,16 +5,28 @@
 
 #include <immintrin.h>
 
-extern __m256i i1, i2, i3, i4;
-extern __m256d d1, d2;
-extern __m256 f1, f2;
+__m256i
+foo0 (__m256i i3, __m256i i1, __m256i i2)
+{
+  return _mm256_permutevar8x32_epi32 (i1, i2);
+}
+
+__m256i
+foo1 (__m256i i2, __m256i i1)
+{
+  return _mm256_permute4x64_epi64 (i1, 12);
+}
+
+__m256d
+foo2 (__m256d d2, __m256d d1)
+{
+  return _mm256_permute4x64_pd (d1, 12);
+}
 
-void vperm_test (void)
+__m256
+foo3 (__m256 f2, __m256i i2, __m256 f1)
 {
-  i3 = _mm256_permutevar8x32_epi32 (i1, i2);
-  i4 = _mm256_permute4x64_epi64 (i1, 12);
-  d2 = _mm256_permute4x64_pd (d1, 12);
-  f2 = _mm256_permutevar8x32_ps (f1, i2);
+  return _mm256_permutevar8x32_ps (f1, i2);
 }
 
 /* { dg-final { scan-assembler-times "vxorps" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
index b334b88194b..b615b55558d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
@@ -13,56 +13,219 @@ extern __m512 f1, f11;
 extern __m256 f2;
 extern __m128 f3, f33;
 
-__mmask32 m32;
 __mmask16 m16;
 __mmask8 m8;
 
-void mullo_test (void)
-{
-  i1 = _mm512_mullo_epi64 (i1, i1);
-  i1 = _mm512_mask_mullo_epi64 (i1, m8, i1, i1);
-  i1 = _mm512_maskz_mullo_epi64 (m8, i1, i1);
-  i2 = _mm256_mullo_epi64 (i2, i2);
-  i2 = _mm256_mask_mullo_epi64 (i2, m8, i2, i2);
-  i2 = _mm256_maskz_mullo_epi64 (m8, i2, i2);
-  i3 = _mm_mullo_epi64 (i3, i3);
-  i3 = _mm_mask_mullo_epi64 (i3, m8, i3, i3);
-  i3 = _mm_maskz_mullo_epi64 (m8, i3, i3);
-}
-
-void range_test (void)
-{
-  d1 = _mm512_range_pd (d1, d11, 15);
-  d11 = _mm512_range_round_pd (d11, d1, 15, 8);
-  d1 = _mm512_mask_range_pd (d1, m8, d11, d11, 15);
-  d11 = _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8);
-  d1 = _mm512_maskz_range_pd (m8, d11, d11, 15);
-  d11 = _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8);
-  d2 = _mm256_range_pd (d2, d2, 15);
-  d2 = _mm256_mask_range_pd (d2, m8, d2, d2, 15);
-  d2 = _mm256_maskz_range_pd (m8, d2, d2, 15);
-  d3 = _mm_range_pd (d3, d3, 15);
-  d3 = _mm_mask_range_pd (d3, m8, d3, d3, 15);
-  d3 = _mm_maskz_range_pd (m8, d3, d3, 15);
-  d33 = _mm_range_sd (d33, d33, 15);
-  d33 = _mm_mask_range_sd (d33, m8, d33, d33, 15);
-  d33 = _mm_maskz_range_sd (m8, d33, d33, 15);
-
-  f1 = _mm512_range_ps (f1, f11, 15);
-  f11 = _mm512_range_round_ps (f11, f1, 15, 8);
-  f1 = _mm512_mask_range_ps (f1, m16, f11, f11, 15);
-  f11 = _mm512_mask_range_round_ps (f11, m16, f1, f1, 15, 8);
-  f1 = _mm512_maskz_range_ps (m16, f11, f11, 15);
-  f11 = _mm512_maskz_range_round_ps (m16, f1, f1, 15, 8);
-  f2 = _mm256_range_ps (f2, f2, 15);
-  f2 = _mm256_mask_range_ps (f2, m8, f2, f2, 15);
-  f2 = _mm256_maskz_range_ps (m8, f2, f2, 15);
-  f3 = _mm_range_ps (f3, f3, 15);
-  f3 = _mm_mask_range_ps (f3, m8, f3, f3, 15);
-  f3 = _mm_maskz_range_ps (m8, f3, f3, 15);
-  f33 = _mm_range_ss (f33, f33, 15);
-  f33 = _mm_mask_range_ss (f33, m8, f33, f33, 15);
-  f33 = _mm_maskz_range_ss (m8, f33, f33, 15);
+#define MULLO(func, type) \
+  type \
+  mullo##type (type i2, type i1) \
+  { \
+    return func (i1, i1); \
+  }
+
+#define MULLO_MASK(func, type) \
+  type \
+  mullo_mask##type (type i2, type i1) \
+  { \
+    return func (i1, m8, i1, i1); \
+  }
+
+#define MULLO_MASKZ(func, type) \
+  type \
+  mullo_maksz##type (type i2, type i1) \
+  { \
+    return func (m8, i1, i1); \
+  }
+
+MULLO (_mm512_mullo_epi64, __m512i);
+MULLO_MASK (_mm512_mask_mullo_epi64, __m512i);
+MULLO_MASKZ (_mm512_maskz_mullo_epi64, __m512i);
+MULLO (_mm256_mullo_epi64, __m256i);
+MULLO_MASK (_mm256_mask_mullo_epi64, __m256i);
+MULLO_MASKZ (_mm256_maskz_mullo_epi64, __m256i);
+MULLO (_mm_mullo_epi64, __m128i);
+MULLO_MASK (_mm_mask_mullo_epi64, __m128i);
+MULLO_MASKZ (_mm_maskz_mullo_epi64, __m128i);
+
+
+__m512d
+foo1 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_range_pd (d1, d11, 15);
+}
+
+__m512d
+foo2 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_range_round_pd (d11, d1, 15, 8);
+}
+
+__m512d
+foo3 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_mask_range_pd (d1, m8, d11, d11, 15);
+}
+
+__m512d
+foo4 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8);
+}
+
+__m512d
+foo5 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_maskz_range_pd (m8, d11, d11, 15);
+}
+
+__m512d
+foo6 (__m512d d2, __m512d d1, __m512d d11)
+{
+  return _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8);
+}
+
+__m256d
+foo7 (__m256d d1, __m256d d2)
+{
+  return _mm256_range_pd (d2, d2, 15);
+}
+
+__m256d
+foo8 (__m256d d1, __m256d d2)
+{
+  return _mm256_mask_range_pd (d2, m8, d2, d2, 15);
+}
+
+__m256d
+foo9 (__m256d d1, __m256d d2)
+{
+  return _mm256_maskz_range_pd (m8, d2, d2, 15);
+}
+
+__m128d
+foo10 (__m128d d1, __m128d d3)
+{
+  return _mm_range_pd (d3, d3, 15);
+}
+
+__m128d
+foo11 (__m128d d1, __m128d d3)
+{
+  return _mm_mask_range_pd (d3, m8, d3, d3, 15);
+}
+
+__m128d
+foo12 (__m128d d1, __m128d d3)
+{
+  return _mm_maskz_range_pd (m8, d3, d3, 15);
+}
+
+__m128d
+foo13 (__m128d d1, __m128d d33)
+{
+  return _mm_range_sd (d33, d33, 15);
+}
+
+__m128d
+foo14 (__m128d d1, __m128d d33)
+{
+  return _mm_mask_range_sd (d33, m8, d33, d33, 15);
+}
+
+__m128d
+foo15 (__m128d d1, __m128d d33)
+{
+  return _mm_maskz_range_sd (m8, d33, d33, 15);
+}
+
+__m512
+bar1 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_range_ps (d1, d11, 15);
+}
+
+__m512
+bar2 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_range_round_ps (d11, d1, 15, 8);
+}
+
+__m512
+bar3 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_mask_range_ps (d1, m16, d11, d11, 15);
+}
+
+__m512
+bar4 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_mask_range_round_ps (d11, m16, d1, d1, 15, 8);
+}
+
+__m512
+bar5 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_maskz_range_ps (m16, d11, d11, 15);
+}
+
+__m512
+bar6 (__m512 d2, __m512 d1, __m512 d11)
+{
+  return _mm512_maskz_range_round_ps (m16, d1, d1, 15, 8);
+}
+
+__m256
+bar7 (__m256 d1, __m256 d2)
+{
+  return _mm256_range_ps (d2, d2, 15);
+}
+
+__m256
+bar8 (__m256 d1, __m256 d2)
+{
+  return _mm256_mask_range_ps (d2, m8, d2, d2, 15);
+}
+
+__m256
+bar9 (__m256 d1, __m256 d2)
+{
+  return _mm256_maskz_range_ps (m8, d2, d2, 15);
+}
+
+__m128
+bar10 (__m128 d1, __m128 d3)
+{
+  return _mm_range_ps (d3, d3, 15);
+}
+
+__m128
+bar11 (__m128 d1, __m128 d3)
+{
+  return _mm_mask_range_ps (d3, m8, d3, d3, 15);
+}
+
+__m128
+bar12 (__m128 d1, __m128 d3)
+{
+  return _mm_maskz_range_ps (m8, d3, d3, 15);
+}
+
+__m128
+bar13 (__m128 d1, __m128 d33)
+{
+  return _mm_range_ss (d33, d33, 15);
+}
+
+__m128
+bar14 (__m128 d1, __m128 d33)
+{
+  return _mm_mask_range_ss (d33, m8, d33, d33, 15);
+}
+
+__m128
+bar15 (__m128 d1, __m128 d33)
+{
+  return _mm_maskz_range_ss (m8, d33, d33, 15);
 }
 
 /* { dg-final { scan-assembler-times "vxorps" 26 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c
index 26e4ba7e969..1517878ef85 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c
@@ -13,86 +13,288 @@ volatile __m512d *pd11;
 __mmask16 m16;
 __mmask8 m8;
 
-void vperm_test (void)
-{
-  d1 = _mm512_permutex_pd (d1, 12);
-  d1 = _mm512_mask_permutex_pd (d1, m8, d1, 13);
-  d1 = _mm512_maskz_permutex_pd (m8, d1, 14);
-  d11 = _mm512_permutexvar_pd (i1, d11);
-  d11 = _mm512_mask_permutexvar_pd (d11, m8, i2, d11);
-  d11 = _mm512_maskz_permutexvar_pd (m8, i3, d11);
-
-  f1 = _mm512_permutexvar_ps (i1, f1);
-  f1 = _mm512_mask_permutexvar_ps (f1, m16, i1, f1);
-  f1 = _mm512_maskz_permutexvar_ps (m16, i1, f1);
-
-  i3 = _mm512_permutexvar_epi64 (i3, i3);
-  i3 = _mm512_mask_permutexvar_epi64 (i3, m8, i1, i1);
-  i3 = _mm512_maskz_permutexvar_epi64 (m8, i3, i1);
-  i1 = _mm512_permutex_epi64 (i3, 12);
-  i1 = _mm512_mask_permutex_epi64 (i1, m8, i1, 12);
-  i1 = _mm512_maskz_permutex_epi64 (m8, i1, 12);
-
-  i2 = _mm512_permutexvar_epi32 (i2, i2);
-  i2 = _mm512_mask_permutexvar_epi32 (i2, m16, i2, i2);
-  i3 = _mm512_maskz_permutexvar_epi32 (m16, i3, i3);
-}
-
-void getmant_test (void)
-{
-  d1 = _mm512_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5,
-			  _MM_MANT_SIGN_src);
-  d1 = _mm512_getmant_round_pd (*pd11, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src, 8);
-  d1 = _mm512_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5,
-			       _MM_MANT_SIGN_src);
-  d1 = _mm512_mask_getmant_round_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5,
-				     _MM_MANT_SIGN_src, 8);
-  d1 = _mm512_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src);
-  d1 = _mm512_maskz_getmant_round_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5,
-				      _MM_MANT_SIGN_src, 8);
-  f1 = _mm512_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5,
-			  _MM_MANT_SIGN_src);
-  f1 = _mm512_getmant_round_ps (*pf1, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src, 8);
-  f1 = _mm512_mask_getmant_ps (f1, m16, *pf1, _MM_MANT_NORM_p75_1p5,
-			       _MM_MANT_SIGN_src);
-  f1 = _mm512_mask_getmant_round_ps (f1, m16, *pf1, _MM_MANT_NORM_p75_1p5,
-				     _MM_MANT_SIGN_src, 8);
-  f1 = _mm512_maskz_getmant_ps (m16, *pf1, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src);
-  f1 = _mm512_maskz_getmant_round_ps (m16, *pf1, _MM_MANT_NORM_p75_1p5,
-				      _MM_MANT_SIGN_src, 8);
-
-  d2 = _mm_getmant_sd (d2, d2, _MM_MANT_NORM_p75_1p5,
-		       _MM_MANT_SIGN_src);
-  d2 = _mm_getmant_round_sd (d2, d2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src, 8);
-  d2 = _mm_mask_getmant_sd (d2, m8, d2, d2, _MM_MANT_NORM_p75_1p5,
+__m512d
+foo1 (__m512d d2, __m512d d1)
+{
+  return _mm512_permutex_pd (d1, 12);
+}
+
+__m512d
+foo2 (__m512d d2, __m512d d1)
+{
+  return _mm512_mask_permutex_pd (d1, m8, d1, 13);
+}
+
+__m512d
+foo3 (__m512d d2, __m512d d1)
+{
+  return _mm512_maskz_permutex_pd (m8, d1, 14);
+}
+
+__m512d
+foo4 (__m512d d2, __m512d d11, __m512i i1)
+{
+  return _mm512_permutexvar_pd (i1, d11);
+}
+
+__m512d
+foo5 (__m512d d2, __m512d d11, __m512i i2)
+{
+  return _mm512_mask_permutexvar_pd (d11, m8, i2, d11);
+}
+
+__m512d
+foo6 (__m512d d2, __m512d d11, __m512i i3)
+{
+  return _mm512_maskz_permutexvar_pd (m8, i3, d11);
+}
+
+__m512i
+ioo1 (__m512i d2, __m512i d1)
+{
+  return _mm512_permutex_epi64 (d1, 12);
+}
+
+__m512i
+ioo2 (__m512i d2, __m512i d1)
+{
+  return _mm512_mask_permutex_epi64 (d1, m8, d1, 13);
+}
+
+__m512i
+ioo3 (__m512i d2, __m512i d1)
+{
+  return _mm512_maskz_permutex_epi64 (m8, d1, 14);
+}
+
+__m512i
+ioo4 (__m512i d2, __m512i d11, __m512i i1)
+{
+  return _mm512_permutexvar_epi64 (i1, d11);
+}
+
+__m512i
+ioo5 (__m512i d2, __m512i d11, __m512i i2)
+{
+  return _mm512_mask_permutexvar_epi64 (d11, m8, i2, d11);
+}
+
+__m512i
+ioo6 (__m512i d2, __m512i d11, __m512i i3)
+{
+  return _mm512_maskz_permutexvar_epi64 (m8, i3, d11);
+}
+
+__m512
+koo1 (__m512 f2, __m512i i1, __m512 f1)
+{
+  return _mm512_permutexvar_ps (i1, f1);
+}
+
+__m512
+koo2 (__m512 f2, __m512i i1, __m512 f1)
+{
+  return _mm512_mask_permutexvar_ps (f1, m16, i1, f1);
+}
+
+__m512
+koo3 (__m512 f2, __m512i i1, __m512 f1)
+{
+  return _mm512_maskz_permutexvar_ps (m16, i1, f1);
+}
+
+__m512i
+hoo1 (__m512i f2, __m512i i1, __m512i f1)
+{
+  return _mm512_permutexvar_epi32 (i1, f1);
+}
+
+__m512i
+hoo2 (__m512i f2, __m512i i1, __m512i f1)
+{
+  return _mm512_mask_permutexvar_epi32 (f1, m16, i1, f1);
+}
+
+__m512i
+hoo3 (__m512i f2, __m512i i1, __m512i f1)
+{
+  return _mm512_maskz_permutexvar_epi32 (m16, i1, f1);
+}
+
+__m512d
+moo1 (__m512d d2, __m512d* d1)
+{
+  return _mm512_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5,
+			    _MM_MANT_SIGN_src);
+}
+
+__m512d
+moo2 (__m512d d2, __m512d* d1)
+{
+  return _mm512_getmant_round_pd (*d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src, 8);
+}
+
+__m512d
+moo3 (__m512d d2, __m512d d1, __m512d* d3)
+{
+
+  return _mm512_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+				 _MM_MANT_SIGN_src);
+}
+
+__m512d
+moo4 (__m512d d2, __m512d d1, __m512d* d3)
+{
+  return _mm512_mask_getmant_round_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+				       _MM_MANT_SIGN_src, 8);
+}
+
+__m512d
+moo5 (__m512d d2, __m512d* d1)
+{
+  return _mm512_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src);
+}
+
+__m512d
+moo6 (__m512d d2, __m512d* d1, __m512d d3)
+{
+  return _mm512_maskz_getmant_round_pd (m8, *d1, _MM_MANT_NORM_p75_1p5,
+					_MM_MANT_SIGN_src, 8);
+}
+
+__m512
+noo1 (__m512 d2, __m512* d1)
+{
+  return _mm512_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5,
 			    _MM_MANT_SIGN_src);
-  d2 = _mm_mask_getmant_round_sd (d2, m8, d2, d2, _MM_MANT_NORM_p75_1p5,
+}
+
+__m512
+noo2 (__m512 d2, __m512* d1)
+{
+  return _mm512_getmant_round_ps (*d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src, 8);
+}
+
+__m512
+noo3 (__m512 d2, __m512 d1, __m512* d3)
+{
+
+  return _mm512_mask_getmant_ps (d1, m16, *d3, _MM_MANT_NORM_p75_1p5,
+				 _MM_MANT_SIGN_src);
+}
+
+__m512
+noo4 (__m512 d2, __m512 d1, __m512* d3)
+{
+  return _mm512_mask_getmant_round_ps (d1, m16, *d3, _MM_MANT_NORM_p75_1p5,
+				       _MM_MANT_SIGN_src, 8);
+}
+
+__m512
+noo5 (__m512 d2, __m512* d1)
+{
+  return _mm512_maskz_getmant_ps (m16, *d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src);
+}
+
+__m512
+noo6 (__m512 d2, __m512* d1, __m512 d3)
+{
+  return _mm512_maskz_getmant_round_ps (m16, *d1, _MM_MANT_NORM_p75_1p5,
+					_MM_MANT_SIGN_src, 8);
+}
+
+
+__m128d
+ooo1 (__m128d d2, __m128d d1)
+{
+  return _mm_getmant_sd (d1, d1, _MM_MANT_NORM_p75_1p5,
+			 _MM_MANT_SIGN_src);
+}
+
+__m128d
+ooo2 (__m128d d2, __m128d d1)
+{
+  return _mm_getmant_round_sd (d1, d1, _MM_MANT_NORM_p75_1p5,
 			       _MM_MANT_SIGN_src, 8);
-  d2 = _mm_maskz_getmant_sd (m8, d2, d2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src);
-  d2 = _mm_maskz_getmant_round_sd (m8, d2, d2, _MM_MANT_NORM_p75_1p5,
-				   _MM_MANT_SIGN_src, 8);
-  f2 = _mm_getmant_ss (f2, f2, _MM_MANT_NORM_p75_1p5,
-		       _MM_MANT_SIGN_src);
-  f2 = _mm_getmant_round_ss (f2, f2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src, 8);
-  f2 = _mm_mask_getmant_ss (f2, m8, f2, f2, _MM_MANT_NORM_p75_1p5,
+}
+
+__m128d
+ooo3 (__m128d d2, __m128d d1, __m128d d3)
+{
+
+  return _mm_mask_getmant_sd (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+}
+
+__m128d
+ooo4 (__m128d d2, __m128d d1, __m128d d3)
+{
+  return _mm_mask_getmant_round_sd (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5,
+				    _MM_MANT_SIGN_src, 8);
+}
+
+__m128d
+ooo5 (__m128d d2, __m128d d1)
+{
+  return _mm_maskz_getmant_sd (m8, d1, d1, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+}
+
+__m128d
+ooo6 (__m128d d2, __m128d d1, __m128d d3)
+{
+  return _mm_maskz_getmant_round_sd (m8, d1, d3, _MM_MANT_NORM_p75_1p5,
+				     _MM_MANT_SIGN_src, 8);
+}
+
+__m128
+poo1 (__m128 d2, __m128 d1)
+{
+  return _mm_getmant_ss (d1, d1, _MM_MANT_NORM_p75_1p5,
 			 _MM_MANT_SIGN_src);
-  f2 = _mm_mask_getmant_round_ss (f2, m8, f2, f2, _MM_MANT_NORM_p75_1p5,
+}
+
+__m128
+poo2 (__m128 d2, __m128 d1)
+{
+  return _mm_getmant_round_ss (d1, d1, _MM_MANT_NORM_p75_1p5,
 			       _MM_MANT_SIGN_src, 8);
-  f2 = _mm_maskz_getmant_ss (m8, f2, f2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src);
-  f2 = _mm_maskz_getmant_round_ss (m8, f2, f2, _MM_MANT_NORM_p75_1p5,
-				   _MM_MANT_SIGN_src, 8);
+}
+
+__m128
+poo3 (__m128 d2, __m128 d1, __m128 d3)
+{
+
+  return _mm_mask_getmant_ss (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+}
+
+__m128
+poo4 (__m128 d2, __m128 d1, __m128 d3)
+{
+  return _mm_mask_getmant_round_ss (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5,
+				    _MM_MANT_SIGN_src, 8);
+}
+__m128
+poo5 (__m128 d2, __m128 d1)
+{
+  return _mm_maskz_getmant_ss (m8, d1, d1, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+}
+
+__m128
+poo6 (__m128 d2, __m128 d1, __m128 d3)
+{
+  return _mm_maskz_getmant_round_ss (m8, d1, d3, _MM_MANT_NORM_p75_1p5,
+				     _MM_MANT_SIGN_src, 8);
 }
 
-/* { dg-final { scan-assembler-times "vxorps" 22 } } */
+/* { dg-final { scan-assembler-times "vxorps" 24 } } */
 /* { dg-final { scan-assembler-times "vpermd" 3 } } */
 /* { dg-final { scan-assembler-times "vpermq" 6 } } */
 /* { dg-final { scan-assembler-times "vpermps" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c
index 990d65b0904..55c7399da3b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c
@@ -11,32 +11,98 @@ __mmask32 m32;
 __mmask16 m16;
 __mmask8 m8;
 
-void complex_mul_test (void)
-{
-  h1 = _mm512_fmul_pch (h1, h1);
-  h1 = _mm512_fmul_round_pch (h1, h1, 8);
-  h1 = _mm512_mask_fmul_pch (h1, m32, h1, h1);
-  h1 = _mm512_mask_fmul_round_pch (h1, m32, h1, h1, 8);
-  h1 = _mm512_maskz_fmul_pch (m32, h1, h1);
-  h1 = _mm512_maskz_fmul_round_pch (m32, h1, h1, 11);
-
-  h3 = _mm_fmul_sch (h3, h3);
-  h3 = _mm_fmul_round_sch (h3, h3, 8);
-  h3 = _mm_mask_fmul_sch (h3, m8, h3, h3);
-  h3 = _mm_mask_fmul_round_sch (h3, m8, h3, h3, 8);
-  h3 = _mm_maskz_fmul_sch (m8, h3, h3);
-  h3 = _mm_maskz_fmul_round_sch (m8, h3, h3, 11);
-}
-
-void vgetmant_test (void)
-{
-  h3 = _mm_getmant_sh (h3, h3, _MM_MANT_NORM_p75_1p5,
-		       _MM_MANT_SIGN_src);
-  h3 = _mm_mask_getmant_sh (h3, m8, h3, h3, _MM_MANT_NORM_p75_1p5,
-			    _MM_MANT_SIGN_src);
-  h3 = _mm_maskz_getmant_sh (m8, h3, h3, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src);
-}
+__m512h
+foo1 (__m512h h2, __m512h h1)
+{
+  return _mm512_fmul_pch (h1, h1);
+}
+
+__m512h
+foo2 (__m512h h2, __m512h h1)
+{
+  return _mm512_fmul_round_pch (h1, h1, 8);
+}
+
+__m512h
+foo3 (__m512h h2, __m512h h1)
+{
+  return _mm512_mask_fmul_pch (h1, m32, h1, h1);
+}
+
+__m512h
+foo4 (__m512h h2, __m512h h1)
+{
+  return _mm512_mask_fmul_round_pch (h1, m32, h1, h1, 8);
+}
+
+__m512h
+foo5 (__m512h h2, __m512h h1)
+{
+  return _mm512_maskz_fmul_pch (m32, h1, h1);
+}
+
+__m512h
+foo6 (__m512h h2, __m512h h1)
+{
+  return _mm512_maskz_fmul_round_pch (m32, h1, h1, 11);
+}
+
+__m128h
+bar1 (__m128h h2, __m128h h1)
+{
+  return _mm_fmul_sch (h1, h1);
+}
+
+__m128h
+bar2 (__m128h h2, __m128h h1)
+{
+  return _mm_fmul_round_sch (h1, h1, 8);
+}
+
+__m128h
+bar3 (__m128h h2, __m128h h1)
+{
+  return _mm_mask_fmul_sch (h1, m8, h1, h1);
+}
+
+__m128h
+bar4 (__m128h h2, __m128h h1)
+{
+  return _mm_mask_fmul_round_sch (h1, m8, h1, h1, 8);
+}
+
+__m128h
+bar5 (__m128h h2, __m128h h1)
+{
+  return _mm_maskz_fmul_sch (m8, h1, h1);
+}
+
+__m128h
+bar6 (__m128h h2, __m128h h1)
+{
+  return _mm_maskz_fmul_round_sch (m8, h1, h1, 11);
+}
+
+__m128h
+zoo1 (__m128h h1, __m128h h3)
+{
+  return _mm_getmant_sh (h3, h3, _MM_MANT_NORM_p75_1p5,
+			 _MM_MANT_SIGN_src);
+}
+
+__m128h
+zoo2 (__m128h h1, __m128h h3)
+{
+  return _mm_mask_getmant_sh (h3, m8, h3, h3, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+}
+
+__m128h
+zoo3 (__m128h h1, __m128h h3)
+{
+  return _mm_maskz_getmant_sh (m8, h3, h3, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+}
 /* { dg-final { scan-assembler-times "vxorps" 10 } } */
 /* { dg-final { scan-assembler-times "vfmulcph" 6 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c
index 37d3ba51452..1437254d3ce 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c
@@ -13,60 +13,203 @@ extern __m128 f2, *pf2;
 __mmask16 m16;
 __mmask8 m8;
 
-void vperm_test (void)
-{
-  d1 = _mm256_permutex_pd (d1, 12);
-  d1 = _mm256_mask_permutex_pd (d1, m8, d1, 12);
-  d1 = _mm256_maskz_permutex_pd (m8, d1, 12);
-  d11 = _mm256_permutexvar_pd (i1, d11);
-  d11 = _mm256_mask_permutexvar_pd (d11, m8, i1, d11);
-  d11 = _mm256_maskz_permutexvar_pd (m8, i1, d11);
-
-  f1 = _mm256_permutexvar_ps (i1, f1);
-  f1 = _mm256_mask_permutexvar_ps (f1, m8, i1, f1);
-  f1 = _mm256_maskz_permutexvar_ps (m8, i1, f1);
-
-  i1 = _mm256_permutexvar_epi64 (i1, i1);
-  i1 = _mm256_mask_permutexvar_epi64 (i1, m8, i1, i1);
-  i1 = _mm256_maskz_permutexvar_epi64 (m8, i1, i1);
-  i1 = _mm256_permutex_epi64 (i1, 12);
-  i1 = _mm256_mask_permutex_epi64 (i1, m8, i1, 12);
-  i1 = _mm256_maskz_permutex_epi64 (m8, i1, 12);
-
-  i2 = _mm256_permutexvar_epi32 (i2, i2);
-  i2 = _mm256_mask_permutexvar_epi32 (i2, m8, i2, i2);
-  i3 = _mm256_maskz_permutexvar_epi32 (m8, i3, i3);
-}
-
-void getmant_test (void)
-{
-  d1 = _mm256_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5,
-			  _MM_MANT_SIGN_src);
-  d1 = _mm256_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5,
-			       _MM_MANT_SIGN_src);
-  d1 = _mm256_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src);
-  d2 = _mm_getmant_pd (*pd2, _MM_MANT_NORM_p75_1p5,
-		       _MM_MANT_SIGN_src);
-  d2 = _mm_mask_getmant_pd (d2, m8, *pd2, _MM_MANT_NORM_p75_1p5,
+__m256d
+foo1 (__m256d d2, __m256d d1)
+{
+  return _mm256_permutex_pd (d1, 12);
+}
+
+__m256d
+foo2 (__m256d d2, __m256d d1)
+{
+  return _mm256_mask_permutex_pd (d1, m8, d1, 13);
+}
+
+__m256d
+foo3 (__m256d d2, __m256d d1)
+{
+  return _mm256_maskz_permutex_pd (m8, d1, 14);
+}
+
+__m256d
+foo4 (__m256d d2, __m256d d11, __m256i i1)
+{
+  return _mm256_permutexvar_pd (i1, d11);
+}
+
+__m256d
+foo5 (__m256d d2, __m256d d11, __m256i i2)
+{
+  return _mm256_mask_permutexvar_pd (d11, m8, i2, d11);
+}
+
+__m256d
+foo6 (__m256d d2, __m256d d11, __m256i i3)
+{
+  return _mm256_maskz_permutexvar_pd (m8, i3, d11);
+}
+
+__m256i
+ioo1 (__m256i d2, __m256i d1)
+{
+  return _mm256_permutex_epi64 (d1, 12);
+}
+
+__m256i
+ioo2 (__m256i d2, __m256i d1)
+{
+  return _mm256_mask_permutex_epi64 (d1, m8, d1, 13);
+}
+
+__m256i
+ioo3 (__m256i d2, __m256i d1)
+{
+  return _mm256_maskz_permutex_epi64 (m8, d1, 14);
+}
+
+__m256i
+ioo4 (__m256i d2, __m256i d11, __m256i i1)
+{
+  return _mm256_permutexvar_epi64 (i1, d11);
+}
+
+__m256i
+ioo5 (__m256i d2, __m256i d11, __m256i i2)
+{
+  return _mm256_mask_permutexvar_epi64 (d11, m8, i2, d11);
+}
+
+__m256i
+ioo6 (__m256i d2, __m256i d11, __m256i i3)
+{
+  return _mm256_maskz_permutexvar_epi64 (m8, i3, d11);
+}
+
+__m256
+koo1 (__m256 f2, __m256i i1, __m256 f1)
+{
+  return _mm256_permutexvar_ps (i1, f1);
+}
+
+__m256
+koo2 (__m256 f2, __m256i i1, __m256 f1)
+{
+  return _mm256_mask_permutexvar_ps (f1, m8, i1, f1);
+}
+
+__m256
+koo3 (__m256 f2, __m256i i1, __m256 f1)
+{
+  return _mm256_maskz_permutexvar_ps (m8, i1, f1);
+}
+
+__m256i
+hoo1 (__m256i f2, __m256i i1, __m256i f1)
+{
+  return _mm256_permutexvar_epi32 (i1, f1);
+}
+
+__m256i
+hoo2 (__m256i f2, __m256i i1, __m256i f1)
+{
+  return _mm256_mask_permutexvar_epi32 (f1, m8, i1, f1);
+}
+
+__m256i
+hoo3 (__m256i f2, __m256i i1, __m256i f1)
+{
+  return _mm256_maskz_permutexvar_epi32 (m8, i1, f1);
+}
+
+__m256d
+moo1 (__m256d d2, __m256d* d1)
+{
+  return _mm256_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5,
 			    _MM_MANT_SIGN_src);
-  d2 = _mm_maskz_getmant_pd (m8, *pd2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src);
-  f1 = _mm256_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5,
-			  _MM_MANT_SIGN_src);
-  f1 = _mm256_mask_getmant_ps (f1, m8, *pf1, _MM_MANT_NORM_p75_1p5,
-			       _MM_MANT_SIGN_src);
-  f1 = _mm256_maskz_getmant_ps (m8, *pf1, _MM_MANT_NORM_p75_1p5,
-				_MM_MANT_SIGN_src);
-  f2 = _mm_getmant_ps (*pf2, _MM_MANT_NORM_p75_1p5,
-		       _MM_MANT_SIGN_src);
-  f2 = _mm_mask_getmant_ps (f2, m8, *pf2, _MM_MANT_NORM_p75_1p5,
+}
+
+__m256d
+moo3 (__m256d d2, __m256d d1, __m256d* d3)
+{
+
+  return _mm256_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+				 _MM_MANT_SIGN_src);
+}
+
+__m256d
+moo5 (__m256d d2, __m256d* d1)
+{
+  return _mm256_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src);
+}
+
+__m128d
+moo2 (__m128d d2, __m128d* d1)
+{
+  return _mm_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5,
 			 _MM_MANT_SIGN_src);
-  f2 = _mm_maskz_getmant_ps (m8, *pf2, _MM_MANT_NORM_p75_1p5,
-			     _MM_MANT_SIGN_src);
 }
-/* { dg-final { scan-assembler-times "vxorps" 19 } } */
+__m128d
+moo4 (__m128d d2, __m128d d1, __m128d* d3)
+{
+
+  return _mm_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+}
+
+__m128d
+moo6 (__m128d d2, __m128d* d1)
+{
+  return _mm_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+}
+
+__m256
+noo1 (__m256 d2, __m256* d1)
+{
+  return _mm256_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5,
+			    _MM_MANT_SIGN_src);
+}
+
+__m256
+noo3 (__m256 d2, __m256 d1, __m256* d3)
+{
+
+  return _mm256_mask_getmant_ps (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+				 _MM_MANT_SIGN_src);
+}
+
+__m256
+noo5 (__m256 d2, __m256* d1)
+{
+  return _mm256_maskz_getmant_ps (m8, *d1, _MM_MANT_NORM_p75_1p5,
+				  _MM_MANT_SIGN_src);
+}
+
+__m128
+noo2 (__m128 d2, __m128* d1)
+{
+  return _mm_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5,
+			 _MM_MANT_SIGN_src);
+}
+
+__m128
+noo4 (__m128 d2, __m128 d1, __m128* d3)
+{
+
+  return _mm_mask_getmant_ps (d1, m8, *d3, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+}
+
+__m128
+noo6 (__m128 d2, __m128* d1)
+{
+  return _mm_maskz_getmant_ps (m8, *d1, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+}
+
+/* { dg-final { scan-assembler-times "vxorps" 20 } } */
 /* { dg-final { scan-assembler-times "vpermpd" 6 } } */
 /* { dg-final { scan-assembler-times "vpermps" 3 } } */
 /* { dg-final { scan-assembler-times "vpermq" 6 } } */