From patchwork Wed Dec 29 08:39:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 49393
Date: Wed, 29 Dec 2021 09:39:05 +0100
Subject: [PATCH] i386: Robustify some expanders w.r.t. paradoxical SUBREGs
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list
From: Uros Bizjak
Reply-To: Uros Bizjak

lowpart_subreg might fail in some cases when trying to create paradoxical
SUBREGs.  Use force_reg on input operand, use new temporary output operand
and emit move into the destination afterwards.

Also, replace simplify_gen_subreg (Mx, op, My, 0) with
equivalent lowpart_subreg (Mx, op, My).

2021-12-29  Uroš Bizjak

gcc/ChangeLog:

	* config/i386/sse.md (vec_pack_float_): Perform
	gen_avx512dq_floatv2div2sf2 into a pseudo and emit move insn
	into operands[0].
	(fix_truncv2sfv2di2): Use lowpart_subreg instead of
	simplify_gen_subreg.
	(trunc2): Perform gen_avx512vl_truncatevqi2 into a pseudo
	and emit move insn into operands[0].
	(trunc2): Perform gen_avx512vl_truncatevhi2 into a pseudo
	and emit move insn into operands[0].
	(truncv2div2si2): Perform gen_avx512vl_truncatev2div2si2 into a
	pseudo and emit move insn into operands[0].
	(truncv8div8qi2): Perform gen_avx512f_truncatev8div16qi2 into a
	pseudo and emit move insn into operands[0].
	(v8qiv8hi2): Use lowpart_subreg instead of simplify_gen_subreg.
	(v8qiv8si2): Ditto.
	(v4qiv4si2): Ditto.
	(v4hiv4si2): Ditto.
	(v8qiv8di2): Ditto.
	(v4qiv4di2): Ditto.
	(v2qiv2di2): Ditto.
	(v4hiv4di2): Ditto.
	(v2hiv2di2): Ditto.
	(v2siv2di2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 69c754751a8..0997d9edf9d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8115,9 +8115,12 @@
        (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand")))]
   "TARGET_AVX512DQ && TARGET_AVX512VL"
 {
-  operands[0] = simplify_gen_subreg (V4SFmode, operands[0], V2SFmode, 0);
+  rtx op0 = gen_reg_rtx (V4SFmode);
+
   emit_insn (gen_avx512dq_floatv2div2sf2
-             (operands[0], operands[1]));
+             (op0, operands[1]));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
   DONE;
 })
 
@@ -8547,10 +8550,10 @@
          (match_operand:V2SF 1 "register_operand")))]
   "TARGET_AVX512DQ && TARGET_AVX512VL"
 {
-  operands[1] = force_reg (V2SFmode, operands[1]);
-  operands[1] = simplify_gen_subreg (V4SFmode, operands[1], V2SFmode, 0);
+  rtx op1 = force_reg (V2SFmode, operands[1]);
+  op1 = lowpart_subreg (V4SFmode, op1, V2SFmode);
   emit_insn (gen_avx512dq_fix_truncv2sfv2di2
-             (operands[0], operands[1]));
+             (operands[0], op1));
   DONE;
 })
 
@@ -13631,10 +13634,13 @@
          (match_operand:PMOV_SRC_MODE_3 1 "register_operand")))]
   "TARGET_AVX512VL"
 {
-  operands[0] = simplify_gen_subreg (V16QImode, operands[0], mode, 0);
-  emit_insn (gen_avx512vl_truncatevqi2 (operands[0],
-                                        operands[1],
-                                        CONST0_RTX (mode)));
+  rtx op0 = gen_reg_rtx (V16QImode);
+
+  emit_insn (gen_avx512vl_truncatevqi2
+             (op0, operands[1], CONST0_RTX (mode)));
+
+  emit_move_insn (operands[0],
+                  lowpart_subreg (mode, op0, V16QImode));
   DONE;
 })
 
@@ -14006,12 +14012,14 @@
          (match_operand:PMOV_SRC_MODE_4 1 "register_operand")))]
   "TARGET_AVX512VL"
 {
-  operands[0] = simplify_gen_subreg (V8HImode, operands[0], mode, 0);
-  emit_insn (gen_avx512vl_truncatevhi2 (operands[0],
-                                        operands[1],
-                                        CONST0_RTX (mode)));
-  DONE;
+  rtx op0 = gen_reg_rtx (V8HImode);
+
+  emit_insn (gen_avx512vl_truncatevhi2
+             (op0, operands[1], CONST0_RTX (mode)));
 
+  emit_move_insn (operands[0],
+                  lowpart_subreg (mode, op0, V8HImode));
+  DONE;
 })
 
 (define_insn "avx512vl_vhi2"
@@ -14251,10 +14259,13 @@
          (match_operand:V2DI 1 "register_operand")))]
   "TARGET_AVX512VL"
 {
-  operands[0] = simplify_gen_subreg (V4SImode, operands[0], V2SImode, 0);
-  emit_insn (gen_avx512vl_truncatev2div2si2 (operands[0],
-                                             operands[1],
-                                             CONST0_RTX (V2SImode)));
+  rtx op0 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_avx512vl_truncatev2div2si2
+             (op0, operands[1], CONST0_RTX (V2SImode)));
+
+  emit_move_insn (operands[0],
+                  lowpart_subreg (V2SImode, op0, V4SImode));
   DONE;
 })
 
@@ -14389,8 +14400,12 @@
          (match_operand:V8DI 1 "register_operand")))]
   "TARGET_AVX512F"
 {
-  operands[0] = simplify_gen_subreg (V16QImode, operands[0], V8QImode, 0);
-  emit_insn (gen_avx512f_truncatev8div16qi2 (operands[0], operands[1]));
+  rtx op0 = gen_reg_rtx (V16QImode);
+
+  emit_insn (gen_avx512f_truncatev8div16qi2 (op0, operands[1]));
+
+  emit_move_insn (operands[0],
+                  lowpart_subreg (V8QImode, op0, V16QImode));
   DONE;
 })
 
@@ -21625,9 +21640,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V8QImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
-      emit_insn (gen_sse4_1_v8qiv8hi2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V8QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V8QImode);
+      emit_insn (gen_sse4_1_v8qiv8hi2 (operands[0], op1));
       DONE;
     }
 })
@@ -21703,9 +21718,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V8QImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
-      emit_insn (gen_avx2_v8qiv8si2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V8QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V8QImode);
+      emit_insn (gen_avx2_v8qiv8si2 (operands[0], op1));
       DONE;
     }
 })
@@ -21767,9 +21782,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V4QImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V4QImode, 0);
-      emit_insn (gen_sse4_1_v4qiv4si2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V4QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V4QImode);
+      emit_insn (gen_sse4_1_v4qiv4si2 (operands[0], op1));
       DONE;
     }
 })
@@ -21935,9 +21950,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V4HImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V8HImode, operands[1], V4HImode, 0);
-      emit_insn (gen_sse4_1_v4hiv4si2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V4HImode, operands[1]);
+      op1 = lowpart_subreg (V8HImode, op1, V4HImode);
+      emit_insn (gen_sse4_1_v4hiv4si2 (operands[0], op1));
       DONE;
     }
 })
@@ -22056,9 +22071,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V8QImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
-      emit_insn (gen_avx512f_v8qiv8di2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V8QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V8QImode);
+      emit_insn (gen_avx512f_v8qiv8di2 (operands[0], op1));
       DONE;
     }
 })
@@ -22118,9 +22133,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V4QImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V4QImode, 0);
-      emit_insn (gen_avx2_v4qiv4di2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V4QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V4QImode);
+      emit_insn (gen_avx2_v4qiv4di2 (operands[0], op1));
       DONE;
     }
 })
@@ -22145,9 +22160,9 @@
          (match_operand:V2QI 1 "register_operand")))]
   "TARGET_SSE4_1"
 {
-  operands[1] = force_reg (V2QImode, operands[1]);
-  operands[1] = simplify_gen_subreg (V16QImode, operands[1], V2QImode, 0);
-  emit_insn (gen_sse4_1_v2qiv2di2 (operands[0], operands[1]));
+  rtx op1 = force_reg (V2QImode, operands[1]);
+  op1 = lowpart_subreg (V16QImode, op1, V2QImode);
+  emit_insn (gen_sse4_1_v2qiv2di2 (operands[0], op1));
   DONE;
 })
 
@@ -22218,9 +22233,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V4HImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V8HImode, operands[1], V4HImode, 0);
-      emit_insn (gen_avx2_v4hiv4di2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V4HImode, operands[1]);
+      op1 = lowpart_subreg (V8HImode, op1, V4HImode);
+      emit_insn (gen_avx2_v4hiv4di2 (operands[0], op1));
       DONE;
     }
 })
@@ -22280,9 +22295,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V2HImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V8HImode, operands[1], V2HImode, 0);
-      emit_insn (gen_sse4_1_v2hiv2di2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V2HImode, operands[1]);
+      op1 = lowpart_subreg (V8HImode, op1, V2HImode);
+      emit_insn (gen_sse4_1_v2hiv2di2 (operands[0], op1));
       DONE;
     }
 })
@@ -22497,9 +22512,9 @@
 {
   if (!MEM_P (operands[1]))
     {
-      operands[1] = force_reg (V2SImode, operands[1]);
-      operands[1] = simplify_gen_subreg (V4SImode, operands[1], V2SImode, 0);
-      emit_insn (gen_sse4_1_v2siv2di2 (operands[0], operands[1]));
+      rtx op1 = force_reg (V2SImode, operands[1]);
+      op1 = lowpart_subreg (V4SImode, op1, V2SImode);
+      emit_insn (gen_sse4_1_v2siv2di2 (operands[0], op1));
       DONE;
     }
 })