From patchwork Fri Feb 18 18:40:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 51216
References: <20220217134640.20BA913BF6@imap2.suse-dmz.suse.de>
Date: Fri, 18 Feb 2022 10:40:48 -0800
Subject: [PATCH] pieces-memset-21.c: Expect vzeroupper for ia32
To: Hongtao Liu
List-Id: Gcc-patches mailing list
From: "H.J. Lu"
Reply-To: "H.J. Lu"
Cc: GCC Patches, "Liu, Hongtao", Richard Biener

On Thu, Feb 17, 2022 at 6:32 PM Hongtao Liu via Gcc-patches wrote:
>
> On Thu, Feb 17, 2022 at 9:47 PM Richard Biener via Gcc-patches
> wrote:
> >
> > The x86 backend piggy-backs on mode-switching for insertion of
> > vzeroupper. A recent improvement there was implemented in a way
> > to walk possibly the whole basic-block for all DF reg def definitions
> > in its mode_needed hook which is called for each instruction in
> > a basic-block during mode-switching local analysis.
> >
> > The following mostly reverts this improvement.
> > It needs to be re-done in a way more consistent with a local dataflow
> > which probably means making targets aware of the state of the local
> > dataflow analysis.
> >
> > This improves compile-time of some 538.imagick_r TU from
> > 362s to 16s with -Ofast -mavx2 -fprofile-generate.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> LGTM, have talked to H.J, he also agrees.
> >
> > Thanks,
> > Richard.
> >
> > 2022-02-17 Richard Biener
> >
> >         PR target/104581
> >         * config/i386/i386.cc (ix86_avx_u128_mode_source): Remove.
> >         (ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY instead
> >         of calling ix86_avx_u128_mode_source which would eventually
> >         have returned AVX_U128_ANY in some very special case.
> >
> >         * gcc.target/i386/pr101456-1.c: XFAIL.
> > ---
> >  gcc/config/i386/i386.cc                    | 78 +---------------------
> >  gcc/testsuite/gcc.target/i386/pr101456-1.c |  3 +-
> >  2 files changed, 5 insertions(+), 76 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index cf246e74e57..e4b42fbba6f 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -14377,80 +14377,12 @@ ix86_check_avx_upper_register (const_rtx exp)
> >
> >  static void
> >  ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data)
> > - {
> > -   if (ix86_check_avx_upper_register (dest))
> > +{
> > +  if (ix86_check_avx_upper_register (dest))
> >      {
> >        bool *used = (bool *) data;
> >        *used = true;
> >      }
> > - }
> > -
> > -/* For YMM/ZMM store or YMM/ZMM extract. Return mode for the source
> > -   operand of SRC DEFs in the same basic block before INSN. */
> > -
> > -static int
> > -ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src)
> > -{
> > -  basic_block bb = BLOCK_FOR_INSN (insn);
> > -  rtx_insn *end = BB_END (bb);
> > -
> > -  /* Return AVX_U128_DIRTY if there is no DEF in the same basic
> > -     block. */
> > -  int status = AVX_U128_DIRTY;
> > -
> > -  for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
> > -       def; def = DF_REF_NEXT_REG (def))
> > -    if (DF_REF_BB (def) == bb)
> > -      {
> > -        /* Ignore DEF from different basic blocks. */
> > -        rtx_insn *def_insn = DF_REF_INSN (def);
> > -
> > -        /* Check if DEF_INSN is before INSN. */
> > -        rtx_insn *next;
> > -        for (next = NEXT_INSN (def_insn);
> > -             next != nullptr && next != end && next != insn;
> > -             next = NEXT_INSN (next))
> > -          ;
> > -
> > -        /* Skip if DEF_INSN isn't before INSN. */
> > -        if (next != insn)
> > -          continue;
> > -
> > -        /* Return AVX_U128_DIRTY if the source operand of DEF_INSN
> > -           isn't constant zero. */
> > -
> > -        if (CALL_P (def_insn))
> > -          {
> > -            bool avx_upper_reg_found = false;
> > -            note_stores (def_insn,
> > -                         ix86_check_avx_upper_stores,
> > -                         &avx_upper_reg_found);
> > -
> > -            /* Return AVX_U128_DIRTY if call returns AVX. */
> > -            if (avx_upper_reg_found)
> > -              return AVX_U128_DIRTY;
> > -
> > -            continue;
> > -          }
> > -
> > -        rtx set = single_set (def_insn);
> > -        if (!set)
> > -          return AVX_U128_DIRTY;
> > -
> > -        rtx dest = SET_DEST (set);
> > -
> > -        /* Skip if DEF_INSN is not an AVX load. Return AVX_U128_DIRTY
> > -           if the source operand isn't constant zero. */
> > -        if (ix86_check_avx_upper_register (dest)
> > -            && standard_sse_constant_p (SET_SRC (set),
> > -                                        GET_MODE (dest)) != 1)
> > -          return AVX_U128_DIRTY;
> > -
> > -        /* We get here only if all AVX loads are from constant zero. */
> > -        status = AVX_U128_ANY;
> > -      }
> > -
> > -  return status;
> >  }
> >
> >  /* Return needed mode for entity in optimize_mode_switching pass. */
> > @@ -14520,11 +14452,7 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
> >      {
> >        FOR_EACH_SUBRTX (iter, array, src, NONCONST)
> >          if (ix86_check_avx_upper_register (*iter))
> > -          {
> > -            int status = ix86_avx_u128_mode_source (insn, *iter);
> > -            if (status == AVX_U128_DIRTY)
> > -              return status;
> > -          }
> > +          return AVX_U128_DIRTY;
> >      }
> >
> >    /* This isn't YMM/ZMM load/store. */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> > index 803fc6e0207..7fb3a3f055c 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr101456-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> > @@ -30,4 +30,5 @@ foo3 (void)
> >    bar ();
> >  }
> >
> > -/* { dg-final { scan-assembler-not "vzeroupper" } } */
> > +/* See PR104581 for the XFAIL reason. */
> > +/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */
> > --
> > 2.34.1
>

I am checking this patch.

From 1931cbad498e625b1e24452dcfffe02539b12224 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 18 Feb 2022 10:36:53 -0800
Subject: [PATCH] pieces-memset-21.c: Expect vzeroupper for ia32

Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32
caused by

commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener
Date:   Thu Feb 17 14:40:16 2022 +0100

    target/104581 - compile-time regression in mode-switching

        PR target/104581
        * gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.
---
 gcc/testsuite/gcc.target/i386/pieces-memset-21.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-21.c b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c
index d87d084bf2a..4e2a7407c8a 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-21.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c
@@ -11,7 +11,8 @@ foo (void)

 /* { dg-final { scan-assembler-times "vpxor(?:d|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu(?:64|8)\[ \\t\]+\[^\n\]*%zmm" 1 } } */
-/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* { dg-final { scan-assembler-not "vzeroupper" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "vzeroupper" { target ia32 } } } */
 /* No need to dynamically realign the stack here. */
 /* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer. */
-- 
2.35.1
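
For readers following the quoted discussion, the compile-time problem (a mode_needed hook that may walk the basic block again for every instruction) can be illustrated with a toy model. The sketch below is not GCC code: `toy_insn`, `mode_by_scanning`, `mode_conservative`, and `total_steps` are hypothetical names invented for this illustration, and the scan is a simplified stand-in for the removed ix86_avx_u128_mode_source. It only counts how many earlier instructions each strategy inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AVX_U128_ANY / AVX_U128_DIRTY states.  */
enum u128_mode { U128_ANY, U128_DIRTY };

/* Hypothetical toy instruction, not a GCC rtx_insn.  */
struct toy_insn
{
  int def_reg;       /* Register written by this insn, or -1 if none.  */
  bool src_is_zero;  /* True if the write is from constant zero.  */
};

/* Simplified model of the reverted strategy: to answer the query for
   the insn at index IDX, look at every earlier insn in the block for
   definitions of REG.  Each inspected insn is counted in *STEPS.  */
static enum u128_mode
mode_by_scanning (const struct toy_insn *bb, size_t idx, int reg,
                  size_t *steps)
{
  assert (bb != NULL && steps != NULL);

  /* DIRTY if there is no definition in the block.  */
  enum u128_mode status = U128_DIRTY;

  for (size_t i = 0; i < idx; i++)
    {
      ++*steps;
      if (bb[i].def_reg != reg)
        continue;
      /* A definition that is not constant zero makes the result DIRTY.  */
      if (!bb[i].src_is_zero)
        return U128_DIRTY;
      status = U128_ANY;
    }
  return status;
}

/* Model of the behavior after the revert: answer DIRTY at once.  */
static enum u128_mode
mode_conservative (size_t *steps)
{
  assert (steps != NULL);
  ++*steps;
  return U128_DIRTY;
}

/* Issue one query per insn in a block of N insns, as the local
   analysis does, and return the total number of steps taken.  */
static size_t
total_steps (const struct toy_insn *bb, size_t n, int reg, bool scan)
{
  size_t steps = 0;
  for (size_t idx = 0; idx < n; idx++)
    {
      if (scan)
        (void) mode_by_scanning (bb, idx, reg, &steps);
      else
        (void) mode_conservative (&steps);
    }
  return steps;
}
```

In this model, a block of n instructions costs up to n(n-1)/2 steps with the scanning variant and n steps with the conservative one. The price of the conservative answer is the one the patch above accounts for: a vzeroupper may be emitted where the scan would have shown it to be unnecessary.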