From patchwork Wed Jan 8 10:58:07 2025
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 104339
Date: Wed, 8 Jan 2025 11:58:07 +0100
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] match.pd, v2: Avoid introducing UB in the a r<< (32-b) -> a r>> b optimization [PR117927]

On Wed, Jan 08, 2025 at 10:17:59AM +0100, Richard Biener wrote:
> > As mentioned in the PR, the a r<< (bitsize-b) to a r>> b and similar
> > match.pd optimization which has been introduced in GCC 15 can introduce
> > UB which wasn't there before, in particular if b is equal at runtime
> > to bitsize, then a r<< 0 is turned into a r>> bitsize.
> >
> > The following patch fixes it by optimizing it early only if VRP
> > tells us the count isn't equal to the bitsize, and late into
> > a r>> (b & (bitsize - 1)) if bitsize is power of two and the subtraction
> > has single use, on various targets the masking then goes away because
> > its rotate instructions do masking already.  The latter can't be
> > done too early though, because then the expr_not_equal_to case is
> > basically useless and we introduce the masking always and can't find out
> > anymore that there was originally no masking.  Even cfun->after_inlining
> > check would be too early, there is forwprop before vrp, so the patch
> > introduces a new PROP for vrp2 finish, after which there is another
> > forwprop.  Or do you have ideas about what other condition could be used
> > for late matching only?
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Maybe instead add PROP_last_full_fold and set that at the start of the
> last forwprop?  I think in the end we may want to have a special
> pre-expand (at ISEL time?) folding and have extra patterns enabled
> there (and some disabled eventually), but I think we don't want to
> introduce this now.
>
> With the rest I agree.

So like this?  Works in quick testing, ok for trunk if it passes full
bootstrap/regtest?

2025-01-08  Jakub Jelinek
	    Andrew Pinski

	PR tree-optimization/117927
	* tree-pass.h (PROP_last_full_fold): Define.
	* passes.def: Add last= parameters to pass_forwprop.
	* tree-ssa-forwprop.cc (pass_forwprop): Add last_p non-static data
	member and initialize it in the ctor.
	(pass_forwprop::set_pass_param): New method.
	(pass_forwprop::execute): Set PROP_last_full_fold in curr_properties
	at the start if last_p.
	* match.pd (a rrotate (32-b) -> a lrotate b): Only optimize either
	if @2 is known not to be equal to prec or if during/after last
	forwprop the subtraction has single use and prec is power of two;
	in that case transform it into orotate by masked count.
	* gcc.dg/tree-ssa/pr117927.c: New test.

	Jakub

--- gcc/tree-pass.h.jj	2025-01-08 10:05:09.634203321 +0100
+++ gcc/tree-pass.h	2025-01-08 11:22:56.356672128 +0100
@@ -230,6 +230,7 @@ protected:
 #define PROP_assumptions_done	(1 << 19)	/* Assume function kept around.  */
 #define PROP_gimple_lbitint	(1 << 20)	/* lowered large _BitInt */
+#define PROP_last_full_fold	(1 << 21)	/* Start of last forwprop.
+							   */

 #define PROP_gimple \
   (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
--- gcc/passes.def.jj	2025-01-02 11:23:21.976440640 +0100
+++ gcc/passes.def	2025-01-08 11:24:47.865096546 +0100
@@ -80,7 +80,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_ccp, false /* nonzero_p */);
       /* After CCP we rewrite no longer addressed locals into SSA
	 form if possible.  */
-      NEXT_PASS (pass_forwprop);
+      NEXT_PASS (pass_forwprop, /*last=*/false);
       NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
       NEXT_PASS (pass_sra_early);
       /* pass_build_ealias is a dummy pass that ensures that we
@@ -214,7 +214,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_complete_unrolli);
       NEXT_PASS (pass_backprop);
       NEXT_PASS (pass_phiprop);
-      NEXT_PASS (pass_forwprop);
+      NEXT_PASS (pass_forwprop, /*last=*/false);
       /* pass_build_alias is a dummy pass that ensures that we
	 execute TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_alias);
@@ -254,7 +254,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_isolate_erroneous_paths);
       NEXT_PASS (pass_reassoc, true /* early_p */);
       NEXT_PASS (pass_dce);
-      NEXT_PASS (pass_forwprop);
+      NEXT_PASS (pass_forwprop, /*last=*/false);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
       /* After CCP we rewrite no longer addressed locals into SSA
@@ -356,7 +356,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_dce, true /* update_address_taken_p */,
		 true /* remove_unused_locals */);
       /* After late DCE we rewrite no longer addressed locals into SSA
	 form if possible.
	 */
-      NEXT_PASS (pass_forwprop);
+      NEXT_PASS (pass_forwprop, /*last=*/true);
       NEXT_PASS (pass_sink_code, true /* unsplit edges */);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_fold_builtins);
--- gcc/tree-ssa-forwprop.cc.jj	2025-01-03 17:55:01.067219567 +0100
+++ gcc/tree-ssa-forwprop.cc	2025-01-08 11:46:05.935096203 +0100
@@ -4107,14 +4107,22 @@ class pass_forwprop : public gimple_opt_
 {
 public:
   pass_forwprop (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_forwprop, ctxt)
+    : gimple_opt_pass (pass_data_forwprop, ctxt), last_p (false)
   {}

   /* opt_pass methods: */
   opt_pass * clone () final override { return new pass_forwprop (m_ctxt); }
+  void set_pass_param (unsigned int n, bool param) final override
+  {
+    gcc_assert (n == 0);
+    last_p = param;
+  }
   bool gate (function *) final override { return flag_tree_forwprop; }
   unsigned int execute (function *) final override;

+private:
+  /* Determines whether the pass instance should set PROP_last_full_fold.  */
+  bool last_p;
 }; // class pass_forwprop

 unsigned int
@@ -4123,6 +4131,8 @@ pass_forwprop::execute (function *fun)
   unsigned int todoflags = 0;

   cfg_changed = false;
+  if (last_p)
+    fun->curr_properties |= PROP_last_full_fold;

   calculate_dominance_info (CDI_DOMINATORS);
--- gcc/match.pd.jj	2025-01-08 10:05:24.515994525 +0100
+++ gcc/match.pd	2025-01-08 11:30:24.822335431 +0100
@@ -4950,14 +4950,32 @@ (define_operator_list SYNC_FETCH_AND_AND
			    build_int_cst (TREE_TYPE (@1),
					   element_precision (type)), @1); }))

-/* a rrotate (32-b) -> a lrotate b */
-/* a lrotate (32-b) -> a rrotate b */
+/* a rrotate (bitsize-b) -> a lrotate b */
+/* a lrotate (bitsize-b) -> a rrotate b */
+/* Only do the above when it is known that b != bitsize.
+   Otherwise canonicalize to a orotate (b & mask) when the subtraction
+   has single use and prec is a power of two.
+   */
 (for rotate (lrotate rrotate)
      orotate (rrotate lrotate)
  (simplify
-  (rotate @0 (minus INTEGER_CST@1 @2))
-  (if (element_precision (TREE_TYPE (@0)) == wi::to_wide (@1))
-   (orotate @0 @2))))
+  (rotate @0 (minus@3 INTEGER_CST@1 @2))
+  (with { auto prec = element_precision (TREE_TYPE (@0)); }
+   (if (prec == wi::to_wide (@1))
+    (switch
+     (if (expr_not_equal_to (@2, wi::uhwi (prec,
+					   TYPE_PRECISION (TREE_TYPE (@2)))))
+      (orotate @0 @2))
+     (if (single_use (@3)
+	  && pow2p_hwi (prec)
+	  /* Defer this case until last forwprop, so that VRP could be run and
+	     expr_not_equal_to had a chance to match.  Otherwise we'd do
+	     pretty much always just the second case.  */
+	  && cfun
+	  && ((cfun->curr_properties & PROP_last_full_fold) != 0
+	      || !flag_tree_vrp
+	      || optimize_debug))
+      (orotate @0
+	       (bit_and @2 { build_int_cst (TREE_TYPE (@2), prec - 1); })))))))

 /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
 (for op (lrotate rrotate rshift lshift)
--- gcc/testsuite/gcc.dg/tree-ssa/pr117927.c.jj	2025-01-08 11:22:04.773400985 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr117927.c	2025-01-08 11:22:04.773400985 +0100
@@ -0,0 +1,97 @@
+/* PR tree-optimization/117927 */
+/* { dg-do compile { target { int32 && longlong64 } } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " r<< " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " r>> " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & 31;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & 63;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32 - " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64 - " "optimized" } } */
+
+static inline
+unsigned lrotate32 (unsigned x, int t)
+{
+  unsigned tl = x << t;
+  unsigned th = x >> (-t & 31);
+  return tl | th;
+}
+
+static inline
+unsigned rrotate32 (unsigned x, int t)
+{
+  unsigned tl = x >> t;
+  unsigned th = x << (-t & 31);
+  return tl | th;
+}
+
+static inline
+unsigned long long lrotate64 (unsigned long long x, int t)
+{
+  unsigned long long tl = x << t;
+  unsigned long long th = x >> (-t & 63);
+  return tl | th;
+}
+
+static inline
+unsigned long long rrotate64 (unsigned long long x, int t)
+{
+  unsigned long long tl = x >> t;
+  unsigned long long th = x << (-t & 63);
+  return tl | th;
+}
+
+unsigned
+f1 (unsigned x, int t)
+{
+  return lrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f2 (unsigned long long x, int t)
+{
+  return lrotate64 (x, 64 - t);
+}
+
+unsigned
+f3 (unsigned x, int t)
+{
+  if (t == 32)
+    __builtin_unreachable ();
+  return lrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f4 (unsigned long long x, int t)
+{
+  if (t == 64)
+    __builtin_unreachable ();
+  return lrotate64 (x, 64 - t);
+}
+
+unsigned
+f5 (unsigned x, int t)
+{
+  return rrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f6 (unsigned long long x, int t)
+{
+  return rrotate64 (x, 64 - t);
+}
+
+unsigned
+f7 (unsigned x, int t)
+{
+  if (t == 32)
+    __builtin_unreachable ();
+  return rrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f8 (unsigned long long x, int t)
+{
+  if (t == 64)
+    __builtin_unreachable ();
+  return rrotate64 (x, 64 - t);
+}