From patchwork Wed Mar 2 08:58:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 51497 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4B8B83858430 for ; Wed, 2 Mar 2022 08:59:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4B8B83858430 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1646211556; bh=bWiCRhDDXjrcRu70/eeQd0qZvPTFcQUvAUmuMNFxztY=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=XSq1z20VAzyiix8T6tdqcuOdEWyX/OUUrDmVsYLR5PzXJ0lse2iCj6g0/nYQfuP2V MfCuq1y4CZiGEK5Tzv0CE2TNmHnaQRjhBkmzt4kC3NVxYrbkwiIng0Z7YG87iESC41 ssXl8UJgjy9H25PAy7NLqcUhgFP1+dRT1Zm6Bsv4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 6FF983858D39 for ; Wed, 2 Mar 2022 08:58:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6FF983858D39 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 5CA351F37C; Wed, 2 Mar 2022 08:58:45 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 423FC13345; Wed, 2 Mar 2022 08:58:45 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id K2EGD8UxH2JDUgAAMHmgww (envelope-from ); Wed, 02 Mar 2022 08:58:45 +0000 Date: Wed, 2 Mar 2022 09:58:44 +0100 (CET) To: gcc-patches@gcc.gnu.org Subject: [PATCH] rtl-optimization/104686 - speedup IRA allocno conflict test MIME-Version: 1.0 Message-Id: <20220302085845.423FC13345@imap2.suse-dmz.suse.de> X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Biener via Gcc-patches From: Richard Biener Reply-To: Richard Biener Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" In this PR allocnos_conflict_p takes 90% of the compile-time via the calls from update_conflict_hard_regno_costs. This is due to the high number of conflicts recorded in the dense bitvector representation. Fortunately we can take advantage of the bitvector representation here and turn the O(n) conflict test into an O(1) one, greatly speeding up the compile of the testcase from 39s to just 4s (93% IRA time to 26% IRA time). While for the testcase in question the first allocno is almost always the nice one the patch tries a more systematic approach to finding the allocno to iterate object conflicts over. That does reduce the actual number of compares for the testcase but it doesn't make a measurable difference wall-clock wise. That's not guaranteed though I think so I've kept this systematic way of choosing the cheapest allocno. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Richard. 2022-03-02 Richard Biener PR rtl-optimization/104686 * ira-color.cc (object_conflicts_with_allocno_p): New function using a bitvector test instead of iterating when possible. (allocnos_conflict_p): Choose the best allocno to iterate over object conflicts. (update_conflict_hard_regno_costs): Do allocnos_conflict_p test last. --- gcc/ira-color.cc | 75 ++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 18 deletions(-) diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc index 8b6db1bb417..e01d1841a08 100644 --- a/gcc/ira-color.cc +++ b/gcc/ira-color.cc @@ -1338,26 +1338,65 @@ update_allocno_cost (ira_allocno_t allocno, int hard_regno, return true; } +/* Return TRUE if the object OBJ conflicts with the allocno A. */ +static bool +object_conflicts_with_allocno_p (ira_object_t obj, ira_allocno_t a) +{ + if (!OBJECT_CONFLICT_VEC_P (obj)) + for (int word = 0; word < ALLOCNO_NUM_OBJECTS (a); word++) + { + ira_object_t another_obj = ALLOCNO_OBJECT (a, word); + if (OBJECT_CONFLICT_ID (another_obj) >= OBJECT_MIN (obj) + && OBJECT_CONFLICT_ID (another_obj) <= OBJECT_MAX (obj) + && TEST_MINMAX_SET_BIT (OBJECT_CONFLICT_BITVEC (obj), + OBJECT_CONFLICT_ID (another_obj), + OBJECT_MIN (obj), OBJECT_MAX (obj))) + return true; + } + else + { + /* If this linear walk ever becomes a bottleneck we could add a + conflict_vec_sorted_p flag and if not set, sort the conflicts after + their ID so we can use a binary search. That would also require + tracking the actual number of conflicts in the vector to not rely + on the NULL termination. */ + ira_object_conflict_iterator oci; + ira_object_t conflict_obj; + FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) + if (OBJECT_ALLOCNO (conflict_obj) == a) + return true; + } + return false; +} + /* Return TRUE if allocnos A1 and A2 conflicts. Here we are - interesting only in conflicts of allocnos with intersected allocno - classes. */ + interested only in conflicts of allocnos with intersecting allocno + classes. */ static bool allocnos_conflict_p (ira_allocno_t a1, ira_allocno_t a2) { - ira_object_t obj, conflict_obj; - ira_object_conflict_iterator oci; - int word, nwords = ALLOCNO_NUM_OBJECTS (a1); - - for (word = 0; word < nwords; word++) + /* Compute the upper bound for the linear iteration when the object + conflicts are represented as a sparse vector. In particular this + will make sure we prefer O(1) bitvector testing. */ + int num_conflicts_in_vec1 = 0, num_conflicts_in_vec2 = 0; + for (int word = 0; word < ALLOCNO_NUM_OBJECTS (a1); ++word) + if (OBJECT_CONFLICT_VEC_P (ALLOCNO_OBJECT (a1, word))) + num_conflicts_in_vec1 += OBJECT_NUM_CONFLICTS (ALLOCNO_OBJECT (a1, word)); + for (int word = 0; word < ALLOCNO_NUM_OBJECTS (a2); ++word) + if (OBJECT_CONFLICT_VEC_P (ALLOCNO_OBJECT (a2, word))) + num_conflicts_in_vec2 += OBJECT_NUM_CONFLICTS (ALLOCNO_OBJECT (a2, word)); + if (num_conflicts_in_vec2 < num_conflicts_in_vec1) + std::swap (a1, a2); + + for (int word = 0; word < ALLOCNO_NUM_OBJECTS (a1); word++) { - obj = ALLOCNO_OBJECT (a1, word); + ira_object_t obj = ALLOCNO_OBJECT (a1, word); /* Take preferences of conflicting allocnos into account. */ - FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) - if (OBJECT_ALLOCNO (conflict_obj) == a2) - return true; + if (object_conflicts_with_allocno_p (obj, a2)) + return true; } return false; -} +} /* Update (decrease if DECR_P) HARD_REGNO cost of allocnos connected by copies to ALLOCNO to increase chances to remove some copies as @@ -1572,15 +1611,15 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass, else gcc_unreachable (); + another_aclass = ALLOCNO_CLASS (another_allocno); if (another_allocno == from - || allocnos_conflict_p (another_allocno, start)) - continue; - - another_aclass = ALLOCNO_CLASS (another_allocno); - if (! ira_reg_classes_intersect_p[aclass][another_aclass] || ALLOCNO_ASSIGNED_P (another_allocno) - || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p) + || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p + || ! ira_reg_classes_intersect_p[aclass][another_aclass]) + continue; + if (allocnos_conflict_p (another_allocno, start)) continue; + class_size = ira_class_hard_regs_num[another_aclass]; ira_allocate_and_copy_costs (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),