Message ID | eb57c126-4b5e-dedf-e4e5-24036616a7aa@redhat.com |
---|---|
State | Committed |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AEEE5385BF83 for <patchwork@sourceware.org>; Thu, 18 Nov 2021 19:30:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AEEE5385BF83 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1637263816; bh=ufbhmMI5OUpws9W/DaAAdZ2MCjswtaftUo/HvC5LDdA=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=nQCbEyM2Dd8oOSuTTbSoNztuaWNwM1G5XgxPHQ3Td95kZgIQzkSsjdNHj4ztAieiK NLEzo8rc1Suq3A0JvXBotdFx3PSt8ZEQIzmE+CcAwyYRFbpEzJHbXYS+6nUsqYAFV1 BwfQMUDoUBBz8zt1ggU2OXEKbv5jRaqslq2HDQ8U= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 7595C385C41D for <gcc-patches@gcc.gnu.org>; Thu, 18 Nov 2021 19:28:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7595C385C41D Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-294-F_TJTRsyNayfe0MmTSdniQ-1; Thu, 18 Nov 2021 14:28:14 -0500 X-MC-Unique: F_TJTRsyNayfe0MmTSdniQ-1 Received: by mail-qk1-f197.google.com with SMTP id bq9-20020a05620a468900b004681cdb3483so5780038qkb.23 for <gcc-patches@gcc.gnu.org>; Thu, 18 Nov 2021 11:28:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:from:subject:cc; bh=oyqGtjb/V5VxhjtWbbaskON2nWbABpZv/RVa0XTD05Y=; b=GaWEXsFY5Ly8ipn15iSIre0cVODhcOVAKGhz3pcyQtpSMConXNrD7MB1za5Crb7Np7 PJbS9NG8lhV6tAnaPf5f/JbMxbWmCnX0577weMHP0jkVx2NYipVfHt0tZsUG+WwVx32k 7gGa+wA+zSB4lRZgQMO3bJu0EoBwvuQdM8etwDYsPS8jw44PDfA++uoIeP6vtRPdL9YY itCoMmuVpS9XyM/ueREn3z/KlTWPhFs5hKroMc14La6aDzklJ28bpbR8aR7ouSp9bo1/ v494gyYRCv08fUVL3eNHPhWiMcfm74BtH3h4gaMdHk+cUpwhfnjskLV4wFySggpm1ABN NKvg== X-Gm-Message-State: AOAM532cLD3gl95P68kdr7eo2MKH/tZvU+0iOAllosW/g4tPbYLvxH6B BTLXOcWGfpSvGJLLYT50ExuJd6Dv9PLLKXoWz9aJrmmuBftCLq3lKPhLwnVXN/9H9n9SkJfHP5/ V+aI/MQmiJiQj4rA0fWui5+oefFak1o/8idFLPmqYltmefwMgXrgKYgF1u4R+zwhtVdcOKw== X-Received: by 2002:ac8:5a90:: with SMTP id c16mr28639943qtc.199.1637263692907; Thu, 18 Nov 2021 11:28:12 -0800 (PST) X-Google-Smtp-Source: ABdhPJwdrQ/jrpSeuqRfn19CO1iGwY9D1VxyBOu9P63mVXysHyZTx/DDO/KzihZnwa6EcAPWODt48A== X-Received: by 2002:ac8:5a90:: with SMTP id c16mr28639913qtc.199.1637263692653; Thu, 18 Nov 2021 11:28:12 -0800 (PST) Received: from ?IPV6:2607:fea8:a262:5f00::38cf? ([2607:fea8:a262:5f00::38cf]) by smtp.gmail.com with ESMTPSA id x125sm387044qkd.8.2021.11.18.11.28.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 18 Nov 2021 11:28:12 -0800 (PST) Message-ID: <eb57c126-4b5e-dedf-e4e5-24036616a7aa@redhat.com> Date: Thu, 18 Nov 2021 14:28:10 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1 To: gcc-patches <gcc-patches@gcc.gnu.org> Subject: [PATCH] PR tree-optimization/103254 - Limit depth for all GORI expressions. X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="------------FqcdAR1O2umwdv76PLigQJcG" Content-Language: en-CA X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Andrew MacLeod via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Andrew MacLeod <amacleod@redhat.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
PR tree-optimization/103254 - Limit depth for all GORI expressions.
|
|
Commit Message
Andrew MacLeod
Nov. 18, 2021, 7:28 p.m. UTC
At issue here is the dynamic approach we currently use for outgoing edge calculations. It isn't normally a problem, but once you get a very large number of possible outgoing values (ie very long unrolled blocks) with pairs of values on a statement, and individual queries for each one, it becomes exponential if they relate to each other. So. We previously introduced a param for PR 100081 which limited the depth of logical expressions GORI would pursue because we always make 2 or 4 queries for each logical expression, which accelerated the exponential behaviour much quicker. This patch reuses that to apply to any statement which has 2 ssa-names, as the potential for 2 queries can happen then as well.. leading to exponential behaviour. Each statement we step back through with multiple ssa-names decreases the odds of calculating a useful outgoing range anyway. This will remove ridiculous behaviour like this PR demonstrates. Next release I plan to revamp GORI to cache outgoing values, which will eliminate all this sort of behaviour, and we can remove all such restrictions. Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? Andrew PS. This also points out something we are investigating with the threader. There are no uses of _14 outside the block, so asking for its range is problematic and contributing to excess work and cache filling that we don't need to be doing. The VRP passes do not demonstrate this behaviour unless, as observed, we ask for details output which queries *all* the names.
Comments
On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > At issue here is the dynamic approach we currently use for outgoing edge > calculations. It isn't normally a problem, but once you get a very > large number of possible outgoing values (ie very long unrolled blocks) > with pairs of values on a statement, and individual queries for each > one, it becomes exponential if they relate to each other. > > So. We previously introduced a param for PR 100081 which limited the > depth of logical expressions GORI would pursue because we always make 2 > or 4 queries for each logical expression, which accelerated the > exponential behaviour much quicker. > > This patch reuses that to apply to any statement which has 2 ssa-names, > as the potential for 2 queries can happen then as well.. leading to > exponential behaviour. Each statement we step back through with > multiple ssa-names decreases the odds of calculating a useful outgoing > range anyway. This will remove ridiculous behaviour like this PR > demonstrates. > > Next release I plan to revamp GORI to cache outgoing values, which will > eliminate all this sort of behaviour, and we can remove all such > restrictions. > > Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? I think it's reasonable. SCEV tries to limit the overall number of SSA -> def visits, capturing deep vs. wide in a more uniform (and single) way. (--param scev-max-expr-size and the duplicate - huh - --param scev-max-expr-complexity). For SCEV running into the limit means fatal fail of the analysis, for ranger it only means less refinement? So it might make a difference which path you visit and which not (just thinking about reproducability of range analysis when we say, swap operands of a stmt)? Thanks, Richard. > Andrew > > > PS. This also points out something we are investigating with the > threader. There are no uses of _14 outside the block, so asking for its > range is problematic and contributing to excess work and cache filling > that we don't need to be doing. The VRP passes do not demonstrate this > behaviour unless, as observed, we ask for details output which queries > *all* the names.
On 11/18/21 8:28 PM, Andrew MacLeod wrote: > @@ -376,6 +366,14 @@ range_def_chain::get_def_chain (tree name) > return NULL; > } > > + // Terminate the def chains if we see too many cascading stmts. > + if (m_logical_depth == param_ranger_logical_depth) > + return NULL; > + > + // Increase the depth if we have a pair of ssa-names. > + if (ssa1 && ssa2) > + m_logical_depth++; > + Not sure it matters, since this is an internal knob, but if the --param now applies to all statements, perhaps we should remove the "logical" reference from the knob. Aldy
On 11/19/21 05:15, Aldy Hernandez wrote: > > > On 11/18/21 8:28 PM, Andrew MacLeod wrote: > >> @@ -376,6 +366,14 @@ range_def_chain::get_def_chain (tree name) >> return NULL; >> } >> >> + // Terminate the def chains if we see too many cascading stmts. >> + if (m_logical_depth == param_ranger_logical_depth) >> + return NULL; >> + >> + // Increase the depth if we have a pair of ssa-names. >> + if (ssa1 && ssa2) >> + m_logical_depth++; >> + > > Not sure it matters, since this is an internal knob, but if the > --param now applies to all statements, perhaps we should remove the > "logical" reference from the knob. > > Aldy > I considered that, but recall the last time I changed a name in my --param it affected some LTO reading thing invalidating existing objects due to a parameter mismatch I think... So I didn't see the need :-) So instead I now interpret it to mean "Logical analysis depth" or something :-) If no one cares about that, we could drop the logical part of it. Andrew
On 11/19/21 04:21, Richard Biener wrote: > On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> At issue here is the dynamic approach we currently use for outgoing edge >> calculations. It isn't normally a problem, but once you get a very >> large number of possible outgoing values (ie very long unrolled blocks) >> with pairs of values on a statement, and individual queries for each >> one, it becomes exponential if they relate to each other. >> >> So. We previously introduced a param for PR 100081 which limited the >> depth of logical expressions GORI would pursue because we always make 2 >> or 4 queries for each logical expression, which accelerated the >> exponential behaviour much quicker. >> >> This patch reuses that to apply to any statement which has 2 ssa-names, >> as the potential for 2 queries can happen then as well.. leading to >> exponential behaviour. Each statement we step back through with >> multiple ssa-names decreases the odds of calculating a useful outgoing >> range anyway. This will remove ridiculous behaviour like this PR >> demonstrates. >> >> Next release I plan to revamp GORI to cache outgoing values, which will >> eliminate all this sort of behaviour, and we can remove all such >> restrictions. >> >> Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? > I think it's reasonable. SCEV tries to limit the overall number of SSA -> def > visits, capturing deep vs. wide in a more uniform (and single) way. > (--param scev-max-expr-size and the duplicate - huh - --param > scev-max-expr-complexity). > > For SCEV running into the limit means fatal fail of the analysis, for ranger > it only means less refinement? So it might make a difference which path > you visit and which not (just thinking about reproducability of range analysis > when we say, swap operands of a stmt)? > Yes, Its means less refinement for "deeper" dependencies in the block.. It will not be affected by operand swapping because we will either look at dependencies for both operands of a stmt, or neither. The path you visit shouldn't make any difference. Note this has nothing to do with generating ranges for those statements, this is only for refining outgoing ranges created by the condition at the bottom of the block. To be precise, we will stop looking for possible range changes once our dependency search, which always starts from the bottom of the block, encounters 6 stmts with multiple SSA operands. Statements with a single SSA operand will continue to build the dependency chains. so for instance, following just the chain for first operands of this somewhat truncated listing: <...> _137 = (short int) j_131; _138 = _129 & _137; _139 = (int) _138; j_140 = j_131 & _139; << depth 6 ... we don't go look at j_131 or _139 for further dependencies _146 = (short int) j_140; _147 = _138 & _146; _148 = (int) _147; j_149 = j_140 & _148; << depth 5 _155 = (short int) j_149; _156 = _147 & _155; _157 = (int) _156; j_158 = j_149 & _157; << depth 4 _164 = (short int) j_158; _165 = _156 & _164; _166 = (int) _165; j_167 = j_158 & _166; <<-- depth 3 _1 = (short int) j_167; _3 = _1 & _165; <<-depth 2 _5 = (int) _3; j_22 = _5 & j_167; <<-- depth 1 j_20 = j_22 + 4; if (j_20 == 4) goto <bb 3>; We'll generate dependencies, and thus able to generate outgoing ranges for all these ssa-names still until we hit a dependency depth of 6 (the current default), which for one chain comes to an end at j_140 = j_131 & _139 We will be able to generate outgoing ranges for j_131 and _139 on the edges, but we will not try for any ssa_names used to define those.. ie, _138 might not be able to have outgoing ranges generated for it in this block. working the j_20 : [4,4] range on the true edge back thru those expressions, once we get that deep the odds of finding something of value is remote :-) We will also only generate these outgoing ranges when they are asked for. Many of these names are not used outside the block (especially back the deeper you go), and so never get asked for anyway... but you could :-) Anyway, this limit in no way introduces any kind of ordering variation.. Andrew PS for amusement and information overload... it turns out in this hunk of code we end up knowing that globally the range of most of these names are [0, 4], and the actual ranges we COULD generate if asked: 3->5 (T) _1 : short int [0, 4] 3->5 (T) _3 : short int [0, 4] 3->5 (T) _5 : int [0, 4] 3->5 (T) j_20 : int [4, 4] 3->5 (T) j_22 : int [0, 0] 3->5 (T) _120 : short int [0, 4] 3->5 (T) _121 : int [0, 4] 3->5 (T) _128 : short int [0, 4] 3->5 (T) _129 : short int [0, 4] 3->5 (T) _130 : int [0, 4] 3->5 (T) j_131 : int [0, 4] 3->5 (T) _137 : short int [0, 4] 3->5 (T) _138 : short int [0, 4] 3->5 (T) _139 : int [0, 4] 3->5 (T) j_140 : int [0, 4] 3->5 (T) _146 : short int [0, 4] 3->5 (T) _147 : short int [0, 4] 3->5 (T) _148 : int [0, 4] 3->5 (T) j_149 : int [0, 4] 3->5 (T) _155 : short int [0, 4] 3->5 (T) _156 : short int [0, 4] 3->5 (T) _157 : int [0, 4] 3->5 (T) j_158 : int [0, 4] 3->5 (T) _164 : short int [0, 4] 3->5 (T) _165 : short int [0, 4] 3->5 (T) _166 : int [0, 4] 3->5 (T) j_167 : int [0, 4] 3->4 (F) _1 : short int [1, 4] 3->4 (F) _3 : short int [1, 4] 3->4 (F) _5 : int [1, 4] 3->4 (F) j_20 : int [5, 8] 3->4 (F) j_22 : int [1, 4] 3->4 (F) _120 : short int [1, 4] 3->4 (F) _121 : int [1, 4] 3->4 (F) _128 : short int [1, 4] 3->4 (F) _129 : short int [1, 4] 3->4 (F) _130 : int [1, 4] 3->4 (F) j_131 : int [1, 4] 3->4 (F) _137 : short int [1, 4] 3->4 (F) _138 : short int [1, 4] 3->4 (F) _139 : int [1, 4] 3->4 (F) j_140 : int [1, 4] 3->4 (F) _146 : short int [1, 4] 3->4 (F) _147 : short int [1, 4] 3->4 (F) _148 : int [1, 4] 3->4 (F) j_149 : int [1, 4] 3->4 (F) _155 : short int [1, 4] 3->4 (F) _156 : short int [1, 4] 3->4 (F) _157 : int [1, 4] 3->4 (F) j_158 : int [1, 4] 3->4 (F) _164 : short int [1, 4] 3->4 (F) _165 : short int [1, 4] 3->4 (F) _166 : int [1, 4] 3->4 (F) j_167 : int [1, 4] So the chains go slightly deeper than my example, but the end result is most of the names in the _30 thru _119 range on the false edge would come back with [0,4] instead of [1,4] with this change. The actual dependency chains from the dump: Exports: _1 _3 _5 j_20 j_22 _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 The exports are the names that ranger can potentially generate a range on outgoing edges that may differ from the range in the block. It means GORI might be able to take the condition at the bottom and adjust those ranges. Any name not in the list will simply use the range it has in the block. Adjusting that param value will change the number of these. The dependency list for each ssa name is also shown: and we can see how the higher in the block we go, the deeper in the dependency chain from the bottom of the block we are, and thus there are less names in the list for each name _1 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 _3 : _1 _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 _5 : _1 _3 _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 j_20 : _1 _3 _5 j_22 _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 j_22 : _1 _3 _5 _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 _129 : _120 _128 _137 : j_131 _138 : _120 _128 _129 j_131 _137 j_140 : j_131 _139 _146 : j_131 _139 j_140 _147 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _148 : _147 j_149 : j_131 _139 j_140 _147 _148 _155 : j_131 _139 j_140 _147 _148 j_149 _156 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _157 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 j_158 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 _164 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _165 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _166 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 j_167 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166
On 11/19/21 04:21, Richard Biener wrote: > On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> At issue here is the dynamic approach we currently use for outgoing edge >> calculations. It isn't normally a problem, but once you get a very >> large number of possible outgoing values (ie very long unrolled blocks) >> with pairs of values on a statement, and individual queries for each >> one, it becomes exponential if they relate to each other. >> >> So. We previously introduced a param for PR 100081 which limited the >> depth of logical expressions GORI would pursue because we always make 2 >> or 4 queries for each logical expression, which accelerated the >> exponential behaviour much quicker. >> >> This patch reuses that to apply to any statement which has 2 ssa-names, >> as the potential for 2 queries can happen then as well.. leading to >> exponential behaviour. Each statement we step back through with >> multiple ssa-names decreases the odds of calculating a useful outgoing >> range anyway. This will remove ridiculous behaviour like this PR >> demonstrates. >> >> Next release I plan to revamp GORI to cache outgoing values, which will >> eliminate all this sort of behaviour, and we can remove all such >> restrictions. >> >> Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? > I think it's reasonable. SCEV tries to limit the overall number of SSA -> def Pushed.
On November 19, 2021 4:00:01 PM GMT+01:00, Andrew MacLeod <amacleod@redhat.com> wrote: >On 11/19/21 04:21, Richard Biener wrote: >> On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches >> <gcc-patches@gcc.gnu.org> wrote: >>> At issue here is the dynamic approach we currently use for outgoing edge >>> calculations. It isn't normally a problem, but once you get a very >>> large number of possible outgoing values (ie very long unrolled blocks) >>> with pairs of values on a statement, and individual queries for each >>> one, it becomes exponential if they relate to each other. >>> >>> So. We previously introduced a param for PR 100081 which limited the >>> depth of logical expressions GORI would pursue because we always make 2 >>> or 4 queries for each logical expression, which accelerated the >>> exponential behaviour much quicker. >>> >>> This patch reuses that to apply to any statement which has 2 ssa-names, >>> as the potential for 2 queries can happen then as well.. leading to >>> exponential behaviour. Each statement we step back through with >>> multiple ssa-names decreases the odds of calculating a useful outgoing >>> range anyway. This will remove ridiculous behaviour like this PR >>> demonstrates. >>> >>> Next release I plan to revamp GORI to cache outgoing values, which will >>> eliminate all this sort of behaviour, and we can remove all such >>> restrictions. >>> >>> Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? >> I think it's reasonable. SCEV tries to limit the overall number of SSA -> def >> visits, capturing deep vs. wide in a more uniform (and single) way. >> (--param scev-max-expr-size and the duplicate - huh - --param >> scev-max-expr-complexity). >> >> For SCEV running into the limit means fatal fail of the analysis, for ranger >> it only means less refinement? So it might make a difference which path >> you visit and which not (just thinking about reproducability of range analysis >> when we say, swap operands of a stmt)? >> >Yes, Its means less refinement for "deeper" dependencies in the block.. >It will not be affected by operand swapping because we will either look >at dependencies for both operands of a stmt, or neither. The path you >visit shouldn't make any difference. Note this has nothing to do with >generating ranges for those statements, this is only for refining >outgoing ranges created by the condition at the bottom of the block. > >To be precise, we will stop looking for possible range changes once our >dependency search, which always starts from the bottom of the block, >encounters 6 stmts with multiple SSA operands. Statements with a single >SSA operand will continue to build the dependency chains. > >so for instance, following just the chain for first operands of this >somewhat truncated listing: > <...> > _137 = (short int) j_131; > _138 = _129 & _137; > _139 = (int) _138; > j_140 = j_131 & _139; << depth 6 ... we don't go look at j_131 >or _139 for further dependencies > _146 = (short int) j_140; > _147 = _138 & _146; > _148 = (int) _147; > j_149 = j_140 & _148; << depth 5 > _155 = (short int) j_149; > _156 = _147 & _155; > _157 = (int) _156; > j_158 = j_149 & _157; << depth 4 > _164 = (short int) j_158; > _165 = _156 & _164; > _166 = (int) _165; > j_167 = j_158 & _166; <<-- depth 3 > _1 = (short int) j_167; > _3 = _1 & _165; <<-depth 2 > _5 = (int) _3; > j_22 = _5 & j_167; <<-- depth 1 > j_20 = j_22 + 4; > if (j_20 == 4) > goto <bb 3>; > >We'll generate dependencies, and thus able to generate outgoing ranges >for all these ssa-names still until we hit a dependency depth of 6 (the >current default), which for one chain comes to an end at > >j_140 = j_131 & _139 > > We will be able to generate outgoing ranges for j_131 and _139 on the >edges, but we will not try for any ssa_names used to define those.. ie, >_138 might not be able to have outgoing ranges generated for it in this >block. > >working the j_20 : [4,4] range on the true edge back thru those >expressions, once we get that deep the odds of finding something of >value is remote :-) > >We will also only generate these outgoing ranges when they are asked >for. Many of these names are not used outside the block (especially >back the deeper you go), and so never get asked for anyway... but you >could :-) I see. I suppose you could prune the list simply by asking whether a Def has a single use only (that you just followed) and follow the chain but not generate an outgoing range for the Def. Not sure if that will make a big difference in compile time. >Anyway, this limit in no way introduces any kind of ordering variation.. > >Andrew > >PS for amusement and information overload... it turns out in this hunk >of code we end up knowing that globally the range of most of these names >are [0, 4], and the actual ranges we COULD generate if asked: > >3->5 (T) _1 : short int [0, 4] >3->5 (T) _3 : short int [0, 4] >3->5 (T) _5 : int [0, 4] >3->5 (T) j_20 : int [4, 4] >3->5 (T) j_22 : int [0, 0] >3->5 (T) _120 : short int [0, 4] >3->5 (T) _121 : int [0, 4] >3->5 (T) _128 : short int [0, 4] >3->5 (T) _129 : short int [0, 4] >3->5 (T) _130 : int [0, 4] >3->5 (T) j_131 : int [0, 4] >3->5 (T) _137 : short int [0, 4] >3->5 (T) _138 : short int [0, 4] >3->5 (T) _139 : int [0, 4] >3->5 (T) j_140 : int [0, 4] >3->5 (T) _146 : short int [0, 4] >3->5 (T) _147 : short int [0, 4] >3->5 (T) _148 : int [0, 4] >3->5 (T) j_149 : int [0, 4] >3->5 (T) _155 : short int [0, 4] >3->5 (T) _156 : short int [0, 4] >3->5 (T) _157 : int [0, 4] >3->5 (T) j_158 : int [0, 4] >3->5 (T) _164 : short int [0, 4] >3->5 (T) _165 : short int [0, 4] >3->5 (T) _166 : int [0, 4] >3->5 (T) j_167 : int [0, 4] >3->4 (F) _1 : short int [1, 4] >3->4 (F) _3 : short int [1, 4] >3->4 (F) _5 : int [1, 4] >3->4 (F) j_20 : int [5, 8] >3->4 (F) j_22 : int [1, 4] >3->4 (F) _120 : short int [1, 4] >3->4 (F) _121 : int [1, 4] >3->4 (F) _128 : short int [1, 4] >3->4 (F) _129 : short int [1, 4] >3->4 (F) _130 : int [1, 4] >3->4 (F) j_131 : int [1, 4] >3->4 (F) _137 : short int [1, 4] >3->4 (F) _138 : short int [1, 4] >3->4 (F) _139 : int [1, 4] >3->4 (F) j_140 : int [1, 4] >3->4 (F) _146 : short int [1, 4] >3->4 (F) _147 : short int [1, 4] >3->4 (F) _148 : int [1, 4] >3->4 (F) j_149 : int [1, 4] >3->4 (F) _155 : short int [1, 4] >3->4 (F) _156 : short int [1, 4] >3->4 (F) _157 : int [1, 4] >3->4 (F) j_158 : int [1, 4] >3->4 (F) _164 : short int [1, 4] >3->4 (F) _165 : short int [1, 4] >3->4 (F) _166 : int [1, 4] >3->4 (F) j_167 : int [1, 4] > >So the chains go slightly deeper than my example, but the end result is >most of the names in the _30 thru _119 range on the false edge would >come back with [0,4] instead of [1,4] with this change. > >The actual dependency chains from the dump: > >Exports: _1 _3 _5 j_20 j_22 _120 _128 _129 j_131 _137 _138 >_139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 >_165 _166 j_167 > >The exports are the names that ranger can potentially generate a range >on outgoing edges that may differ from the range in the block. It means >GORI might be able to take the condition at the bottom and adjust those >ranges. Any name not in the list will simply use the range it has in >the block. Adjusting that param value will change the number of these. > >The dependency list for each ssa name is also shown: and we can see >how the higher in the block we go, the deeper in the dependency chain >from the bottom of the block we are, and thus there are less names in >the list for each name > > _1 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 > _3 : _1 _120 _128 _129 j_131 _137 _138 _139 j_140 >_146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 > _5 : _1 _3 _120 _128 _129 j_131 _137 _138 _139 j_140 >_146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167 > j_20 : _1 _3 _5 j_22 _120 _128 _129 j_131 _137 _138 >_139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 >_165 _166 j_167 > j_22 : _1 _3 _5 _120 _128 _129 j_131 _137 _138 _139 >j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 >_166 j_167 > _129 : _120 _128 > _137 : j_131 > _138 : _120 _128 _129 j_131 _137 > j_140 : j_131 _139 > _146 : j_131 _139 j_140 > _147 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 > _148 : _147 > j_149 : j_131 _139 j_140 _147 _148 > _155 : j_131 _139 j_140 _147 _148 j_149 > _156 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 > _157 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 > j_158 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 > _164 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 j_158 > _165 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 j_158 _164 > _166 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 j_158 _164 _165 > j_167 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146 >_147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 >
On 11/19/21 13:13, Richard Biener wrote: > On November 19, 2021 4:00:01 PM GMT+01:00, Andrew MacLeod <amacleod@redhat.com> wrote: >> On 11/19/21 04:21, Richard Biener wrote: >>> On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches >>> <gcc-patches@gcc.gnu.org> wrote: >>>> At issue here is the dynamic approach we currently use for outgoing edge >>>> calculations. It isn't normally a problem, but once you get a very >>>> large number of possible outgoing values (ie very long unrolled blocks) >>>> with pairs of values on a statement, and individual queries for each >>>> one, it becomes exponential if they relate to each other. >>>> >>>> So. We previously introduced a param for PR 100081 which limited the >>>> depth of logical expressions GORI would pursue because we always make 2 >>>> or 4 queries for each logical expression, which accelerated the >>>> exponential behaviour much quicker. >>>> >>>> This patch reuses that to apply to any statement which has 2 ssa-names, >>>> as the potential for 2 queries can happen then as well.. leading to >>>> exponential behaviour. Each statement we step back through with >>>> multiple ssa-names decreases the odds of calculating a useful outgoing >>>> range anyway. This will remove ridiculous behaviour like this PR >>>> demonstrates. >>>> >>>> Next release I plan to revamp GORI to cache outgoing values, which will >>>> eliminate all this sort of behaviour, and we can remove all such >>>> restrictions. >>>> >>>> Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? >>> I think it's reasonable. SCEV tries to limit the overall number of SSA -> def >>> visits, capturing deep vs. wide in a more uniform (and single) way. >>> (--param scev-max-expr-size and the duplicate - huh - --param >>> scev-max-expr-complexity). >>> >>> For SCEV running into the limit means fatal fail of the analysis, for ranger >>> it only means less refinement? So it might make a difference which path >>> you visit and which not (just thinking about reproducability of range analysis >>> when we say, swap operands of a stmt)? >>> >> Yes, Its means less refinement for "deeper" dependencies in the block.. >> It will not be affected by operand swapping because we will either look >> at dependencies for both operands of a stmt, or neither. The path you >> visit shouldn't make any difference. Note this has nothing to do with >> generating ranges for those statements, this is only for refining >> outgoing ranges created by the condition at the bottom of the block. >> >> To be precise, we will stop looking for possible range changes once our >> dependency search, which always starts from the bottom of the block, >> encounters 6 stmts with multiple SSA operands. Statements with a single >> SSA operand will continue to build the dependency chains. >> >> so for instance, following just the chain for first operands of this >> somewhat truncated listing: >> <...> >> _137 = (short int) j_131; >> _138 = _129 & _137; >> _139 = (int) _138; >> j_140 = j_131 & _139; << depth 6 ... we don't go look at j_131 >> or _139 for further dependencies >> _146 = (short int) j_140; >> _147 = _138 & _146; >> _148 = (int) _147; >> j_149 = j_140 & _148; << depth 5 >> _155 = (short int) j_149; >> _156 = _147 & _155; >> _157 = (int) _156; >> j_158 = j_149 & _157; << depth 4 >> _164 = (short int) j_158; >> _165 = _156 & _164; >> _166 = (int) _165; >> j_167 = j_158 & _166; <<-- depth 3 >> _1 = (short int) j_167; >> _3 = _1 & _165; <<-depth 2 >> _5 = (int) _3; >> j_22 = _5 & j_167; <<-- depth 1 >> j_20 = j_22 + 4; >> if (j_20 == 4) >> goto <bb 3>; >> >> We'll generate dependencies, and thus able to generate outgoing ranges >> for all these ssa-names still until we hit a dependency depth of 6 (the >> current default), which for one chain comes to an end at >> >> j_140 = j_131 & _139 >> >> We will be able to generate outgoing ranges for j_131 and _139 on the >> edges, but we will not try for any ssa_names used to define those.. ie, >> _138 might not be able to have outgoing ranges generated for it in this >> block. >> >> working the j_20 : [4,4] range on the true edge back thru those >> expressions, once we get that deep the odds of finding something of >> value is remote :-) >> >> We will also only generate these outgoing ranges when they are asked >> for. Many of these names are not used outside the block (especially >> back the deeper you go), and so never get asked for anyway... but you >> could :-) > I see. I suppose you could prune the list simply by asking whether a Def has a single use only (that you just followed) and follow the chain but not generate an outgoing range for the Def. Not sure if that will make a big difference in compile time. > It wouldn't make any difference in compile time. The export list is used to determine which names a block could generate a range for, but if there are no other uses, there 's unlikely to be any such direct request. it will not be calculated, except in passing if a previous name in the dependency chain is used outside and we have to calculate thru it. Andrew
commit bfecf512fb719dcb072e54d842b1e7a62fe03d2d Author: Andrew MacLeod <amacleod@redhat.com> Date: Wed Nov 17 14:14:06 2021 -0500 Limit depth for all GORI expressions. Apply the logical_depth limit ranger uses to all stmts with multiple ssa-names to avoid excessive outgoing calculations. gcc/ PR tree-optimization/103254 * gimple-range-gori.cc (range_def_chain::get_def_chain): Limit the depth for all statements with multple ssa names. gcc/testsuite/ * gcc.dg/pr103254.c: New. diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index fb2d571ef44..911d7ac4ec8 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -331,7 +331,6 @@ range_def_chain::get_def_chain (tree name) { tree ssa1, ssa2, ssa3; unsigned v = SSA_NAME_VERSION (name); - bool is_logical = false; // If it has already been processed, just return the cached value. if (has_def_chain (name)) @@ -348,15 +347,6 @@ range_def_chain::get_def_chain (tree name) gimple *stmt = SSA_NAME_DEF_STMT (name); if (gimple_range_handler (stmt)) { - is_logical = is_gimple_logical_p (stmt); - // Terminate the def chains if we see too many cascading logical stmts. - if (is_logical) - { - if (m_logical_depth == param_ranger_logical_depth) - return NULL; - m_logical_depth++; - } - ssa1 = gimple_range_ssa_p (gimple_range_operand1 (stmt)); ssa2 = gimple_range_ssa_p (gimple_range_operand2 (stmt)); ssa3 = NULL_TREE; @@ -376,6 +366,14 @@ range_def_chain::get_def_chain (tree name) return NULL; } + // Terminate the def chains if we see too many cascading stmts. + if (m_logical_depth == param_ranger_logical_depth) + return NULL; + + // Increase the depth if we have a pair of ssa-names. + if (ssa1 && ssa2) + m_logical_depth++; + register_dependency (name, ssa1, gimple_bb (stmt)); register_dependency (name, ssa2, gimple_bb (stmt)); register_dependency (name, ssa3, gimple_bb (stmt)); @@ -383,7 +381,7 @@ range_def_chain::get_def_chain (tree name) if (!ssa1 && !ssa2 & !ssa3) set_import (m_def_chain[v], name, NULL); - if (is_logical) + if (ssa1 && ssa2) m_logical_depth--; return m_def_chain[v].bm; diff --git a/gcc/testsuite/gcc.dg/pr103254.c b/gcc/testsuite/gcc.dg/pr103254.c new file mode 100644 index 00000000000..62d2415f216 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr103254.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-timeout 10 } */ + +short int i; + +void +foo (void) +{ + for (i = 1; i < 2; i += 4) + { + int j; + + for (j = 0; j < 5; j += 4) + { + int k; + + for (k = 0; k < 68; k += 4) + { + i &= j; + j &= i; + } + } + } +}