[v3,1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
Commit Message
This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
[For simplicity, let's assume find_inc(backwards==true)].
In this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA. After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*. This gets us N*M new
dependencies.
In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics. After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.
gcc/ChangeLog:
PR rtl-optimization/96388
PR rtl-optimization/111554
* sched-deps.cc (find_inc): Avoid exponential behavior.
---
gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 44 insertions(+), 4 deletions(-)
Comments
Hi Vladimir,
Hi Jeff,
Richard and Alexander have reviewed this patch and [I assume] have no
further comments. OK to merge?
On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:
> This patch avoids sched-deps.cc:find_inc() creating exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
>
> [For simplicity, let's assume find_inc(backwards==true)].
> In this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA. After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*. This gets us N*M new
> dependencies.
>
> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics. After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
> PR rtl-optimization/96388
> PR rtl-optimization/111554
> * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
> gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..005fc0f567e 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii,
> rtx_insn *insn, bool before_mem)
> /* Once a suitable mem reference has been found and the corresponding data
> in MII has been filled in, this function is called to find a suitable
> add or inc insn involving the register we found in the memory
> - reference. */
> + reference.
> + If successful, this function will create additional dependencies
> between
> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if
> backwards)
> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if
> !backwards).
> +*/
>
> static bool
> find_inc (struct mem_inc_info *mii, bool backwards)
> {
> sd_iterator_def sd_it;
> dep_t dep;
> + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK :
> SD_LIST_FORW;
> + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> - sd_it = sd_iterator_start (mii->mem_insn,
> - backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> + sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
> while (sd_iterator_cond (&sd_it, &dep))
> {
> dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
> rtx_insn *pro = DEP_PRO (dep);
> rtx_insn *con = DEP_CON (dep);
> - rtx_insn *inc_cand = backwards ? pro : con;
> + rtx_insn *inc_cand;
> + int n_inc_deps;
> +
> if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
> goto next;
> +
> + if (backwards)
> + {
> + inc_cand = pro;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> + }
> + else
> + {
> + inc_cand = con;
> + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> + }
> +
> + /* In the FOR_EACH_DEP loop below we will create additional
> n_inc_deps
> + for mem_insn. This by itself is not a problem, since each
> mem_insn
> + will have only a few inc_insns associated with it. However, if
> + we consider that a single inc_insn may have a lot of mem_insns,
> AND,
> + on top of that, a few other inc_insns associated with it --
> + those _other inc_insns_ will get (n_mem_deps * number of MEM
> insns)
> + dependencies created for them. This may cause an exponential
> + growth of memory usage and scheduling time.
> + See PR96388 for details.
> + We [heuristically] use n_inc_deps as a proxy for the number of MEM
> + insns, and drop opportunities for breaking modifiable_mem
> dependencies
> + when dependency lists grow beyond reasonable size. */
> + if (n_mem_deps * n_inc_deps
> + >= param_max_pending_list_length * param_max_pending_list_length)
> + goto next;
> +
> if (parse_add_or_inc (mii, inc_cand, backwards))
> {
> struct dep_replacement *desc;
> @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
> desc->insn = mii->mem_insn;
> move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
> INSN_SPEC_BACK_DEPS (con));
> +
> + /* Make sure that n_inc_deps above is consistent with
> dependencies
> + we create. */
> + gcc_assert (mii->inc_insn == inc_cand);
> +
> if (backwards)
> {
> FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>
>
On 1/15/24 07:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
>
> Richard and Alexander have reviewed this patch and [I assume] have no
> further comments. OK to merge?
>
>
I trust Richard and Alexander therefore I did not do additional review
of the patches and have no any comment. Richard's or Alexander's
approval is enough for comitting the patches.
On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
>
> Richard and Alexander have reviewed this patch and [I assume] have no
> further comments. OK to merge?
I think the question is whether or not we're too late. I know that
Richard S has held off on his late-combine pass and I'm holding off on
the ext-dce work due to the fact that we're well past stage1 close.
I think the release managers ought to have the final say on this.
jeff
On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > Hi Vladimir,
> > Hi Jeff,
> >
> > Richard and Alexander have reviewed this patch and [I assume] have no
> > further comments. OK to merge?
> I think the question is whether or not we're too late. I know that
> Richard S has held off on his late-combine pass and I'm holding off on
> the ext-dce work due to the fact that we're well past stage1 close.
>
> I think the release managers ought to have the final say on this.
I'm fine with this now, it doesn't change code generation.
Richard.
>
> jeff
> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>> Hi Vladimir,
>>> Hi Jeff,
>>>
>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>> further comments. OK to merge?
>> I think the question is whether or not we're too late. I know that
>> Richard S has held off on his late-combine pass and I'm holding off on
>> the ext-dce work due to the fact that we're well past stage1 close.
>>
>> I think the release managers ought to have the final say on this.
>
> I'm fine with this now, it doesn't change code generation.
Thanks, Richard.
I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
Regards,
--
Maxim Kuvyrkov
https://www.linaro.org
On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>
> > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> >>> Hi Vladimir,
> >>> Hi Jeff,
> >>>
> >>> Richard and Alexander have reviewed this patch and [I assume] have no
> >>> further comments. OK to merge?
> >> I think the question is whether or not we're too late. I know that
> >> Richard S has held off on his late-combine pass and I'm holding off on
> >> the ext-dce work due to the fact that we're well past stage1 close.
> >>
> >> I think the release managers ought to have the final say on this.
> >
> > I'm fine with this now, it doesn't change code generation.
>
> Thanks, Richard.
>
> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
gcc/fortran/f95-lang.o differs
does n_mem_deps or n_inc_deps include debug insns?
Richard.
> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>>
>>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>> Hi Vladimir,
>>>>> Hi Jeff,
>>>>>
>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>> further comments. OK to merge?
>>>> I think the question is whether or not we're too late. I know that
>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>>
>>>> I think the release managers ought to have the final say on this.
>>>
>>> I'm fine with this now, it doesn't change code generation.
>>
>> Thanks, Richard.
>>
>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
>
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>
> gcc/fortran/f95-lang.o differs
>
> does n_mem_deps or n_inc_deps include debug insns?
Thanks, investigating.
--
Maxim Kuvyrkov
https://www.linaro.org
> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>
>> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>> <maxim.kuvyrkov@linaro.org> wrote:
>>>
>>>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>>> Hi Vladimir,
>>>>>> Hi Jeff,
>>>>>>
>>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>>> further comments. OK to merge?
>>>>> I think the question is whether or not we're too late. I know that
>>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>>>
>>>>> I think the release managers ought to have the final say on this.
>>>>
>>>> I'm fine with this now, it doesn't change code generation.
>>>
>>> Thanks, Richard.
>>>
>>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
>>
>> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>>
>> gcc/fortran/f95-lang.o differs
>>
>> does n_mem_deps or n_inc_deps include debug insns?
>
> Thanks, investigating.
Hi Richard,
Yes, both n_mem_deps or n_inc_deps include debug insns, I posted a patch for this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html . Testing it now.
If you prefer, I can revert the fix for PR96388 and PR111554.
Kind regards,
--
Maxim Kuvyrkov
https://www.linaro.org
On Wed, Jan 17, 2024 at 7:02 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
> >
> > > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > >>> Hi Vladimir,
> > >>> Hi Jeff,
> > >>>
> > >>> Richard and Alexander have reviewed this patch and [I assume] have no
> > >>> further comments. OK to merge?
> > >> I think the question is whether or not we're too late. I know that
> > >> Richard S has held off on his late-combine pass and I'm holding off on
> > >> the ext-dce work due to the fact that we're well past stage1 close.
> > >>
> > >> I think the release managers ought to have the final say on this.
> > >
> > > I'm fine with this now, it doesn't change code generation.
> >
> > Thanks, Richard.
> >
> > I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
>
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>
> gcc/fortran/f95-lang.o differs
>
> does n_mem_deps or n_inc_deps include debug insns?
>
> Richard.
FWIW, I opened:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113456
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
/* Once a suitable mem reference has been found and the corresponding data
in MII has been filled in, this function is called to find a suitable
add or inc insn involving the register we found in the memory
- reference. */
+ reference.
+ If successful, this function will create additional dependencies between
+ - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+ - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
static bool
find_inc (struct mem_inc_info *mii, bool backwards)
{
sd_iterator_def sd_it;
dep_t dep;
+ sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+ int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
- sd_it = sd_iterator_start (mii->mem_insn,
- backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+ sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
while (sd_iterator_cond (&sd_it, &dep))
{
dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
rtx_insn *pro = DEP_PRO (dep);
rtx_insn *con = DEP_CON (dep);
- rtx_insn *inc_cand = backwards ? pro : con;
+ rtx_insn *inc_cand;
+ int n_inc_deps;
+
if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
goto next;
+
+ if (backwards)
+ {
+ inc_cand = pro;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+ }
+ else
+ {
+ inc_cand = con;
+ n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+ }
+
+ /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+ for mem_insn. This by itself is not a problem, since each mem_insn
+ will have only a few inc_insns associated with it. However, if
+ we consider that a single inc_insn may have a lot of mem_insns, AND,
+ on top of that, a few other inc_insns associated with it --
+ those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+ dependencies created for them. This may cause an exponential
+ growth of memory usage and scheduling time.
+ See PR96388 for details.
+ We [heuristically] use n_inc_deps as a proxy for the number of MEM
+ insns, and drop opportunities for breaking modifiable_mem dependencies
+ when dependency lists grow beyond reasonable size. */
+ if (n_mem_deps * n_inc_deps
+ >= param_max_pending_list_length * param_max_pending_list_length)
+ goto next;
+
if (parse_add_or_inc (mii, inc_cand, backwards))
{
struct dep_replacement *desc;
@@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
desc->insn = mii->mem_insn;
move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
INSN_SPEC_BACK_DEPS (con));
+
+ /* Make sure that n_inc_deps above is consistent with dependencies
+ we create. */
+ gcc_assert (mii->inc_insn == inc_cand);
+
if (backwards)
{
FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)