i386: Enable small loop unrolling for O2

Message ID 20221026055248.94100-1-hongyu.wang@intel.com
State New
Series i386: Enable small loop unrolling for O2

Commit Message

Hongyu Wang Oct. 26, 2022, 5:52 a.m. UTC
  Hi,

Inspired by the rs6000 and s390 port changes, this patch
enables unrolling of small loops at -O2 by default.
By default, a loop with an unknown trip count and
at most 4 insns is unrolled once (unroll factor 2).

This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with a
1.2% code size increase. For other benchmarks the variations are minor,
and overall code size increases by 0.2%.

The kernel image size increases by 0.06%, and there is no impact on eembc.

Bootstrapped & regrtested on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (ix86_option_optimization_table):
	Enable loop unroll and small loop unroll at O2 by default.
	* config/i386/i386-options.cc
	(ix86_override_options_after_change):
	Disable small loop unroll when funroll-loops enabled, reset
	cunroll_grow_size when it is not explicitly enabled.
	(ix86_option_override_internal): Call
	ix86_override_options_after_change instead of calling
	ix86_recompute_optlev_based_flags and ix86_default_align
	separately.
	* config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
	factor if -munroll-only-small-loops enabled.
	* config/i386/i386.opt: Add -munroll-only-small-loops,
	-param=x86-small-unroll-ninsns= for loop insn limit,
	-param=x86-small-unroll-factor= for unroll factor.
	* doc/invoke.texi: Document -munroll-only-small-loops,
	x86-small-unroll-ninsns and x86-small-unroll-factor.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
	* gcc.target/i386/pr93002.c: Likewise.
---
 gcc/common/config/i386/i386-common.cc   |  6 ++++
 gcc/config/i386/i386-options.cc         | 40 ++++++++++++++++++++++---
 gcc/config/i386/i386.cc                 | 13 ++++++++
 gcc/config/i386/i386.opt                | 13 ++++++++
 gcc/doc/invoke.texi                     | 14 +++++++++
 gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
 7 files changed, 84 insertions(+), 6 deletions(-)
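
The default behavior described in the commit message can be sketched as a
standalone C function. This is only an illustrative model of the small-loop
branch that the patch adds to ix86_loop_unroll_adjust, not the GCC code
itself: the function name and the two static variables are hypothetical
stand-ins for the hook and for the x86-small-unroll-ninsns and
x86-small-unroll-factor params, and their values mirror the Init(4) and
Init(2) defaults in the i386.opt hunk.

```c
/* Hypothetical stand-ins for the two new params; the values mirror
   Init(4) and Init(2) in the patch's i386.opt hunk.  */
static unsigned small_unroll_ninsns = 4;
static unsigned small_unroll_factor = 2;

/* Model of the small-loop branch.  NUNROLL is the factor proposed by
   the generic unroller, NINSNS the instruction count of the loop.
   Returns the factor to use; 1 means "do not unroll".  */
static unsigned
small_loop_unroll_factor (unsigned nunroll, unsigned ninsns)
{
  if (ninsns <= small_unroll_ninsns)
    /* Small loop: cap the proposed factor at the small-loop factor.  */
    return nunroll < small_unroll_factor ? nunroll : small_unroll_factor;

  /* Anything larger is left alone.  */
  return 1;
}
```

Under these assumptions a 4-insn loop with a proposed factor of 8 gets
factor 2 (unrolled once), while a 5-insn loop gets factor 1 and is not
unrolled.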
  

Comments

Uros Bizjak Oct. 26, 2022, 6:56 a.m. UTC | #1
On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> Inspired by rs6000 and s390 port changes, this patch
> enables loop unrolling for small size loop at O2 by default.
> The default behavior is to unroll loop with unknown trip-count and
> less than 4 insns by 1 time.
>
> This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with
> 1.2% codesize increment. For other benchmarks the variants are minor
> and overall codesize increased by 0.2%.
>
> The kernel image size increased by 0.06%, and no impact on eembc.

Does this setting benefit all targets?  IIRC, in the past all
benchmarks also enabled -funroll-loops, so it looks to me that
unrolling small loops by default is a good compromise.

The patch is technically OK, but as a tuning default, I would leave
the final approval to HJ.

Thanks,
Uros.

>
> Bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * common/config/i386/i386-common.cc (ix86_optimization_table):
>         Enable loop unroll and small loop unroll at O2 by default.
>         * config/i386/i386-options.cc
>         (ix86_override_options_after_change):
>         Disable small loop unroll when funroll-loops enabled, reset
>         cunroll_grow_size when it is not explicitly enabled.
>         (ix86_option_override_internal): Call
>         ix86_override_options_after_change instead of calling
>         ix86_recompute_optlev_based_flags and ix86_default_align
>         separately.
>         * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
>         factor if -munroll-only-small-loops enabled.
>         * config/i386/i386.opt: Add -munroll-only-small-loops,
>         -param=x86-small-unroll-ninsns= for loop insn limit,
>         -param=x86-small-unroll-factor= for unroll factor.
>         * doc/invoke.texi: Document -munroll-only-small-loops,
>         x86-small-unroll-ninsns and x86-small-unroll-factor.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
>         * gcc.target/i386/pr93002.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc   |  6 ++++
>  gcc/config/i386/i386-options.cc         | 40 ++++++++++++++++++++++---
>  gcc/config/i386/i386.cc                 | 13 ++++++++
>  gcc/config/i386/i386.opt                | 13 ++++++++
>  gcc/doc/invoke.texi                     | 14 +++++++++
>  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
>  7 files changed, 84 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index d6a68dc9b1d..0e580b39d14 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1686,6 +1686,12 @@ static const struct default_options ix86_option_optimization_table[] =
>      /* The STC algorithm produces the smallest code at -Os, for x86.  */
>      { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
>        REORDER_BLOCKS_ALGORITHM_STC },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
> +    /* Turns off -frename-registers and -fweb which are enabled by
> +       funroll-loops.  */
> +    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> +    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>      /* Turn off -fschedule-insns by default.  It tends to make the
>         problem with not enough registers even worse.  */
>      { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index acb2291e70f..6ea347c32e1 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1819,8 +1819,43 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
>  void
>  ix86_override_options_after_change (void)
>  {
> +  /* Default align_* from the processor table.  */
>    ix86_default_align (&global_options);
> +
>    ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> +
> +  /* Disable unrolling small loops when there's explicit
> +     -f{,no}unroll-loop.  */
> +  if ((OPTION_SET_P (flag_unroll_loops))
> +     || (OPTION_SET_P (flag_unroll_all_loops)
> +        && flag_unroll_all_loops))
> +    {
> +      if (!OPTION_SET_P (ix86_unroll_only_small_loops))
> +       ix86_unroll_only_small_loops = 0;
> +      /* Re-enable -frename-registers and -fweb if funroll-loops
> +        enabled.  */
> +      if (!OPTION_SET_P (flag_web))
> +       flag_web = flag_unroll_loops;
> +      if (!OPTION_SET_P (flag_rename_registers))
> +       flag_rename_registers = flag_unroll_loops;
> +      if (!OPTION_SET_P (flag_cunroll_grow_size))
> +       flag_cunroll_grow_size = flag_unroll_loops
> +                                || flag_peel_loops
> +                                || optimize >= 3;
> +    }
> +  else
> +    {
> +      if (!OPTION_SET_P (flag_cunroll_grow_size))
> +       flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> +      /* Disables loop unrolling if -mno-unroll-only-small-loops is
> +        explicitly set and -funroll-loops is not enabled.  */
> +      if (OPTION_SET_P (ix86_unroll_only_small_loops)
> +         && !ix86_unroll_only_small_loops
> +         && !(OPTION_SET_P (flag_unroll_loops)
> +              || OPTION_SET_P (flag_unroll_all_loops)))
> +       flag_unroll_loops = flag_unroll_all_loops = 0;
> +    }
> +
>  }
>
>  /* Clear stack slot assignments remembered from previous functions.
> @@ -2332,7 +2367,7 @@ ix86_option_override_internal (bool main_args_p,
>
>    set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
>
> -  ix86_recompute_optlev_based_flags (opts, opts_set);
> +  ix86_override_options_after_change ();
>
>    ix86_tune_cost = processor_cost_table[ix86_tune];
>    /* TODO: ix86_cost should be chosen at instruction or function granuality
> @@ -2363,9 +2398,6 @@ ix86_option_override_internal (bool main_args_p,
>        || TARGET_64BIT_P (opts->x_ix86_isa_flags))
>      opts->x_ix86_regparm = REGPARM_MAX;
>
> -  /* Default align_* from the processor table.  */
> -  ix86_default_align (opts);
> -
>    /* Provide default for -mbranch-cost= value.  */
>    SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
>                        ix86_tune_cost->branch_cost);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 480db35f6cd..75829a5d0f4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23820,6 +23820,19 @@ ix86_loop_unroll_adjust (unsigned nunroll, class loop *loop)
>    unsigned i;
>    unsigned mem_count = 0;
>
> +  /* Unroll small size loop when unroll factor is not explicitly
> +     specified.  */
> +  if (ix86_unroll_only_small_loops && !loop->unroll)
> +    {
> +      int small_unroll = 0;
> +      if (loop->ninsns <= (unsigned) ix86_small_unroll_ninsns)
> +       small_unroll = MIN ((unsigned) ix86_small_unroll_factor,
> +                           nunroll);
> +      else
> +       small_unroll = 1;
> +      return small_unroll;
> +    }
> +
>    if (!TARGET_ADJUST_UNROLL)
>       return nunroll;
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 0dbaacb57ed..a724c73c0c4 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1214,3 +1214,16 @@ Do not use GOT to access external symbols.
>  -param=x86-stlf-window-ninsns=
>  Target Joined UInteger Var(x86_stlf_window_ninsns) Init(64) Param
>  Instructions number above which STFL stall penalty can be compensated.
> +
> +munroll-only-small-loops
> +Target Var(ix86_unroll_only_small_loops) Init(0) Save
> +Enable conservative small loop unrolling.
> +
> +-param=x86-small-unroll-ninsns=
> +Target Joined UInteger Var(ix86_small_unroll_ninsns) Init(4) Param
> +Instructions number limit for loop to be unrolled under
> +-munroll-only-small-loops.
> +
> +-param=x86-small-unroll-factor=
> +Target Joined UInteger Var(ix86_small_unroll_factor) Init(2) Param
> +Unroll factor for -munroll-only-small-loops.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index cd4d3c1d72c..b6fa79eccc3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15779,6 +15779,14 @@ The following choices of @var{name} are available on i386 and x86_64 targets:
>  @item x86-stlf-window-ninsns
>  Instructions number above which STFL stall penalty can be compensated.
>
> +@item x86-small-unroll-ninsns
> +If -munroll-only-small-loops is enabled, only unroll loops with instruction
> +count less than this parameter. The default value is 4.
> +
> +@item x86-small-unroll-factor
> +If -munroll-only-small-loops is enabled, reset the unroll factor with this
> +value. The default value is 2 which means the loop will be unrolled once.
> +
>  @end table
>
>  @end table
> @@ -25186,6 +25194,12 @@ environments where no dynamic link is performed, like firmwares, OS
>  kernels, executables linked with @option{-static} or @option{-static-pie}.
>  @option{-mdirect-extern-access} is not compatible with @option{-fPIC} or
>  @option{-fpic}.
> +
> +@item -munroll-only-small-loops
> +@itemx -mno-unroll-only-small-loops
> +@opindex munroll-only-small-loops
> +Controls conservative small loop unrolling. It is enabled by default at
> +O2, and unrolls loop with less than 4 insns by 1 time.
>  @end table
>
>  @node M32C Options
> diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c b/gcc/testsuite/gcc.target/i386/pr86270.c
> index 81841ef5bd7..cbc9fbb0450 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
>
>  int *a;
>  long len;
> diff --git a/gcc/testsuite/gcc.target/i386/pr93002.c b/gcc/testsuite/gcc.target/i386/pr93002.c
> index 0248fcc00a5..f75a847f75d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr93002.c
> +++ b/gcc/testsuite/gcc.target/i386/pr93002.c
> @@ -1,6 +1,6 @@
>  /* PR target/93002 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
>  /* { dg-final { scan-assembler-not "cmp\[^\n\r]*-1" } } */
>
>  volatile int sink;
> --
> 2.18.1
>
  
Hongyu Wang Oct. 26, 2022, 8:13 a.m. UTC | #2
> Does this setting benefit all targets?  IIRC, in the past all
> benchmarks also enabled -funroll-loops, so it looks to me that
> unrolling small loops by default is a good compromise.

The idea of unrolling small loops can be explained by the x86
micro-architecture. Modern x86 processors have a multi-way instruction
decoder (5 uops for icelake/zen3). For a small loop with <= 4
instructions (usually 3 uops, since the cmp/jmp pair can be
macro-fused), the decoder would have a 2-uop bubble in each iteration
and the pipeline could not be fully utilized. We therefore decided to
unroll the 4-insn loop once, to at least fill the decoder and
improve pipeline utilization.
We are not familiar with the micro-architecture of other targets and do
not know whether unrolling would benefit their instruction decoders, so
the decision could be different there.
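
As a source-level illustration of "unrolling once", here is a small loop
and a hand-written equivalent with the body duplicated. This is only a
sketch: the function names are made up for the example, and the code GCC's
RTL unroller actually emits for a loop with an unknown trip count will
differ in detail.

```c
#include <stddef.h>

/* A small loop of the kind discussed: roughly a load, an add, an
   index increment and a compare/branch pair per iteration.  */
long
sum_rolled (const int *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-written equivalent after unrolling once (factor 2).  The trip
   count is unknown, so a remainder step handles an odd N.  */
long
sum_unrolled_once (const int *a, size_t n)
{
  long s = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      s += a[i];
      s += a[i + 1];
    }
  if (i < n)
    s += a[i];
  return s;
}
```

Both functions return the same result for any n; the unrolled form does
twice the work per taken branch, at the cost of extra code for the
remainder.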

  
Richard Biener Oct. 28, 2022, 7:33 a.m. UTC | #3
On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> Inspired by rs6000 and s390 port changes, this patch
> enables loop unrolling for small size loop at O2 by default.
> The default behavior is to unroll loop with unknown trip-count and
> less than 4 insns by 1 time.
>
> This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with
> 1.2% codesize increment. For other benchmarks the variants are minor
> and overall codesize increased by 0.2%.
>
> The kernel image size increased by 0.06%, and no impact on eembc.
>
> Bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * common/config/i386/i386-common.cc (ix86_optimization_table):
>         Enable loop unroll and small loop unroll at O2 by default.
>         * config/i386/i386-options.cc
>         (ix86_override_options_after_change):
>         Disable small loop unroll when funroll-loops enabled, reset
>         cunroll_grow_size when it is not explicitly enabled.
>         (ix86_option_override_internal): Call
>         ix86_override_options_after_change instead of calling
>         ix86_recompute_optlev_based_flags and ix86_default_align
>         separately.
>         * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
>         factor if -munroll-only-small-loops enabled.
>         * config/i386/i386.opt: Add -munroll-only-small-loops,
>         -param=x86-small-unroll-ninsns= for loop insn limit,
>         -param=x86-small-unroll-factor= for unroll factor.
>         * doc/invoke.texi: Document -munroll-only-small-loops,
>         x86-small-unroll-ninsns and x86-small-unroll-factor.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
>         * gcc.target/i386/pr93002.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc   |  6 ++++
>  gcc/config/i386/i386-options.cc         | 40 ++++++++++++++++++++++---
>  gcc/config/i386/i386.cc                 | 13 ++++++++
>  gcc/config/i386/i386.opt                | 13 ++++++++
>  gcc/doc/invoke.texi                     | 14 +++++++++
>  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
>  7 files changed, 84 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index d6a68dc9b1d..0e580b39d14 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1686,6 +1686,12 @@ static const struct default_options ix86_option_optimization_table[] =
>      /* The STC algorithm produces the smallest code at -Os, for x86.  */
>      { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
>        REORDER_BLOCKS_ALGORITHM_STC },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
> +    /* Turns off -frename-registers and -fweb which are enabled by
> +       funroll-loops.  */
> +    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> +    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },

I'm quite sure that if this works it's not by intention.  Doesn't this
also disable register renaming and web when the user explicitly
specifies -funroll-loops?

Doesn't this change -funroll-loops behavior everywhere, only unrolling small
loops?

I'd like to see a -munroll-only-small-loops addition that doesn't have any such
effects.  Note RTL unrolling could also be conditionally enabled on a
new -funroll-small-loops which wouldn't enable register renaming or web.
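
To make the option interplay in question easier to follow, the logic of
the ix86_override_options_after_change hunk quoted further down can be
modelled as a standalone C function. This is a simplified sketch, not GCC
code: the struct and function names are hypothetical, the "set" bit stands
in for OPTION_SET_P, flag_cunroll_grow_size is left out, and the model
says nothing about how GCC's generic option machinery populates these
flags before the hook runs.

```c
#include <stdbool.h>

/* Hypothetical model of one option: its value, and whether it was
   given explicitly (standing in for OPTION_SET_P).  */
struct opt
{
  int value;
  bool set;
};

struct opts
{
  struct opt unroll_loops;      /* -funroll-loops */
  struct opt unroll_all_loops;  /* -funroll-all-loops */
  struct opt only_small_loops;  /* -munroll-only-small-loops */
  struct opt web;               /* -fweb */
  struct opt rename_registers;  /* -frename-registers */
};

/* Mirrors the if/else added by the hunk, minus cunroll_grow_size.  */
static void
resolve_unroll_options (struct opts *o)
{
  if (o->unroll_loops.set
      || (o->unroll_all_loops.set && o->unroll_all_loops.value))
    {
      /* Explicit -f{,no-}unroll-loops: drop the small-loop default and
         let web and rename-registers follow -funroll-loops again.  */
      if (!o->only_small_loops.set)
        o->only_small_loops.value = 0;
      if (!o->web.set)
        o->web.value = o->unroll_loops.value;
      if (!o->rename_registers.set)
        o->rename_registers.value = o->unroll_loops.value;
    }
  else if (o->only_small_loops.set
           && !o->only_small_loops.value
           && !o->unroll_all_loops.set)
    /* Explicit -mno-unroll-only-small-loops without any explicit
       unroll option: turn unrolling off altogether.  */
    o->unroll_loops.value = o->unroll_all_loops.value = 0;
}
```

In this model, starting from the table defaults in the patch (unrolling and
small-loop mode on, web and rename-registers off) and marking
-funroll-loops as explicitly set yields web and rename-registers enabled
and small-loop mode disabled. Whether the real option handling reaches
this state is exactly what the question above is probing.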

>      /* Turn off -fschedule-insns by default.  It tends to make the
>         problem with not enough registers even worse.  */
>      { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index acb2291e70f..6ea347c32e1 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1819,8 +1819,43 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
>  void
>  ix86_override_options_after_change (void)
>  {
> +  /* Default align_* from the processor table.  */
>    ix86_default_align (&global_options);
> +
>    ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> +
> +  /* Disable unrolling small loops when there's explicit
> +     -f{,no}unroll-loop.  */
> +  if ((OPTION_SET_P (flag_unroll_loops))
> +     || (OPTION_SET_P (flag_unroll_all_loops)
> +        && flag_unroll_all_loops))
> +    {
> +      if (!OPTION_SET_P (ix86_unroll_only_small_loops))
> +       ix86_unroll_only_small_loops = 0;
> +      /* Re-enable -frename-registers and -fweb if funroll-loops
> +        enabled.  */
> +      if (!OPTION_SET_P (flag_web))
> +       flag_web = flag_unroll_loops;
> +      if (!OPTION_SET_P (flag_rename_registers))
> +       flag_rename_registers = flag_unroll_loops;
> +      if (!OPTION_SET_P (flag_cunroll_grow_size))
> +       flag_cunroll_grow_size = flag_unroll_loops
> +                                || flag_peel_loops
> +                                || optimize >= 3;
> +    }
> +  else
> +    {
> +      if (!OPTION_SET_P (flag_cunroll_grow_size))
> +       flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> +      /* Disables loop unrolling if -mno-unroll-only-small-loops is
> +        explicitly set and -funroll-loops is not enabled.  */
> +      if (OPTION_SET_P (ix86_unroll_only_small_loops)
> +         && !ix86_unroll_only_small_loops
> +         && !(OPTION_SET_P (flag_unroll_loops)
> +              || OPTION_SET_P (flag_unroll_all_loops)))
> +       flag_unroll_loops = flag_unroll_all_loops = 0;
> +    }

Ugh, that's all quite ugly and unmaintainable, no?

> +
>  }
>
>  /* Clear stack slot assignments remembered from previous functions.
> @@ -2332,7 +2367,7 @@ ix86_option_override_internal (bool main_args_p,
>
>    set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
>
> -  ix86_recompute_optlev_based_flags (opts, opts_set);
> +  ix86_override_options_after_change ();
>
>    ix86_tune_cost = processor_cost_table[ix86_tune];
>    /* TODO: ix86_cost should be chosen at instruction or function granuality
> @@ -2363,9 +2398,6 @@ ix86_option_override_internal (bool main_args_p,
>        || TARGET_64BIT_P (opts->x_ix86_isa_flags))
>      opts->x_ix86_regparm = REGPARM_MAX;
>
> -  /* Default align_* from the processor table.  */
> -  ix86_default_align (opts);
> -
>    /* Provide default for -mbranch-cost= value.  */
>    SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
>                        ix86_tune_cost->branch_cost);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 480db35f6cd..75829a5d0f4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23820,6 +23820,19 @@ ix86_loop_unroll_adjust (unsigned nunroll, class loop *loop)
>    unsigned i;
>    unsigned mem_count = 0;
>
> +  /* Unroll small size loop when unroll factor is not explicitly
> +     specified.  */
> +  if (ix86_unroll_only_small_loops && !loop->unroll)
> +    {
> +      int small_unroll = 0;
> +      if (loop->ninsns <= (unsigned) ix86_small_unroll_ninsns)
> +       small_unroll = MIN ((unsigned) ix86_small_unroll_factor,
> +                           nunroll);
> +      else
> +       small_unroll = 1;
> +      return small_unroll;
> +    }
> +
>    if (!TARGET_ADJUST_UNROLL)
>       return nunroll;
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 0dbaacb57ed..a724c73c0c4 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1214,3 +1214,16 @@ Do not use GOT to access external symbols.
>  -param=x86-stlf-window-ninsns=
>  Target Joined UInteger Var(x86_stlf_window_ninsns) Init(64) Param
>  Instructions number above which STFL stall penalty can be compensated.
> +
> +munroll-only-small-loops
> +Target Var(ix86_unroll_only_small_loops) Init(0) Save
> +Enable conservative small loop unrolling.
> +
> +-param=x86-small-unroll-ninsns=
> +Target Joined UInteger Var(ix86_small_unroll_ninsns) Init(4) Param
> +Instructions number limit for loop to be unrolled under
> +-munroll-only-small-loops.
> +
> +-param=x86-small-unroll-factor=
> +Target Joined UInteger Var(ix86_small_unroll_factor) Init(2) Param
> +Unroll factor for -munroll-only-small-loops.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index cd4d3c1d72c..b6fa79eccc3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15779,6 +15779,14 @@ The following choices of @var{name} are available on i386 and x86_64 targets:
>  @item x86-stlf-window-ninsns
>  Instructions number above which STFL stall penalty can be compensated.
>
> +@item x86-small-unroll-ninsns
> +If -munroll-only-small-loops is enabled, only unroll loops with instruction
> +count less than this parameter. The default value is 4.
> +
> +@item x86-small-unroll-factor
> +If -munroll-only-small-loops is enabled, reset the unroll factor with this
> +value. The default value is 2 which means the loop will be unrolled once.
> +
>  @end table
>
>  @end table
> @@ -25186,6 +25194,12 @@ environments where no dynamic link is performed, like firmwares, OS
>  kernels, executables linked with @option{-static} or @option{-static-pie}.
>  @option{-mdirect-extern-access} is not compatible with @option{-fPIC} or
>  @option{-fpic}.
> +
> +@item -munroll-only-small-loops
> +@itemx -mno-unroll-only-small-loops
> +@opindex munroll-only-small-loops
> +Controls conservative small loop unrolling. It is enabled by default at
> +O2, and unrolls loops with less than 4 insns by 1 time.
>  @end table
>
>  @node M32C Options
> diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c b/gcc/testsuite/gcc.target/i386/pr86270.c
> index 81841ef5bd7..cbc9fbb0450 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
>
>  int *a;
>  long len;
> diff --git a/gcc/testsuite/gcc.target/i386/pr93002.c b/gcc/testsuite/gcc.target/i386/pr93002.c
> index 0248fcc00a5..f75a847f75d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr93002.c
> +++ b/gcc/testsuite/gcc.target/i386/pr93002.c
> @@ -1,6 +1,6 @@
>  /* PR target/93002 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
>  /* { dg-final { scan-assembler-not "cmp\[^\n\r]*-1" } } */
>
>  volatile int sink;
> --
> 2.18.1
>
  
Hongyu Wang Oct. 28, 2022, 8:03 a.m. UTC | #4
> Ugh, that's all quite ugly and unmaintainable, no?
Agreed, I have the same feeling.

> I'm quite sure that if this works it's not by intention.  Doesn't this also
> disable register renaming and web when the user explicitly specifies
> -funroll-loops?
>
> Doesn't this change -funroll-loops behavior everywhere, only unrolling small
> loops?

The ugly part ensures that -funroll-loops would not be affected at all
by -munroll-only-small-loops.
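
To make that interaction easier to follow, here is a rough standalone model
of the reconciliation hunk. It is a sketch, not the GCC code: Flag, Options,
o2_defaults and reconcile are hypothetical names, set_by_user stands in for
OPTION_SET_P, and the cunroll_grow_size handling is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for one command-line flag: its value plus whether
// the user set it explicitly (the role OPTION_SET_P plays in the patch).
struct Flag {
  bool value = false;
  bool set_by_user = false;
};

struct Options {
  Flag unroll_loops;            // -funroll-loops
  Flag unroll_all_loops;        // -funroll-all-loops
  Flag unroll_only_small_loops; // -munroll-only-small-loops
  Flag web;                     // -fweb
  Flag rename_registers;        // -frename-registers
};

// Defaults the posted patch installs at -O2 when optimizing for speed:
// both unrolling flags on, -fweb and -frename-registers off.
Options o2_defaults() {
  Options o;
  o.unroll_loops.value = true;
  o.unroll_only_small_loops.value = true;
  return o;
}

// Mirrors the branch structure of the posted hunk.
void reconcile(Options &o) {
  if (o.unroll_loops.set_by_user
      || (o.unroll_all_loops.set_by_user && o.unroll_all_loops.value)) {
    // Explicit -f[no-]unroll-loops: leave small-loop mode unless requested.
    if (!o.unroll_only_small_loops.set_by_user)
      o.unroll_only_small_loops.value = false;
    // Restore what -funroll-loops would normally imply.
    if (!o.web.set_by_user)
      o.web.value = o.unroll_loops.value;
    if (!o.rename_registers.set_by_user)
      o.rename_registers.value = o.unroll_loops.value;
  } else if (o.unroll_only_small_loops.set_by_user
             && !o.unroll_only_small_loops.value
             && !(o.unroll_loops.set_by_user
                  || o.unroll_all_loops.set_by_user)) {
    // Explicit -mno-unroll-only-small-loops with no -funroll-* option:
    // switch the implicitly enabled unrolling off again.
    o.unroll_loops.value = false;
    o.unroll_all_loops.value = false;
  }
}
```

In this model an explicit -funroll-loops ends up with small-loop mode off and
web/rename-registers on, while a plain -O2 keeps small-loop mode on with both
of those passes off.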

>
> I'd like to see a -munroll-only-small-loops addition that doesn't have any such
> effects.  Note RTL unrolling could also be conditionally enabled on a new
> -funroll-small-loops which wouldn't enable register renaming or web.

Did you mean something like the following?

index b9e07973dd6..b707d4afb84 100644
--- a/gcc/loop-init.cc
+++ b/gcc/loop-init.cc
@@ -567,7 +567,8 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
     {
-      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
+      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll
+             || flag_unroll_only_small_loops);
     }

Then the backend could turn it on by default at -O2?
I don't know whether there is a way to turn on a middle-end pass with
target-specific flags.
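
Put together with the ix86_loop_unroll_adjust hunk, the idea could be modelled
roughly as below. This is only a sketch under assumptions: UnrollFlags,
rtl_unroll_gate and small_loop_unroll_factor are hypothetical names, and the
defaults 4 and 2 are taken from the Init() values in the posted i386.opt hunk.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-ins for the flags named in the thread.
struct UnrollFlags {
  bool unroll_loops = false;            // -funroll-loops
  bool unroll_all_loops = false;        // -funroll-all-loops
  bool has_unroll_pragma = false;       // stands in for cfun->has_unroll
  bool unroll_only_small_loops = false; // the proposed additional flag
};

// The gate from the loop-init.cc fragment: the pass runs if any of the
// existing conditions holds, or if the proposed flag is set.
bool rtl_unroll_gate(const UnrollFlags &f) {
  return f.unroll_loops || f.unroll_all_loops || f.has_unroll_pragma
         || f.unroll_only_small_loops;
}

// Model of the small-loop branch added to ix86_loop_unroll_adjust: loops of
// at most ninsns_limit insns get min(small_factor, nunroll); larger loops
// get a factor of 1, i.e. they are not unrolled.
unsigned small_loop_unroll_factor(unsigned nunroll, unsigned ninsns,
                                  unsigned ninsns_limit = 4,
                                  unsigned small_factor = 2) {
  if (ninsns <= ninsns_limit)
    return std::min(small_factor, nunroll);
  return 1;
}
```

Note that the comparison in the posted hunk is <=, so a loop of exactly 4 insns
still qualifies, although the documentation text says "less than".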

  
Richard Biener Oct. 28, 2022, 8:41 a.m. UTC | #5
On Fri, Oct 28, 2022 at 10:08 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > Ugh, that's all quite ugly and unmaintainable, no?
> Agreed, I have the same feeling.
>
> > I'm quite sure that if this works it's not by intention.  Doesn't this also
> > disable register renaming and web when the user explicitly specifies
> > -funroll-loops?
> >
> > Doesn't this change -funroll-loops behavior everywhere, only unrolling small
> > loops?
>
> The ugly part ensures that -funroll-loops would not be affected at all
> by -munroll-only-small-loops.
>
> >
> > I'd like to see a -munroll-only-small-loops addition that doesn't have any such
> > effects.  Note RTL unrolling could also be conditionally enabled on a new
> > -funroll-small-loops which wouldn't enable register renaming or web.
>
> Did you mean something like the following?
>
> index b9e07973dd6..b707d4afb84 100644
> --- a/gcc/loop-init.cc
> +++ b/gcc/loop-init.cc
> @@ -567,7 +567,8 @@ public:
>    /* opt_pass methods: */
>    bool gate (function *) final override
>      {
> -      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
> +      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll
> +             || flag_unroll_only_small_loops);
>      }
>
> Then the backend could turn it on by default at -O2?
> I don't know whether there is a way to turn on a middle-end pass with
> target-specific flags.

There isn't; it would need to be a target hook.  Currently only i386, rs6000
and s390 implement loop_unroll_adjust.  We could enable the pass conditionally
on the target implementing that hook (and on optimize >= 2; hopefully the pass
only unrolls loops that are optimized for speed)?
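
A minimal sketch of that suggestion, assuming the gate can see whether the
target provides the hook. Target, PassState, gate_with_hook and example_adjust
are hypothetical names for illustration; how GCC would actually test for the
hook is not spelled out in this thread.

```cpp
#include <cassert>

// Hypothetical model of a target's hook table; a null pointer means the
// target does not implement loop_unroll_adjust.
using unroll_adjust_fn = unsigned (*)(unsigned nunroll, unsigned ninsns);

struct Target {
  unroll_adjust_fn loop_unroll_adjust = nullptr;
};

struct PassState {
  bool unroll_loops = false;      // -funroll-loops
  bool unroll_all_loops = false;  // -funroll-all-loops
  bool has_unroll_pragma = false; // stands in for cfun->has_unroll
  int optimize = 0;               // -O level
};

// Existing conditions first; otherwise run the pass only at -O2 and above
// on targets that implement the hook, so the hook can limit the factor.
bool gate_with_hook(const PassState &s, const Target &t) {
  if (s.unroll_loops || s.unroll_all_loops || s.has_unroll_pragma)
    return true;
  return s.optimize >= 2 && t.loop_unroll_adjust != nullptr;
}

// Example hook for a hypothetical target: unroll loops of at most 4 insns
// by a factor of at most 2, and leave larger loops alone.
unsigned example_adjust(unsigned nunroll, unsigned ninsns) {
  if (ninsns <= 4)
    return nunroll < 2 ? nunroll : 2;
  return 1;
}
```

With this shape no new user-visible -f option is needed: targets without the
hook keep today's behavior, and targets with it decide per loop.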

  

Patch

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index d6a68dc9b1d..0e580b39d14 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1686,6 +1686,12 @@  static const struct default_options ix86_option_optimization_table[] =
     /* The STC algorithm produces the smallest code at -Os, for x86.  */
     { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
       REORDER_BLOCKS_ALGORITHM_STC },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+    /* Turns off -frename-registers and -fweb which are enabled by
+       funroll-loops.  */
+    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
+    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
     /* Turn off -fschedule-insns by default.  It tends to make the
        problem with not enough registers even worse.  */
     { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index acb2291e70f..6ea347c32e1 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1819,8 +1819,43 @@  ix86_recompute_optlev_based_flags (struct gcc_options *opts,
 void
 ix86_override_options_after_change (void)
 {
+  /* Default align_* from the processor table.  */
   ix86_default_align (&global_options);
+
   ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
+
+  /* Disable unrolling small loops when there's explicit
+     -f{,no}unroll-loop.  */
+  if ((OPTION_SET_P (flag_unroll_loops))
+     || (OPTION_SET_P (flag_unroll_all_loops)
+	 && flag_unroll_all_loops))
+    {
+      if (!OPTION_SET_P (ix86_unroll_only_small_loops))
+	ix86_unroll_only_small_loops = 0;
+      /* Re-enable -frename-registers and -fweb if funroll-loops
+	 enabled.  */
+      if (!OPTION_SET_P (flag_web))
+	flag_web = flag_unroll_loops;
+      if (!OPTION_SET_P (flag_rename_registers))
+	flag_rename_registers = flag_unroll_loops;
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+	flag_cunroll_grow_size = flag_unroll_loops
+				 || flag_peel_loops
+				 || optimize >= 3;
+    }
+  else
+    {
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+	flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
+      /* Disables loop unrolling if -mno-unroll-only-small-loops is
+	 explicitly set and -funroll-loops is not enabled.  */
+      if (OPTION_SET_P (ix86_unroll_only_small_loops)
+	  && !ix86_unroll_only_small_loops
+	  && !(OPTION_SET_P (flag_unroll_loops)
+	       || OPTION_SET_P (flag_unroll_all_loops)))
+	flag_unroll_loops = flag_unroll_all_loops = 0;
+    }
+
 }
 
 /* Clear stack slot assignments remembered from previous functions.
@@ -2332,7 +2367,7 @@  ix86_option_override_internal (bool main_args_p,
 
   set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
 
-  ix86_recompute_optlev_based_flags (opts, opts_set);
+  ix86_override_options_after_change ();
 
   ix86_tune_cost = processor_cost_table[ix86_tune];
   /* TODO: ix86_cost should be chosen at instruction or function granuality
@@ -2363,9 +2398,6 @@  ix86_option_override_internal (bool main_args_p,
       || TARGET_64BIT_P (opts->x_ix86_isa_flags))
     opts->x_ix86_regparm = REGPARM_MAX;
 
-  /* Default align_* from the processor table.  */
-  ix86_default_align (opts);
-
   /* Provide default for -mbranch-cost= value.  */
   SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
 		       ix86_tune_cost->branch_cost);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 480db35f6cd..75829a5d0f4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23820,6 +23820,19 @@  ix86_loop_unroll_adjust (unsigned nunroll, class loop *loop)
   unsigned i;
   unsigned mem_count = 0;
 
+  /* Unroll small loops when the unroll factor is not explicitly
+     specified.  */
+  if (ix86_unroll_only_small_loops && !loop->unroll)
+    {
+      int small_unroll = 0;
+      if (loop->ninsns <= (unsigned) ix86_small_unroll_ninsns)
+	small_unroll = MIN ((unsigned) ix86_small_unroll_factor,
+			    nunroll);
+      else
+	small_unroll = 1;
+      return small_unroll;
+    }
+
   if (!TARGET_ADJUST_UNROLL)
      return nunroll;
 
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 0dbaacb57ed..a724c73c0c4 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1214,3 +1214,16 @@  Do not use GOT to access external symbols.
 -param=x86-stlf-window-ninsns=
 Target Joined UInteger Var(x86_stlf_window_ninsns) Init(64) Param
 Instructions number above which STFL stall penalty can be compensated.
+
+munroll-only-small-loops
+Target Var(ix86_unroll_only_small_loops) Init(0) Save
+Enable conservative small loop unrolling.
+
+-param=x86-small-unroll-ninsns=
+Target Joined UInteger Var(ix86_small_unroll_ninsns) Init(4) Param
+Instruction count limit for a loop to be unrolled under
+-munroll-only-small-loops.
+
+-param=x86-small-unroll-factor=
+Target Joined UInteger Var(ix86_small_unroll_factor) Init(2) Param
+Unroll factor for -munroll-only-small-loops.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cd4d3c1d72c..b6fa79eccc3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15779,6 +15779,14 @@  The following choices of @var{name} are available on i386 and x86_64 targets:
 @item x86-stlf-window-ninsns
 Instructions number above which STFL stall penalty can be compensated.
 
+@item x86-small-unroll-ninsns
+If @option{-munroll-only-small-loops} is enabled, only unroll loops whose
+instruction count does not exceed this parameter.  The default value is 4.
+
+@item x86-small-unroll-factor
+If @option{-munroll-only-small-loops} is enabled, limit the unroll factor to
+this value.  The default value is 2, which means the loop is unrolled once.
+
 @end table
 
 @end table
@@ -25186,6 +25194,12 @@  environments where no dynamic link is performed, like firmwares, OS
 kernels, executables linked with @option{-static} or @option{-static-pie}.
 @option{-mdirect-extern-access} is not compatible with @option{-fPIC} or
 @option{-fpic}.
+
+@item -munroll-only-small-loops
+@itemx -mno-unroll-only-small-loops
+@opindex munroll-only-small-loops
+Controls conservative small loop unrolling.  It is enabled by default at
+@option{-O2}, and unrolls loops with no more than 4 insns once.
 @end table
 
 @node M32C Options
diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c b/gcc/testsuite/gcc.target/i386/pr86270.c
index 81841ef5bd7..cbc9fbb0450 100644
--- a/gcc/testsuite/gcc.target/i386/pr86270.c
+++ b/gcc/testsuite/gcc.target/i386/pr86270.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
 
 int *a;
 long len;
diff --git a/gcc/testsuite/gcc.target/i386/pr93002.c b/gcc/testsuite/gcc.target/i386/pr93002.c
index 0248fcc00a5..f75a847f75d 100644
--- a/gcc/testsuite/gcc.target/i386/pr93002.c
+++ b/gcc/testsuite/gcc.target/i386/pr93002.c
@@ -1,6 +1,6 @@ 
 /* PR target/93002 */
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-unroll-only-small-loops" } */
 /* { dg-final { scan-assembler-not "cmp\[^\n\r]*-1" } } */
 
 volatile int sink;