x86: Properly find the maximum stack slot alignment

Message ID CAMe9rOrneWEBFFx_m5iBoHTBkAtZzbFrjdGY3V9ZqtXzufhthA@mail.gmail.com
State New
Series x86: Properly find the maximum stack slot alignment

Checks

Context                                                                   Check    Description
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap          success  Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm                                success  Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm                                success  Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64                            success  Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap      success  Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64                            success  Test passed

Commit Message

H.J. Lu Feb. 12, 2025, 5:24 a.m. UTC
  Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

PR target/109780
PR target/109093
* config/i386/i386.cc (ix86_update_stack_alignment): New.
(ix86_find_all_reg_use): Likewise.
(ix86_find_max_used_stack_alignment): Also check memory accesses
from registers defined by stack or frame registers.

gcc/testsuite/

PR target/109780
PR target/109093
* g++.target/i386/pr109780-1.C: New test.
* gcc.target/i386/pr109093-1.c: Likewise.
* gcc.target/i386/pr109780-1.c: Likewise.
* gcc.target/i386/pr109780-2.c: Likewise.
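
As a rough illustration of the second step in the commit message (checking memory accesses made through the collected registers), here is a standalone C++ sketch. It is not GCC code: `MemAccess`, `max_stack_slot_alignment`, `kNumRegs`, and the register numbers are hypothetical names invented for this model, and each memory access is reduced to a base register plus an alignment in bits. The actual patch walks RTL and reads `MEM_ALIGN`.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a small toy register file, not the real x86 hard register set.
constexpr std::size_t kNumRegs = 16;

// Hypothetical model of one memory access: the register used as the base
// address and the known alignment of the access, in bits.
struct MemAccess
{
  unsigned base_reg;
  unsigned align_bits;
};

// Starting from INITIAL_BITS, return the largest alignment among accesses
// whose base register is marked in STACK_REGS.  Accesses through any other
// register are ignored.
unsigned
max_stack_slot_alignment (const std::vector<MemAccess> &accesses,
                          const std::bitset<kNumRegs> &stack_regs,
                          unsigned initial_bits)
{
  unsigned result = initial_bits;
  for (const MemAccess &m : accesses)
    {
      assert (m.base_reg < kNumRegs);
      if (stack_regs.test (m.base_reg))
        result = std::max (result, m.align_bits);
    }
  return result;
}
```

In this model, a 256-bit access through a register that was derived from the stack pointer raises the result to 256 even though the access never names the stack pointer itself, which is the situation the patch is meant to catch.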
  

Comments

Uros Bizjak Feb. 12, 2025, 7:15 a.m. UTC | #1
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.

I wonder if this approach will also handle cases like e.g.:

    lea    64(%rsp), %rbx
    ...
    movaps    16(%rbx, %rcx), %xmm0

and:

    movq    %rsp, %rax
    ...
    lea    64(%rax), %rbx
    ...
    movaps    16(%rbx), %xmm0

?

Thanks,
uros.


>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
>
> --
> H.J.
  
H.J. Lu Feb. 12, 2025, 7:22 a.m. UTC | #2
On Wed, Feb 12, 2025 at 3:16 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
>
> I wonder if this approach will also handle cases like e.g.:
>
>     lea    64(%rsp), %rbx
>     ...
>     movaps    16(%rbx, %rcx), %xmm0
>
> and:
>
>     movq    %rsp, %rax
>     ...
>     lea    64(%rax), %rbx
>     ...
>     movaps    16(%rbx), %xmm0
>
> ?

They should be handled by the worklist loop around ix86_find_all_reg_use:

  do
    {
      reg = bitmap_clear_first_set_bit (worklist);
      ix86_find_all_reg_use (stack_slot_access, reg, worklist);
    }
  while (!bitmap_empty_p (worklist));
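
To see why a loop of this shape reaches the two-step case (`%rsp` copied to `%rax`, then `%rbx` computed from `%rax`), here is a minimal standalone C++ sketch of the same worklist idea. It is a toy model rather than GCC code: `RegDef`, `derived_from`, and the `Reg` enumerators are hypothetical, and each instruction is reduced to "destination register computed from source register".

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a toy register file with made-up numbering.
constexpr std::size_t kNumRegs = 16;
enum Reg : unsigned { RAX = 0, RBX = 1, RCX = 2, RSP = 3 };

// Hypothetical record: register DEST is computed from register SRC,
// e.g. "lea 64(%rsp), %rbx" becomes {RSP, RBX}.
struct RegDef
{
  unsigned src;
  unsigned dest;
};

// Return ROOTS plus every register derived from them, directly or
// through a chain of intermediate registers.
std::bitset<kNumRegs>
derived_from (const std::vector<RegDef> &defs,
              const std::vector<unsigned> &roots)
{
  std::bitset<kNumRegs> reached;
  std::vector<unsigned> worklist;

  for (unsigned r : roots)
    {
      assert (r < kNumRegs);
      if (!reached.test (r))
        {
          reached.set (r);
          worklist.push_back (r);
        }
    }

  // Each register enters the worklist at most once, so the loop terminates.
  while (!worklist.empty ())
    {
      unsigned reg = worklist.back ();
      worklist.pop_back ();

      for (const RegDef &d : defs)
        {
          if (d.src != reg)
            continue;
          assert (d.dest < kNumRegs);
          if (reached.test (d.dest))
            continue;
          reached.set (d.dest);
          worklist.push_back (d.dest);
        }
    }

  return reached;
}
```

With the definitions `{RSP, RAX}` and `{RAX, RBX}` and the single root `RSP`, the result contains `RAX` and `RBX` but not `RCX`, so a later access through `%rbx` would be examined in this model.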


> Thanks,
> uros.
>
>
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
> >
> > --
> > H.J.



--
H.J.
  
Sam James Feb. 12, 2025, 8:03 a.m. UTC | #3
"H.J. Lu" <hjl.tools@gmail.com> writes:

> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.

Please add the runtime testcase at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780#c29 too.

Also, for pr109093-1.c, please initialise 'f' to 1 to avoid UB (division
by zero).
  
Uros Bizjak Feb. 12, 2025, 9:28 a.m. UTC | #4
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.

> +/* Find all registers defined with REG.  */
> +
> +static void
> +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> +                       unsigned int reg, auto_bitmap &worklist)
> +{
> +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> +       ref != NULL;
> +       ref = DF_REF_NEXT_REG (ref))
> +    {
> +      if (DF_REF_IS_ARTIFICIAL (ref))
> +        continue;
> +
> +      rtx_insn *insn = DF_REF_INSN (ref);
> +      if (!NONDEBUG_INSN_P (insn))
> +        continue;
> +
> +      rtx set = single_set (insn);
> +      if (!set)
> +        continue;
> +

Isn't the above condition a bit too limiting? We can have insn with
multiple sets in the chain.

The issue at hand is the correctness issue (the program will segfault
if registers are not tracked correctly), not some missing
optimization. I'd suggest to stay on the safe side and also process
PARALLELs. Something similar to e.g. store_data_bypass_p from
recog.cc:

--cut here--
  rtx set = single_set (insn);
  if (set)
    ix86_find_all_reg_use_1(...);

  rtx pat = PATTERN (insn);
  if (GET_CODE (pat) != PARALLEL)
    return false;

  for (int i = 0; i < XVECLEN (pat, 0); i++)
    {
      rtx exp = XVECEXP (pat, 0, i);

      if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
        continue;

      gcc_assert (GET_CODE (exp) == SET);

      ix86_find_all_reg_use_1(...);
    }
--cut here--

The above will make ix86_find_all_reg_use significantly more robust.

Uros.
  
H.J. Lu Feb. 12, 2025, 10:05 a.m. UTC | #5
On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
>
> > +/* Find all registers defined with REG.  */
> > +
> > +static void
> > +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> > +                       unsigned int reg, auto_bitmap &worklist)
> > +{
> > +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> > +       ref != NULL;
> > +       ref = DF_REF_NEXT_REG (ref))
> > +    {
> > +      if (DF_REF_IS_ARTIFICIAL (ref))
> > +        continue;
> > +
> > +      rtx_insn *insn = DF_REF_INSN (ref);
> > +      if (!NONDEBUG_INSN_P (insn))
> > +        continue;
> > +
> > +      rtx set = single_set (insn);
> > +      if (!set)
> > +        continue;
> > +
>
> Isn't the above condition a bit too limiting? We can have insn with
> multiple sets in the chain.
>
> The issue at hand is the correctness issue (the program will segfault
> if registers are not tracked correctly), not some missing
> optimization. I'd suggest to stay on the safe side and also process
> PARALLELs. Something similar to e.g. store_data_bypass_p from
> recog.cc:
>
> --cut here--
>   rtx set = single_set (insn);
>   if (set)
>     ix86_find_all_reg_use_1(...);
>
>   rtx pat = PATTERN (insn);
>   if (GET_CODE (pat) != PARALLEL)
>     return false;
>
>   for (int i = 0; i < XVECLEN (pat, 0); i++)
>     {
>       rtx exp = XVECEXP (pat, 0, i);
>
>       if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
>         continue;
>
>       gcc_assert (GET_CODE (exp) == SET);
>
>       ix86_find_all_reg_use_1(...);
>     }
> --cut here--
>
> The above will make ix86_find_all_reg_use significantly more robust.
>
> Uros.

Like this?

/* Helper function for ix86_find_all_reg_use.  */

static void
ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
                         auto_bitmap &worklist)
{
  rtx src = SET_SRC (set);
  if (MEM_P (src))
    return;

  rtx dest = SET_DEST (set);
  if (!REG_P (dest))
    return;

  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
    return;

  /* Add this register to stack_slot_access.  */
  add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
  bitmap_set_bit (worklist, REGNO (dest));
}

/* Find all registers defined with REG.  */

static void
ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
                       unsigned int reg, auto_bitmap &worklist)
{
  for (df_ref ref = DF_REG_USE_CHAIN (reg);
       ref != NULL;
       ref = DF_REF_NEXT_REG (ref))
    {
      if (DF_REF_IS_ARTIFICIAL (ref))
        continue;

      rtx_insn *insn = DF_REF_INSN (ref);
      if (!NONDEBUG_INSN_P (insn))
        continue;

      rtx set = single_set (insn);
      if (set)
        ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);

      rtx pat = PATTERN (insn);
      if (GET_CODE (pat) != PARALLEL)
        continue;

      for (int i = 0; i < XVECLEN (pat, 0); i++)
        {
          rtx exp = XVECEXP (pat, 0, i);

          if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
            continue;

          gcc_assert (GET_CODE (exp) == SET);

          ix86_find_all_reg_use_1 (exp, stack_slot_access, worklist);
        }
    }
}
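
The difference between looking at a single SET and walking every element of a PARALLEL can be sketched outside GCC as follows. This is a hypothetical standalone C++ model, not the code above: `Elem`, `Pattern`, `track_pattern`, and `kNumRegs` are invented for illustration, and the model does not reproduce the exact behaviour of `single_set`. A pattern with one element stands for a plain SET, and a pattern with several elements stands for a PARALLEL.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a toy register file with made-up numbering.
constexpr std::size_t kNumRegs = 16;

enum class Kind { Set, Clobber, Use };

// Hypothetical stand-in for one element of an insn pattern.
struct Elem
{
  Kind kind;
  bool src_is_mem;   // only meaningful for Kind::Set
  bool dest_is_reg;  // only meaningful for Kind::Set
  unsigned dest;     // destination register number when dest_is_reg
};

// One element models a plain SET; several elements model a PARALLEL.
using Pattern = std::vector<Elem>;

// Mark every register that is set from a non-memory source and is not
// already tracked.  Return the newly marked registers so a caller could
// push them onto a worklist.
std::vector<unsigned>
track_pattern (const Pattern &pat, std::bitset<kNumRegs> &tracked)
{
  std::vector<unsigned> added;
  for (const Elem &e : pat)
    {
      // CLOBBER and USE elements define nothing we need to follow.
      if (e.kind != Kind::Set)
        continue;
      // Values loaded from memory, and stores to non-registers, are skipped.
      if (e.src_is_mem || !e.dest_is_reg)
        continue;
      assert (e.dest < kNumRegs);
      if (tracked.test (e.dest))
        continue;
      tracked.set (e.dest);
      added.push_back (e.dest);
    }
  return added;
}
```

For a pattern holding two register SETs, one CLOBBER, and one SET whose source is memory, the model marks both register destinations and skips the other two elements. A check that gave up whenever the insn was not a single SET would mark neither.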
  
H.J. Lu Feb. 12, 2025, 10:06 a.m. UTC | #6
On Wed, Feb 12, 2025 at 4:03 PM Sam James <sam@gentoo.org> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
>
> Please add the runtime testcase at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780#c29 too.
>
> Also, for pr109093-1.c, please initialise 'f' to 1 to avoid UB (division
> by zero).

Will do.

Thanks.
  
Uros Bizjak Feb. 12, 2025, 10:23 a.m. UTC | #7
On Wed, Feb 12, 2025 at 11:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > Don't assume that stack slots can only be accessed by stack or frame
> > > registers.  We first find all registers defined by stack or frame
> > > registers.  Then check memory accesses by such registers, including
> > > stack and frame registers.
> > >
> > > gcc/
> > >
> > > PR target/109780
> > > PR target/109093
> > > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > > (ix86_find_all_reg_use): Likewise.
> > > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > > from registers defined by stack or frame registers.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/109780
> > > PR target/109093
> > > * g++.target/i386/pr109780-1.C: New test.
> > > * gcc.target/i386/pr109093-1.c: Likewise.
> > > * gcc.target/i386/pr109780-1.c: Likewise.
> > > * gcc.target/i386/pr109780-2.c: Likewise.
> >
> > > +/* Find all registers defined with REG.  */
> > > +
> > > +static void
> > > +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> > > +                       unsigned int reg, auto_bitmap &worklist)
> > > +{
> > > +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> > > +       ref != NULL;
> > > +       ref = DF_REF_NEXT_REG (ref))
> > > +    {
> > > +      if (DF_REF_IS_ARTIFICIAL (ref))
> > > +        continue;
> > > +
> > > +      rtx_insn *insn = DF_REF_INSN (ref);
> > > +      if (!NONDEBUG_INSN_P (insn))
> > > +        continue;
> > > +
> > > +      rtx set = single_set (insn);
> > > +      if (!set)
> > > +        continue;
> > > +
> >
> > Isn't the above condition a bit too limiting? We can have insn with
> > multiple sets in the chain.
> >
> > The issue at hand is the correctness issue (the program will segfault
> > if registers are not tracked correctly), not some missing
> > optimization. I'd suggest to stay on the safe side and also process
> > PARALLELs. Something similar to e.g. store_data_bypass_p from
> > recog.cc:
> >
> > --cut here--
> >   rtx set = single_set (insn);
> >   if (set)
> >     ix86_find_all_reg_use_1(...);
> >
> >   rtx pat = PATTERN (insn);
> >   if (GET_CODE (pat) != PARALLEL)
> >     return false;
> >
> >   for (int i = 0; i < XVECLEN (pat, 0); i++)
> >     {
> >       rtx exp = XVECEXP (pat, 0, i);
> >
> >       if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
> >         continue;
> >
> >       gcc_assert (GET_CODE (exp) == SET);
> >
> >       ix86_find_all_reg_use_1(...);
> >     }
> > --cut here--
> >
> > The above will make ix86_find_all_reg_use significantly more robust.
> >
> > Uros.
>
> Like this?

Yes.

Thanks,
Uros.

>
> /* Helper function for ix86_find_all_reg_use.  */
>
> static void
> ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
>                          auto_bitmap &worklist)
> {
>   rtx src = SET_SRC (set);
>   if (MEM_P (src))
>     return;
>
>   rtx dest = SET_DEST (set);
>   if (!REG_P (dest))
>     return;
>
>   if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
>     return;
>
>   /* Add this register to stack_slot_access.  */
>   add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
>   bitmap_set_bit (worklist, REGNO (dest));
> }
>
> /* Find all registers defined with REG.  */
>
> static void
> ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
>                        unsigned int reg, auto_bitmap &worklist)
> {
>   for (df_ref ref = DF_REG_USE_CHAIN (reg);
>        ref != NULL;
>        ref = DF_REF_NEXT_REG (ref))
>     {
>       if (DF_REF_IS_ARTIFICIAL (ref))
>         continue;
>
>       rtx_insn *insn = DF_REF_INSN (ref);
>       if (!NONDEBUG_INSN_P (insn))
>         continue;
>
>       rtx set = single_set (insn);
>       if (set)
>         ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
>
>       rtx pat = PATTERN (insn);
>       if (GET_CODE (pat) != PARALLEL)
>         continue;
>
>       for (int i = 0; i < XVECLEN (pat, 0); i++)
>         {
>           rtx exp = XVECEXP (pat, 0, i);
>
>           if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
>             continue;
>
>           gcc_assert (GET_CODE (exp) == SET);
>
>           ix86_find_all_reg_use_1 (exp, stack_slot_access, worklist);
>         }
>     }
> }
>
>
> --
> H.J.
  

Patch

From 13da9e9be612333b7df7f66cf4b4c1396a64d89d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH] x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
---
 gcc/config/i386/i386.cc                    | 128 +++++++++++++++++----
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 ++++++++++++
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  38 ++++++
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 +++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 ++++
 5 files changed, 252 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79..495b97116a4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8466,6 +8466,65 @@  output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+     alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+    if (MEM_P (*iter))
+      {
+	unsigned int alignment = MEM_ALIGN (*iter);
+	unsigned int *stack_alignment
+	  = (unsigned int *) data;
+	if (alignment > *stack_alignment)
+	  *stack_alignment = alignment;
+	break;
+      }
+}
+
+/* Find all registers defined with REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+		       unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+       ref != NULL;
+       ref = DF_REF_NEXT_REG (ref))
+    {
+      if (DF_REF_IS_ARTIFICIAL (ref))
+	continue;
+
+      rtx_insn *insn = DF_REF_INSN (ref);
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      if (!set)
+	continue;
+
+      rtx src = SET_SRC (set);
+      if (MEM_P (src))
+	continue;
+
+      rtx dest = SET_DEST (set);
+      if (!REG_P (dest))
+	continue;
+
+      if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
+	continue;
+
+      /* Add this register to stack_slot_access.  */
+      add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
+      bitmap_set_bit (worklist, REGNO (dest));
+    }
+}
+
 /* Set stack_frame_required to false if stack frame isn't required.
    Update STACK_ALIGNMENT to the largest alignment, in bits, of stack
    slot used if stack frame is required and CHECK_STACK_SLOT is true.  */
@@ -8484,10 +8543,6 @@  ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
   add_to_hard_reg_set (&set_up_by_prologue, Pmode,
 		       HARD_FRAME_POINTER_REGNUM);
 
-  /* The preferred stack alignment is the minimum stack alignment.  */
-  if (stack_alignment > crtl->preferred_stack_boundary)
-    stack_alignment = crtl->preferred_stack_boundary;
-
   bool require_stack_frame = false;
 
   FOR_EACH_BB_FN (bb, cfun)
@@ -8499,27 +8554,58 @@  ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
 				       set_up_by_prologue))
 	  {
 	    require_stack_frame = true;
-
-	    if (check_stack_slot)
-	      {
-		/* Find the maximum stack alignment.  */
-		subrtx_iterator::array_type array;
-		FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
-		  if (MEM_P (*iter)
-		      && (reg_mentioned_p (stack_pointer_rtx,
-					   *iter)
-			  || reg_mentioned_p (frame_pointer_rtx,
-					      *iter)))
-		    {
-		      unsigned int alignment = MEM_ALIGN (*iter);
-		      if (alignment > stack_alignment)
-			stack_alignment = alignment;
-		    }
-	      }
+	    break;
 	  }
     }
 
   cfun->machine->stack_frame_required = require_stack_frame;
+
+  /* Stop if we don't need to check stack slot.  */
+  if (!check_stack_slot)
+    return;
+
+  /* The preferred stack alignment is the minimum stack alignment.  */
+  if (stack_alignment > crtl->preferred_stack_boundary)
+    stack_alignment = crtl->preferred_stack_boundary;
+
+  HARD_REG_SET stack_slot_access;
+  CLEAR_HARD_REG_SET (stack_slot_access);
+
+  /* Stack slot can be accessed by stack pointer, frame pointer or
+     registers defined by stack pointer or frame pointer.  */
+  auto_bitmap worklist;
+  add_to_hard_reg_set (&stack_slot_access, Pmode,
+		       STACK_POINTER_REGNUM);
+  bitmap_set_bit (worklist, STACK_POINTER_REGNUM);
+  if (frame_pointer_needed)
+    {
+      add_to_hard_reg_set (&stack_slot_access, Pmode,
+			   HARD_FRAME_POINTER_REGNUM);
+      bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM);
+    }
+  unsigned int reg;
+  do
+    {
+      reg = bitmap_clear_first_set_bit (worklist);
+      ix86_find_all_reg_use (stack_slot_access, reg, worklist);
+    }
+  while (!bitmap_empty_p (worklist));
+
+  hard_reg_set_iterator hrsi;
+  EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi)
+    for (df_ref ref = DF_REG_USE_CHAIN (reg);
+	 ref != NULL;
+	 ref = DF_REF_NEXT_REG (ref))
+      {
+	if (DF_REF_IS_ARTIFICIAL (ref))
+	  continue;
+
+	rtx_insn *insn = DF_REF_INSN (ref);
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+	note_stores (insn, ix86_update_stack_alignment,
+		     &stack_alignment);
+      }
 }
 
 /* Finalize stack_realign_needed and frame_pointer_needed flags, which
diff --git a/gcc/testsuite/g++.target/i386/pr109780-1.C b/gcc/testsuite/g++.target/i386/pr109780-1.C
new file mode 100644
index 00000000000..7e3eabdec94
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr109780-1.C
@@ -0,0 +1,72 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target c++17 } */
+/* { dg-options "-O2 -mavx2 -mtune=haswell" } */
+
+template <typename _Tp> struct remove_reference {
+  using type = __remove_reference(_Tp);
+};
+template <typename T> struct MaybeStorageBase {
+  T val;
+  struct Union {
+    ~Union();
+  } mStorage;
+};
+template <typename T> struct MaybeStorage : MaybeStorageBase<T> {
+  char mIsSome;
+};
+template <typename T, typename U = typename remove_reference<T>::type>
+constexpr MaybeStorage<U> Some(T &&);
+template <typename T, typename U> constexpr MaybeStorage<U> Some(T &&aValue) {
+  return {aValue};
+}
+template <class> struct Span {
+  int operator[](long idx) {
+    int *__trans_tmp_4;
+    if (__builtin_expect(idx, 0))
+      *(int *)__null = false;
+    __trans_tmp_4 = storage_.data();
+    return __trans_tmp_4[idx];
+  }
+  struct {
+    int *data() { return data_; }
+    int *data_;
+  } storage_;
+};
+struct Variant {
+  template <typename RefT> Variant(RefT) {}
+};
+long from_i, from___trans_tmp_9;
+namespace js::intl {
+struct DecimalNumber {
+  Variant string_;
+  unsigned long significandStart_;
+  unsigned long significandEnd_;
+  bool zero_ = false;
+  bool negative_;
+  template <typename CharT> DecimalNumber(CharT string) : string_(string) {}
+  template <typename CharT>
+  static MaybeStorage<DecimalNumber> from(Span<const CharT>);
+  void from();
+};
+} // namespace js::intl
+void js::intl::DecimalNumber::from() {
+  Span<const char16_t> __trans_tmp_3;
+  from(__trans_tmp_3);
+}
+template <typename CharT>
+MaybeStorage<js::intl::DecimalNumber>
+js::intl::DecimalNumber::from(Span<const CharT> chars) {
+  DecimalNumber number(chars);
+  if (auto ch = chars[from_i]) {
+    from_i++;
+    number.negative_ = ch == '-';
+  }
+  while (from___trans_tmp_9 && chars[from_i])
+    ;
+  if (chars[from_i])
+    while (chars[from_i - 1])
+      number.zero_ = true;
+  return Some(number);
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr109093-1.c b/gcc/testsuite/gcc.target/i386/pr109093-1.c
new file mode 100644
index 00000000000..87804a15e8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109093-1.c
@@ -0,0 +1,38 @@ 
+/* { dg-do run }  */
+/* { dg-options "-O2 -mavx2 -mtune=znver1 -ftrivial-auto-var-init=zero -fno-stack-protector" } */
+
+#include <cpuid.h>
+
+int a, b, c, d;
+char e, f;
+short g, h, i;
+
+void
+run (void)
+{
+  short j;
+
+  for (; g >= 0; --g)
+    {
+      int *k[10];
+
+      for (d = 0; d < 10; d++)
+	k[d] = &b;
+
+      c = *k[1];
+
+      for (; a;)
+	j = i - c / f || (e ^= h);
+    }
+}
+
+int
+main (void)
+{
+  __builtin_cpu_init ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    run ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr109780-1.c b/gcc/testsuite/gcc.target/i386/pr109780-1.c
new file mode 100644
index 00000000000..6b06947f2a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109780-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+char perm[64];
+
+void
+__attribute__((noipa))
+foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+    perm[i] = i;
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr109780-2.c b/gcc/testsuite/gcc.target/i386/pr109780-2.c
new file mode 100644
index 00000000000..152da06c6ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109780-2.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+#define N 9
+
+void
+f (double x, double y, double *res)
+{
+  y = -y;
+  for (int i = 0; i < N; ++i)
+    {
+      double tmp = y;
+      y = x;
+      x = tmp;
+      res[i] = i;
+    }
+  res[N] = y * y;
+  res[N + 1] = x;
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */
-- 
2.48.1