middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]

Message ID patch-15109-tamar@arm.com
State Committed
Headers
Series middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404] |

Commit Message

Tamar Christina Nov. 29, 2021, 7:40 a.m. UTC
  Hi All,

CSE uses equivalence classes to keep track of expressions that all have the same
values at the current point in the program.

Normal equivalences through SETs only insert and perform lookups in this set but
equivalence determined from comparisons, e.g.

(insn 46 44 47 7 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 105 [ iD.2893 ])
            (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
        (nil)))

creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).

This causes a merge to happen between the two equivalence sets denoted by
(const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.

The operation happens through merge_equiv_classes however this function has an
invariant that the classes to be merge not contain any duplicates.  This is
because it frees entries before merging.

The given testcase when using the supplied flags trigger an ICE due to the
equivalence set being

(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void

(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void

This happens because the original INSN being recorded is

(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
        (const_vector:V1SI [
                (const_int 0 [0])
            ])) "cse.c":11:9 1363 {*movv1si_internal}
     (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
        (nil)))

and we end up generating two equivalences. the first one is simply that
reg:SI 97 is 0.  The second one is that 0 can be extracted from the V1SI, so
subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0.  This nested subreg gets folded away
to just reg:SI 97 and we re-insert the same equivalence.

This patch changes it so that once we figure out the bucket to insert into we
check if the equivalence set already contains the entry and if so just return
the existing entry and exit.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.


Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR rtl-optimization/103404
	* cse.c (insert_with_costs): Check if item exists already before adding
	a new entry in the equivalence class.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/103404
	* gcc.target/i386/pr103404.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/cse.c b/gcc/cse.c
index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec 100644


--
  

Comments

Richard Biener Nov. 29, 2021, 9:01 a.m. UTC | #1
On Mon, 29 Nov 2021, Tamar Christina wrote:

> Hi All,
> 
> CSE uses equivalence classes to keep track of expressions that all have the same
> values at the current point in the program.
> 
> Normal equivalences through SETs only insert and perform lookups in this set but
> equivalence determined from comparisons, e.g.
> 
> (insn 46 44 47 7 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg:SI 105 [ iD.2893 ])
>             (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
>      (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
>         (nil)))
> 
> creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).
> 
> This causes a merge to happen between the two equivalence sets denoted by
> (const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.
> 
> The operation happens through merge_equiv_classes however this function has an
> invariant that the classes to be merge not contain any duplicates.  This is
> because it frees entries before merging.
> 
> The given testcase when using the supplied flags trigger an ICE due to the
> equivalence set being
> 
> (rr) p dump_class (class1)
> Equivalence chain for (reg:SI 105 [ iD.2893 ]):
> (reg:SI 105 [ iD.2893 ])
> $3 = void
> 
> (rr) p dump_class (class2)
> Equivalence chain for (const_int 0 [0]):
> (const_int 0 [0])
> (reg:SI 97 [ _10 ])
> (reg:SI 97 [ _10 ])
> $4 = void
> 
> This happens because the original INSN being recorded is
> 
> (insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
>         (const_vector:V1SI [
>                 (const_int 0 [0])
>             ])) "cse.c":11:9 1363 {*movv1si_internal}
>      (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
>         (nil)))
> 
> and we end up generating two equivalences. the first one is simply that
> reg:SI 97 is 0.  The second one is that 0 can be extracted from the V1SI, so
> subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0.  This nested subreg gets folded away
> to just reg:SI 97 and we re-insert the same equivalence.
> 
> This patch changes it so that once we figure out the bucket to insert into we
> check if the equivalence set already contains the entry and if so just return
> the existing entry and exit.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR rtl-optimization/103404
> 	* cse.c (insert_with_costs): Check if item exists already before adding
> 	a new entry in the equivalence class.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR rtl-optimization/103404
> 	* gcc.target/i386/pr103404.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/cse.c b/gcc/cse.c
> index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -1537,6 +1537,17 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
>    if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
>      add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO (x));
>  
> +  /* We cannot allow a duplicate to be entered into the equivalence sets
> +     and so we should perform a check before we do any allocations or
> +     change the buckets.  */
> +  if (classp)
> +    {
> +      struct table_elt *p;
> +      for (p = classp; p; p = p->next_same_value)
> +	if (exp_equiv_p (p->exp, x, 1, false))
> +	  return p;

not really a review, leaving that to who approved the original change,
but these things always look bad - this linear list walk makes
insert_with_costs quadratic.  Is there any mitigation (like limiting
the number of entries?), is that already an existing problem elsewhere
in CSE?

> +    }
> +
>    /* Put an element for X into the right hash bucket.  */
>  
>    elt = free_element_chain;
> diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd0f061a19e56ea5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103404.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
> +
> +typedef unsigned __attribute__((__vector_size__ (4))) U;
> +typedef unsigned __attribute__((__vector_size__ (16))) V;
> +typedef unsigned __attribute__((__vector_size__ (64))) W;
> +
> +int x, y;
> +
> +V v;
> +W w;
> +
> +inline
> +int bar (U a)
> +{
> +  a |= x;
> +  W k =
> +    __builtin_shufflevector (v, 5 / a,
> +			     2, 4, 0, 2, 4, 1, 0, 1,
> +			     1, 2, 1, 3, 0, 4, 4, 0);
> +  w = k;
> +  y = 0;
> +}
> +
> +int
> +foo ()
> +{
> +  bar ((U){0xffffffff});
> +  for (unsigned i; i < sizeof (foo);)
> +    ;
> +}
> +
> 
> 
>
  
Tamar Christina Nov. 29, 2021, 9:16 a.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, November 29, 2021 9:02 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@tachyum.com;
> Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH]middle-end cse: Make sure duplicate elements are not
> entered into the equivalence set [PR103404]
> 
> On Mon, 29 Nov 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > CSE uses equivalence classes to keep track of expressions that all
> > have the same values at the current point in the program.
> >
> > Normal equivalences through SETs only insert and perform lookups in
> > this set but equivalence determined from comparisons, e.g.
> >
> > (insn 46 44 47 7 (set (reg:CCZ 17 flags)
> >         (compare:CCZ (reg:SI 105 [ iD.2893 ])
> >             (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
> >      (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
> >         (nil)))
> >
> > creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).
> >
> > This causes a merge to happen between the two equivalence sets denoted
> > by (const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.
> >
> > The operation happens through merge_equiv_classes however this
> > function has an invariant that the classes to be merge not contain any
> > duplicates.  This is because it frees entries before merging.
> >
> > The given testcase when using the supplied flags trigger an ICE due to
> > the equivalence set being
> >
> > (rr) p dump_class (class1)
> > Equivalence chain for (reg:SI 105 [ iD.2893 ]):
> > (reg:SI 105 [ iD.2893 ])
> > $3 = void
> >
> > (rr) p dump_class (class2)
> > Equivalence chain for (const_int 0 [0]):
> > (const_int 0 [0])
> > (reg:SI 97 [ _10 ])
> > (reg:SI 97 [ _10 ])
> > $4 = void
> >
> > This happens because the original INSN being recorded is
> >
> > (insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
> >         (const_vector:V1SI [
> >                 (const_int 0 [0])
> >             ])) "cse.c":11:9 1363 {*movv1si_internal}
> >      (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
> >         (nil)))
> >
> > and we end up generating two equivalences. the first one is simply
> > that reg:SI 97 is 0.  The second one is that 0 can be extracted from
> > the V1SI, so subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0.  This nested
> > subreg gets folded away to just reg:SI 97 and we re-insert the same
> equivalence.
> >
> > This patch changes it so that once we figure out the bucket to insert
> > into we check if the equivalence set already contains the entry and if
> > so just return the existing entry and exit.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no regressions.
> >
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR rtl-optimization/103404
> > 	* cse.c (insert_with_costs): Check if item exists already before
> adding
> > 	a new entry in the equivalence class.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR rtl-optimization/103404
> > 	* gcc.target/i386/pr103404.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/cse.c b/gcc/cse.c
> > index
> >
> c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c
> 67
> > 76e4cabca4ec 100644
> > --- a/gcc/cse.c
> > +++ b/gcc/cse.c
> > @@ -1537,6 +1537,17 @@ insert_with_costs (rtx x, struct table_elt *classp,
> unsigned int hash,
> >    if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
> >      add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO
> > (x));
> >
> > +  /* We cannot allow a duplicate to be entered into the equivalence sets
> > +     and so we should perform a check before we do any allocations or
> > +     change the buckets.  */
> > +  if (classp)
> > +    {
> > +      struct table_elt *p;
> > +      for (p = classp; p; p = p->next_same_value)
> > +	if (exp_equiv_p (p->exp, x, 1, false))
> > +	  return p;
> 
> not really a review, leaving that to who approved the original change, but
> these things always look bad - this linear list walk makes insert_with_costs
> quadratic.  Is there any mitigation (like limiting the number of entries?), is
> that already an existing problem elsewhere in CSE?

Hmm I believe insert_with_costs is already quadratic as it walks the list after allocations
in a similar manner. 

The problem is that the function assumes it can't ever fail, so it starts off by allocating
memory and modifying the table itself.  There are also two ways it can allocate memory,
so by the time it starts walking the classp itself and we notice any duplicates we did quite
a lot of work already and can't easily undo all of it. Ideally the function would identify
the insertion point before it does any changes.  But there are multiple ways to determine
where to insert.

But.. I think I can use the result of the first walk to seed the second walk which becomes O(1).
That way there's no additional work being done.  I'll update the patch.

Regards,
Tamar
> 
> > +    }
> > +
> >    /* Put an element for X into the right hash bucket.  */
> >
> >    elt = free_element_chain;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c
> > b/gcc/testsuite/gcc.target/i386/pr103404.c
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd
> 0f
> > 061a19e56ea5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103404.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce
> > +-fno-early-inlining -fgcse -fharden-conditional-branches
> > +-frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
> > +
> > +typedef unsigned __attribute__((__vector_size__ (4))) U; typedef
> > +unsigned __attribute__((__vector_size__ (16))) V; typedef unsigned
> > +__attribute__((__vector_size__ (64))) W;
> > +
> > +int x, y;
> > +
> > +V v;
> > +W w;
> > +
> > +inline
> > +int bar (U a)
> > +{
> > +  a |= x;
> > +  W k =
> > +    __builtin_shufflevector (v, 5 / a,
> > +			     2, 4, 0, 2, 4, 1, 0, 1,
> > +			     1, 2, 1, 3, 0, 4, 4, 0);
> > +  w = k;
> > +  y = 0;
> > +}
> > +
> > +int
> > +foo ()
> > +{
> > +  bar ((U){0xffffffff});
> > +  for (unsigned i; i < sizeof (foo);)
> > +    ;
> > +}
> > +
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
  
Tamar Christina Dec. 2, 2021, 2:12 p.m. UTC | #3
Hi All,

He's a respin of the patch which doesn't change the complexity of insert.

Richard S, since you approved the original patch could you take a look at this fix.

---

Hi All,

CSE uses equivalence classes to keep track of expressions that all have the same
values at the current point in the program.

Normal equivalences through SETs only insert and perform lookups in this set but
equivalence determined from comparisons, e.g.

(insn 46 44 47 7 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 105 [ iD.2893 ])
            (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
        (nil)))

creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).

This causes a merge to happen between the two equivalence sets denoted by
(const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.

The operation happens through merge_equiv_classes however this function has an
invariant that the classes to be merge not contain any duplicates.  This is
because it frees entries before merging.

The given testcase when using the supplied flags trigger an ICE due to the
equivalence set being

(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void

(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void

This happens because the original INSN being recorded is

(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
        (const_vector:V1SI [
                (const_int 0 [0])
            ])) "cse.c":11:9 1363 {*movv1si_internal}
     (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
        (nil)))

and we end up generating two equivalences. the first one is simply that
reg:SI 97 is 0.  The second one is that 0 can be extracted from the V1SI, so
subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0.  This nested subreg gets folded away
to just reg:SI 97 and we re-insert the same equivalence.

This patch changes it so that once we figure out the bucket to insert into we
check if the equivalence set already contains the entry and if so just return
the existing entry and exit.

While doing so it also calculates the new insertion point such that this code
does not increase the worse case complexity of insert_with_costs.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.


Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR rtl-optimization/103404
	* cse.c (insert_with_costs): Check if item exists already before adding
	a new entry in the equivalence class.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/103404
	* gcc.target/i386/pr103404.c: New test.

--- inline copy of patch ---

diff --git a/gcc/cse.c b/gcc/cse.c
index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..be6be52376d3fb328ca6e5cced8a0456439a9a04 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -1528,6 +1528,7 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
 		   machine_mode mode, int cost, int reg_cost)
 {
   struct table_elt *elt;
+  struct table_elt *ins_loc = NULL, *next = NULL;
 
   /* If X is a register and we haven't made a quantity for it,
      something is wrong.  */
@@ -1537,6 +1538,51 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
   if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
     add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO (x));
 
+  /* We cannot allow a duplicate to be entered into the equivalence sets
+     and so we should perform a check before we do any allocations or
+     change the buckets and simultaneously determine final insertion
+     point.  */
+  if (classp)
+    {
+      struct table_elt *p, *tmp;
+      ins_loc = classp->first_same_value;
+
+      /* Check the first element on its own.  */
+      if (exp_equiv_p (ins_loc->exp, x, 1, false))
+	return ins_loc;
+
+      /* And also cost it on its own as the conditions for it are slightly
+	 different from the others.  */
+      if (preferable (cost, reg_cost, ins_loc->cost, ins_loc->regcost) >= 0)
+	{
+	  /* Skip over elements that are cheaper than the element being
+	     inserted as unequal costs means it can't be a duplicate.  */
+	  for (p = ins_loc;
+	       (next = p->next_same_value)
+		 && preferable (next->cost, next->regcost, cost, reg_cost) < 0;
+	       p = next)
+	    ;
+
+	  /* Record p as the place where we want to insert elt if we are to
+	     insert an element.  */
+	  ins_loc = p;
+
+      /* The normal search stops when we encounter an element which has the same
+	 costs as us.  That's where we need to insert, but we still need to check
+	 the remaining list of equal cost things for a duplicate.  */
+      for (; p; p = tmp)
+	{
+	  if (exp_equiv_p (p->exp, x, 1, false))
+	    return p;
+
+	  if (preferable (cost, reg_cost, p->cost, p->regcost) <= 0)
+	    break;
+
+	  tmp = p->next_same_value;
+	}
+      }
+    }
+
   /* Put an element for X into the right hash bucket.  */
 
   elt = free_element_chain;
@@ -1579,22 +1625,15 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
 	}
       else
 	{
-	  /* Insert not at head of the class.  */
-	  /* Put it after the last element cheaper than X.  */
-	  struct table_elt *p, *next;
-
-	  for (p = classp;
-	       (next = p->next_same_value) && CHEAPER (next, elt);
-	       p = next)
-	    ;
-
-	  /* Put it after P and before NEXT.  */
+	  /* Insert not at head of the class.
+	     Put it after the last element cheaper than X.
+	     Put it after INS_LOC and before NEXT.  */
 	  elt->next_same_value = next;
 	  if (next)
 	    next->prev_same_value = elt;
 
-	  elt->prev_same_value = p;
-	  p->next_same_value = elt;
+	  elt->prev_same_value = ins_loc;
+	  ins_loc->next_same_value = elt;
 	  elt->first_same_value = classp;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c
new file mode 100644
index 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd0f061a19e56ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103404.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
+
+typedef unsigned __attribute__((__vector_size__ (4))) U;
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+typedef unsigned __attribute__((__vector_size__ (64))) W;
+
+int x, y;
+
+V v;
+W w;
+
+inline
+int bar (U a)
+{
+  a |= x;
+  W k =
+    __builtin_shufflevector (v, 5 / a,
+			     2, 4, 0, 2, 4, 1, 0, 1,
+			     1, 2, 1, 3, 0, 4, 4, 0);
+  w = k;
+  y = 0;
+}
+
+int
+foo ()
+{
+  bar ((U){0xffffffff});
+  for (unsigned i; i < sizeof (foo);)
+    ;
+}
+
  
Tamar Christina Dec. 6, 2021, 9:54 a.m. UTC | #4
Hi,

As discussed off-line this can only happen with a V1 mode, so here's a much simpler patch.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.


Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR rtl-optimization/103404
	* cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
	subreg.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/103404
	* gcc.target/i386/pr103404.c: New test.

--- inline copy of patch ---

diff --git a/gcc/cse.c b/gcc/cse.c
index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..dc5d5aed047c7776f44b159a4286390d6499c18d 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4275,7 +4275,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
       else if (GET_CODE (SET_SRC (x)) == CALL)
 	;
       else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
-	       && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
+	       && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL
+	       /* Prevent duplicates from being generated if the type is a V1
+		  type and a subreg.  Folding this will result in the same
+		  element as folding x itself.  */
+	       && !(SUBREG_P (SET_DEST (x))
+		    && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
 	{
 	  /* First register the vector itself.  */
 	  add_to_set (psets, x);
diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c
new file mode 100644
index 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd0f061a19e56ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103404.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
+
+typedef unsigned __attribute__((__vector_size__ (4))) U;
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+typedef unsigned __attribute__((__vector_size__ (64))) W;
+
+int x, y;
+
+V v;
+W w;
+
+inline
+int bar (U a)
+{
+  a |= x;
+  W k =
+    __builtin_shufflevector (v, 5 / a,
+			     2, 4, 0, 2, 4, 1, 0, 1,
+			     1, 2, 1, 3, 0, 4, 4, 0);
+  w = k;
+  y = 0;
+}
+
+int
+foo ()
+{
+  bar ((U){0xffffffff});
+  for (unsigned i; i < sizeof (foo);)
+    ;
+}
+
  
Richard Sandiford Dec. 6, 2021, 10:10 a.m. UTC | #5
Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi,
>
> As discussed off-line this can only happen with a V1 mode, so here's a much simpler patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
>
>
> Ok for master?

OK, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/103404
>         * cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
>         subreg.
>
> gcc/testsuite/ChangeLog:
>
>         PR rtl-optimization/103404
>         * gcc.target/i386/pr103404.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/cse.c b/gcc/cse.c
> index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..dc5d5aed047c7776f44b159a4286390d6499c18d 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -4275,7 +4275,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>        else if (GET_CODE (SET_SRC (x)) == CALL)
>         ;
>        else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
> -              && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
> +              && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL
> +              /* Prevent duplicates from being generated if the type is a V1
> +                 type and a subreg.  Folding this will result in the same
> +                 element as folding x itself.  */
> +              && !(SUBREG_P (SET_DEST (x))
> +                   && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
>         {
>           /* First register the vector itself.  */
>           add_to_set (psets, x);
> diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd0f061a19e56ea5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103404.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
> +
> +typedef unsigned __attribute__((__vector_size__ (4))) U;
> +typedef unsigned __attribute__((__vector_size__ (16))) V;
> +typedef unsigned __attribute__((__vector_size__ (64))) W;
> +
> +int x, y;
> +
> +V v;
> +W w;
> +
> +inline
> +int bar (U a)
> +{
> +  a |= x;
> +  W k =
> +    __builtin_shufflevector (v, 5 / a,
> +                            2, 4, 0, 2, 4, 1, 0, 1,
> +                            1, 2, 1, 3, 0, 4, 4, 0);
> +  w = k;
> +  y = 0;
> +}
> +
> +int
> +foo ()
> +{
> +  bar ((U){0xffffffff});
> +  for (unsigned i; i < sizeof (foo);)
> +    ;
> +}
> +
  

Patch

diff --git a/gcc/cse.c b/gcc/cse.c
index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -1537,6 +1537,17 @@  insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
   if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
     add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO (x));
 
+  /* We cannot allow a duplicate to be entered into the equivalence sets
+     and so we should perform a check before we do any allocations or
+     change the buckets.  */
+  if (classp)
+    {
+      struct table_elt *p;
+      for (p = classp; p; p = p->next_same_value)
+	if (exp_equiv_p (p->exp, x, 1, false))
+	  return p;
+    }
+
   /* Put an element for X into the right hash bucket.  */
 
   elt = free_element_chain;
diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c b/gcc/testsuite/gcc.target/i386/pr103404.c
new file mode 100644
index 0000000000000000000000000000000000000000..66f33645301db09503fc0977fd0f061a19e56ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103404.c
@@ -0,0 +1,32 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce -fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
+
+typedef unsigned __attribute__((__vector_size__ (4))) U;
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+typedef unsigned __attribute__((__vector_size__ (64))) W;
+
+int x, y;
+
+V v;
+W w;
+
+inline
+int bar (U a)
+{
+  a |= x;
+  W k =
+    __builtin_shufflevector (v, 5 / a,
+			     2, 4, 0, 2, 4, 1, 0, 1,
+			     1, 2, 1, 3, 0, 4, 4, 0);
+  w = k;
+  y = 0;
+}
+
+int
+foo ()
+{
+  bar ((U){0xffffffff});
+  for (unsigned i; i < sizeof (foo);)
+    ;
+}
+