[4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

Message ID ZaKwRyX/Ka3+xiqI@arm.com
State New
Headers
Series aarch64, rtl-ssa: Fix wrong code in ldp fusion pass [PR113070] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed

Commit Message

Alex Coplan Jan. 13, 2024, 3:46 p.m. UTC
  As the PR shows (specifically #c7) we are missing updating uses of mem
when inserting an stp in the aarch64 load/store pair fusion pass.  This
patch fixes that.

RTL-SSA has a simple view of memory and by default doesn't allow stores
to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
own alias analysis and so can re-order stores over other accesses when
we deem this is safe.  If neither store can be re-purposed (moved into
the required position to form the stp while respecting the RTL-SSA
constraints), then we turn both the candidate stores into "tombstone"
insns (logically delete them) and insert a new stp insn.

As it stands, we implement the insert case separately (after dealing
with the candidate stores) in fuse_pair by inserting into the middle of
the vector of changes.  This is OK when we only have to insert one
change, but with this fix we would need to insert the change for the new
stp plus multiple changes to fix up uses of mem (note the number of
fix-ups is naturally bounded by the alias limit param to prevent
quadratic behaviour).  If we kept the code structured as is and inserted
into the middle of the vector, that would lead to repeated moving of
elements in the vector which seems inefficient.  The structure of the
code would also be a little unwieldy.

To improve on that situation, this patch introduces a helper class,
stp_change_builder, which implements a state machine that helps to build
the required changes directly in program order.  That state machine is
reponsible for deciding what changes need to be made in what order, and
the code in fuse_pair then simply follows those steps.

Together with the fix in the previous patch for installing new defs
correctly in RTL-SSA, this fixes PR113070.

We take the opportunity to rename the function decide_stp_strategy to
try_repurpose_store, as that seems more descriptive of what it actually
does, since stp_change_builder is now responsible for the overall change
strategy.

Bootstrapped/regtested as a series with/without the passes enabled on
aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

	PR target/113070
	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
	(decide_stp_strategy): Reanme to ...
	(try_repurpose_store): ... this.
	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
	construct stp changes.  Fix up uses when inserting new stp insns.
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
 1 file changed, 194 insertions(+), 54 deletions(-)
  

Comments

Alex Coplan Jan. 15, 2024, 3:39 p.m. UTC | #1
On 13/01/2024 15:46, Alex Coplan wrote:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
> 
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
> 
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
> 
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> reponsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
> 
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
> 
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
> 
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
> 	PR target/113070
> 	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> 	(decide_stp_strategy): Reanme to ...
> 	(try_repurpose_store): ... this.
> 	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> 	construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
>  1 file changed, 194 insertions(+), 54 deletions(-)
> 

> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>    return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +    FIRST,
> +    INSERT,
> +    FIXUP_USE,
> +    LAST,
> +    DONE
> +  };
> +
> +  enum class action
> +  {
> +    TOMBSTONE,
> +    CHANGE,
> +    INSERT,
> +    FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +    action type;
> +    insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +		      insn_info *repurpose,
> +		      insn_info *dest)
> +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> +
> +  change get_change () const
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +	return {
> +	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +	  m_insns[0]
> +	};
> +      case state::LAST:
> +	return {
> +	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +	  m_insns[1]
> +	};
> +      case state::INSERT:
> +	return { action::INSERT, m_dest };
> +      case state::FIXUP_USE:
> +	return { action::FIXUP_USE, m_use->insn () };
> +      case state::DONE:
> +	break;
> +      }
> +
> +    gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +	if (m_repurpose)
> +	  m_state = state::LAST;
> +	else
> +	  m_state = state::INSERT;
> +	break;
> +      case state::INSERT:
> +      {
> +	def_info *def = memory_access (m_insns[0]->defs ());
> +	while (*def->next_def ()->insn () <= *m_dest)
> +	  def = def->next_def ();
> +
> +	// Now we know DEF feeds the insertion point for the new stp.
> +	// Look for any uses of DEF that will consume the new stp.
> +	gcc_assert (*def->insn () <= *m_dest
> +		    && *def->next_def ()->insn () > *m_dest);
> +
> +	if (auto set = dyn_cast<set_info *> (def))
> +	  for (auto use : set->nondebug_insn_uses ())
> +	    if (*use->insn () > *m_dest)
> +	      {
> +		m_use = use;
> +		break;
> +	      }
> +
> +	if (m_use)
> +	  m_state = state::FIXUP_USE;
> +	else
> +	  m_state = state::LAST;
> +	break;
> +      }
> +      case state::FIXUP_USE:
> +	m_use = m_use->next_nondebug_insn_use ();
> +	if (!m_use)
> +	  m_state = state::LAST;
> +	break;
> +      case state::LAST:
> +	m_state = state::DONE;
> +	break;
> +      case state::DONE:
> +	gcc_unreachable ();
> +      }
> +  }
> +
> +private:
> +  state m_state;
> +
> +  // Original candidate stores.
> +  insn_info *m_insns[2];
> +
> +  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
> +  // are deleting both original insns and inserting a new insn for the stp.
> +  insn_info *m_repurpose;
> +
> +  // Destionation of the stp, it will be placed immediately after m_dest.
> +  insn_info *m_dest;
> +
> +  // Current nondebug use that needs updating due to stp insertion.
> +  use_info *m_use;
> +};
> +
>  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
>  // of them (together with its def of memory) for the stp insn.  If so, return
>  // that insn.  Otherwise, return null.
>  static insn_info *
> -decide_stp_strategy (insn_info *first,
> +try_repurpose_store (insn_info *first,
>  		     insn_info *second,
>  		     const insn_range_info &move_range)
>  {
> @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>  
>    insn_info *insns[2] = { first, second };
>  
> -  auto_vec<insn_change *, 4> changes (4);
> +  auto_vec<insn_change *> changes;
>    auto_vec<int, 2> tombstone_uids (2);
>  
>    rtx pats[2] = {
> @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
>  
>    if (load_p)
>      {
> -      changes.quick_push (make_delete (first));
> +      changes.safe_push (make_delete (first));
>        pair_change = make_change (second);
> -      changes.quick_push (pair_change);
> +      changes.safe_push (pair_change);
>  
>        pair_change->move_range = move_range;
>        pair_change->new_defs = merge_access_arrays (attempt,
> @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
>      }
>    else
>      {
> -      insn_info *store_to_change = decide_stp_strategy (first, second,
> +      using Action = stp_change_builder::action;
> +      insn_info *store_to_change = try_repurpose_store (first, second,
>  							move_range);
> -
> -      if (store_to_change && dump_file)
> -	fprintf (dump_file, "  stp: re-purposing store %d\n",
> -		 store_to_change->uid ());
> -
> +      insn_info *stp_dest = move_range.singleton ();
> +      gcc_assert (stp_dest);
> +      stp_change_builder builder (insns, store_to_change, stp_dest);
>        insn_change *change;
> -      for (int i = 0; i < 2; i++)
> +      set_info *new_set = nullptr;
> +      for (; !builder.done (); builder.advance ())
>  	{
> -	  change = make_change (insns[i]);
> -	  if (insns[i] == store_to_change)
> +	  auto action = builder.get_change ();
> +	  change = (action.type == Action::INSERT)
> +	    ? nullptr : make_change (action.insn);
> +	  switch (action.type)
> +	    {
> +	    case Action::CHANGE:
>  	    {
>  	      set_pair_pat (change);
>  	      change->new_uses = merge_access_arrays (attempt,
> @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
>  	      auto d2 = drop_memory_access (input_defs[1]);
>  	      change->new_defs = merge_access_arrays (attempt, d1, d2);
>  	      gcc_assert (change->new_defs.is_valid ());
> -	      def_info *stp_def = memory_access (store_to_change->defs ());
> +	      def_info *stp_def = memory_access (change->insn ()->defs ());
>  	      change->new_defs = insert_access (attempt,
>  						stp_def,
>  						change->new_defs);
>  	      gcc_assert (change->new_defs.is_valid ());
>  	      change->move_range = move_range;
>  	      pair_change = change;
> +	      break;
>  	    }
> -	  else
> +	    case Action::TOMBSTONE:
>  	    {
> -	      // Note that we are turning this insn into a tombstone,
> -	      // we need to keep track of these if we go ahead with the
> -	      // change.
> -	      tombstone_uids.quick_push (insns[i]->uid ());
> -	      rtx_insn *rti = insns[i]->rtl ();
> +	      tombstone_uids.quick_push (change->insn ()->uid ());
> +	      rtx_insn *rti = change->insn ()->rtl ();
>  	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
>  	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
>  	      change->new_uses = use_array (nullptr, 0);
> +	      break;
>  	    }
> -	  gcc_assert (change->new_uses.is_valid ());
> -	  changes.quick_push (change);
> -	}
> +	    case Action::INSERT:
> +	    {
> +	      if (dump_file)
> +		fprintf (dump_file,
> +			 "  stp: cannot re-purpose candidate stores\n");
>  
> -      if (!store_to_change)
> -	{
> -	  // Tricky case.  Cannot re-purpose existing insns for stp.
> -	  // Need to insert new insn.
> -	  if (dump_file)
> -	    fprintf (dump_file,
> -		     "  stp fusion: cannot re-purpose candidate stores\n");
> -
> -	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> -	  change = make_change (new_insn);
> -	  change->move_range = move_range;
> -	  change->new_uses = merge_access_arrays (attempt,
> -						  input_uses[0],
> -						  input_uses[1]);
> -	  gcc_assert (change->new_uses.is_valid ());
> -
> -	  auto d1 = drop_memory_access (input_defs[0]);
> -	  auto d2 = drop_memory_access (input_defs[1]);
> -	  change->new_defs = merge_access_arrays (attempt, d1, d2);
> -	  gcc_assert (change->new_defs.is_valid ());
> -
> -	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> -	  change->new_defs = insert_access (attempt, new_set,
> -					    change->new_defs);
> -	  gcc_assert (change->new_defs.is_valid ());
> -	  changes.safe_insert (1, change);
> -	  pair_change = change;
> +	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> +	      change = make_change (new_insn);
> +	      change->move_range = move_range;
> +	      change->new_uses = merge_access_arrays (attempt,
> +						      input_uses[0],
> +						      input_uses[1]);
> +	      gcc_assert (change->new_uses.is_valid ());
> +
> +	      auto d1 = drop_memory_access (input_defs[0]);
> +	      auto d2 = drop_memory_access (input_defs[1]);
> +	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> +	      gcc_assert (change->new_defs.is_valid ());
> +
> +	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> +	      change->new_defs = insert_access (attempt, new_set,
> +						change->new_defs);
> +	      gcc_assert (change->new_defs.is_valid ());
> +	      pair_change = change;
> +	      break;
> +	    }
> +	    case Action::FIXUP_USE:
> +	    {
> +	      // This use now needs to consume memory from our stp.
> +	      if (dump_file)
> +		fprintf (dump_file,
> +			 "  stp: changing i%d to use mem from new stp "
> +			 "(after i%d)\n",
> +			 action.insn->uid (), stp_dest->uid ());
> +	      change->new_uses = drop_memory_access (change->new_uses);
> +	      gcc_assert (new_set);
> +	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
> +						    new_set);
> +	      change->new_uses = insert_access (attempt, new_use,
> +						change->new_uses);
> +	      break;
> +	    }
> +	    }
> +	  changes.safe_push (change);
>  	}
>      }
>  
>    if (trailing_add)
>      changes.quick_push (make_delete (trailing_add));

Reviewing my own patch here, but just a note that this call to
quick_push should have been updated to safe_push with this change.

Will fix it locally and include in a v2 (if needed) when the patch gets
reviewed for real.

>  
> -  auto n_changes = changes.length ();
> -  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> -
>    auto is_changing = insn_is_changing (changes);
> -  for (unsigned i = 0; i < n_changes; i++)
> +  for (unsigned i = 0; i < changes.length (); i++)
>      gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
>  
>    // Check the pair pattern is recog'd.
  
Richard Sandiford Jan. 22, 2024, 3:59 p.m. UTC | #2
Alex Coplan <alex.coplan@arm.com> writes:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
>
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
>
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
>
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> reponsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
>
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
>
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
>
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
> 	PR target/113070
> 	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> 	(decide_stp_strategy): Reanme to ...
> 	(try_repurpose_store): ... this.
> 	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> 	construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
>  1 file changed, 194 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>    return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +    FIRST,
> +    INSERT,
> +    FIXUP_USE,
> +    LAST,
> +    DONE
> +  };
> +
> +  enum class action
> +  {
> +    TOMBSTONE,
> +    CHANGE,
> +    INSERT,
> +    FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +    action type;
> +    insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +		      insn_info *repurpose,
> +		      insn_info *dest)
> +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}

Just to make sure I understand: is it the case that

  *insns[0] <= *dest <= *insns[1]

?

> +
> +  change get_change () const
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +	return {
> +	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +	  m_insns[0]
> +	};
> +      case state::LAST:
> +	return {
> +	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +	  m_insns[1]
> +	};
> +      case state::INSERT:
> +	return { action::INSERT, m_dest };
> +      case state::FIXUP_USE:
> +	return { action::FIXUP_USE, m_use->insn () };
> +      case state::DONE:
> +	break;
> +      }
> +
> +    gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +    switch (m_state)
> +      {
> +      case state::FIRST:
> +	if (m_repurpose)
> +	  m_state = state::LAST;
> +	else
> +	  m_state = state::INSERT;
> +	break;
> +      case state::INSERT:
> +      {
> +	def_info *def = memory_access (m_insns[0]->defs ());
> +	while (*def->next_def ()->insn () <= *m_dest)
> +	  def = def->next_def ();
> +
> +	// Now we know DEF feeds the insertion point for the new stp.
> +	// Look for any uses of DEF that will consume the new stp.
> +	gcc_assert (*def->insn () <= *m_dest
> +		    && *def->next_def ()->insn () > *m_dest);
> +
> +	if (auto set = dyn_cast<set_info *> (def))

I think this should be an unconditional as_a.  Clobbers of memory
shouldn't be a thing, and it's not obvious that doing nothing here
would be the correct behaviour.

The patch looks good to me with that change and the one that you
pointed out in your reply.

However, as a follow-on, we should probably also handle debug
instructions.  The conservatively correct thing to do would be
to reset all debug uses that occur after *m_dest, since we (rightly)
haven't checked them for aliases before attempting the change.
A more expensive alternative would be to check each use for aliases
and only reset where necessary.

Unfortunately, that would apply to subsequent defs too, up to the
later of the two original instructions.

I'm not sure how good other passes are doing this kind of update though.

Thanks,
Richard

> +	  for (auto use : set->nondebug_insn_uses ())
> +	    if (*use->insn () > *m_dest)
> +	      {
> +		m_use = use;
> +		break;
> +	      }
> +
> +	if (m_use)
> +	  m_state = state::FIXUP_USE;
> +	else
> +	  m_state = state::LAST;
> +	break;
> +      }
> +      case state::FIXUP_USE:
> +	m_use = m_use->next_nondebug_insn_use ();
> +	if (!m_use)
> +	  m_state = state::LAST;
> +	break;
> +      case state::LAST:
> +	m_state = state::DONE;
> +	break;
> +      case state::DONE:
> +	gcc_unreachable ();
> +      }
> +  }
> +
> +private:
> +  state m_state;
> +
> +  // Original candidate stores.
> +  insn_info *m_insns[2];
> +
> +  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
> +  // are deleting both original insns and inserting a new insn for the stp.
> +  insn_info *m_repurpose;
> +
> +  // Destionation of the stp, it will be placed immediately after m_dest.
> +  insn_info *m_dest;
> +
> +  // Current nondebug use that needs updating due to stp insertion.
> +  use_info *m_use;
> +};
> +
>  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
>  // of them (together with its def of memory) for the stp insn.  If so, return
>  // that insn.  Otherwise, return null.
>  static insn_info *
> -decide_stp_strategy (insn_info *first,
> +try_repurpose_store (insn_info *first,
>  		     insn_info *second,
>  		     const insn_range_info &move_range)
>  {
> @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>  
>    insn_info *insns[2] = { first, second };
>  
> -  auto_vec<insn_change *, 4> changes (4);
> +  auto_vec<insn_change *> changes;
>    auto_vec<int, 2> tombstone_uids (2);
>  
>    rtx pats[2] = {
> @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
>  
>    if (load_p)
>      {
> -      changes.quick_push (make_delete (first));
> +      changes.safe_push (make_delete (first));
>        pair_change = make_change (second);
> -      changes.quick_push (pair_change);
> +      changes.safe_push (pair_change);
>  
>        pair_change->move_range = move_range;
>        pair_change->new_defs = merge_access_arrays (attempt,
> @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
>      }
>    else
>      {
> -      insn_info *store_to_change = decide_stp_strategy (first, second,
> +      using Action = stp_change_builder::action;
> +      insn_info *store_to_change = try_repurpose_store (first, second,
>  							move_range);
> -
> -      if (store_to_change && dump_file)
> -	fprintf (dump_file, "  stp: re-purposing store %d\n",
> -		 store_to_change->uid ());
> -
> +      insn_info *stp_dest = move_range.singleton ();
> +      gcc_assert (stp_dest);
> +      stp_change_builder builder (insns, store_to_change, stp_dest);
>        insn_change *change;
> -      for (int i = 0; i < 2; i++)
> +      set_info *new_set = nullptr;
> +      for (; !builder.done (); builder.advance ())
>  	{
> -	  change = make_change (insns[i]);
> -	  if (insns[i] == store_to_change)
> +	  auto action = builder.get_change ();
> +	  change = (action.type == Action::INSERT)
> +	    ? nullptr : make_change (action.insn);
> +	  switch (action.type)
> +	    {
> +	    case Action::CHANGE:
>  	    {
>  	      set_pair_pat (change);
>  	      change->new_uses = merge_access_arrays (attempt,
> @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
>  	      auto d2 = drop_memory_access (input_defs[1]);
>  	      change->new_defs = merge_access_arrays (attempt, d1, d2);
>  	      gcc_assert (change->new_defs.is_valid ());
> -	      def_info *stp_def = memory_access (store_to_change->defs ());
> +	      def_info *stp_def = memory_access (change->insn ()->defs ());
>  	      change->new_defs = insert_access (attempt,
>  						stp_def,
>  						change->new_defs);
>  	      gcc_assert (change->new_defs.is_valid ());
>  	      change->move_range = move_range;
>  	      pair_change = change;
> +	      break;
>  	    }
> -	  else
> +	    case Action::TOMBSTONE:
>  	    {
> -	      // Note that we are turning this insn into a tombstone,
> -	      // we need to keep track of these if we go ahead with the
> -	      // change.
> -	      tombstone_uids.quick_push (insns[i]->uid ());
> -	      rtx_insn *rti = insns[i]->rtl ();
> +	      tombstone_uids.quick_push (change->insn ()->uid ());
> +	      rtx_insn *rti = change->insn ()->rtl ();
>  	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
>  	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
>  	      change->new_uses = use_array (nullptr, 0);
> +	      break;
>  	    }
> -	  gcc_assert (change->new_uses.is_valid ());
> -	  changes.quick_push (change);
> -	}
> +	    case Action::INSERT:
> +	    {
> +	      if (dump_file)
> +		fprintf (dump_file,
> +			 "  stp: cannot re-purpose candidate stores\n");
>  
> -      if (!store_to_change)
> -	{
> -	  // Tricky case.  Cannot re-purpose existing insns for stp.
> -	  // Need to insert new insn.
> -	  if (dump_file)
> -	    fprintf (dump_file,
> -		     "  stp fusion: cannot re-purpose candidate stores\n");
> -
> -	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> -	  change = make_change (new_insn);
> -	  change->move_range = move_range;
> -	  change->new_uses = merge_access_arrays (attempt,
> -						  input_uses[0],
> -						  input_uses[1]);
> -	  gcc_assert (change->new_uses.is_valid ());
> -
> -	  auto d1 = drop_memory_access (input_defs[0]);
> -	  auto d2 = drop_memory_access (input_defs[1]);
> -	  change->new_defs = merge_access_arrays (attempt, d1, d2);
> -	  gcc_assert (change->new_defs.is_valid ());
> -
> -	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> -	  change->new_defs = insert_access (attempt, new_set,
> -					    change->new_defs);
> -	  gcc_assert (change->new_defs.is_valid ());
> -	  changes.safe_insert (1, change);
> -	  pair_change = change;
> +	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> +	      change = make_change (new_insn);
> +	      change->move_range = move_range;
> +	      change->new_uses = merge_access_arrays (attempt,
> +						      input_uses[0],
> +						      input_uses[1]);
> +	      gcc_assert (change->new_uses.is_valid ());
> +
> +	      auto d1 = drop_memory_access (input_defs[0]);
> +	      auto d2 = drop_memory_access (input_defs[1]);
> +	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> +	      gcc_assert (change->new_defs.is_valid ());
> +
> +	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> +	      change->new_defs = insert_access (attempt, new_set,
> +						change->new_defs);
> +	      gcc_assert (change->new_defs.is_valid ());
> +	      pair_change = change;
> +	      break;
> +	    }
> +	    case Action::FIXUP_USE:
> +	    {
> +	      // This use now needs to consume memory from our stp.
> +	      if (dump_file)
> +		fprintf (dump_file,
> +			 "  stp: changing i%d to use mem from new stp "
> +			 "(after i%d)\n",
> +			 action.insn->uid (), stp_dest->uid ());
> +	      change->new_uses = drop_memory_access (change->new_uses);
> +	      gcc_assert (new_set);
> +	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
> +						    new_set);
> +	      change->new_uses = insert_access (attempt, new_use,
> +						change->new_uses);
> +	      break;
> +	    }
> +	    }
> +	  changes.safe_push (change);
>  	}
>      }
>  
>    if (trailing_add)
>      changes.quick_push (make_delete (trailing_add));
>  
> -  auto n_changes = changes.length ();
> -  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> -
>    auto is_changing = insn_is_changing (changes);
> -  for (unsigned i = 0; i < n_changes; i++)
> +  for (unsigned i = 0; i < changes.length (); i++)
>      gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
>  
>    // Check the pair pattern is recog'd.
  
Alex Coplan Jan. 22, 2024, 9:50 p.m. UTC | #3
On 22/01/2024 15:59, Richard Sandiford wrote:
> Alex Coplan <alex.coplan@arm.com> writes:
> > As the PR shows (specifically #c7) we are missing updating uses of mem
> > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > patch fixes that.
> >
> > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > own alias analysis and so can re-order stores over other accesses when
> > we deem this is safe.  If neither store can be re-purposed (moved into
> > the required position to form the stp while respecting the RTL-SSA
> > constraints), then we turn both the candidate stores into "tombstone"
> > insns (logically delete them) and insert a new stp insn.
> >
> > As it stands, we implement the insert case separately (after dealing
> > with the candidate stores) in fuse_pair by inserting into the middle of
> > the vector of changes.  This is OK when we only have to insert one
> > change, but with this fix we would need to insert the change for the new
> > stp plus multiple changes to fix up uses of mem (note the number of
> > fix-ups is naturally bounded by the alias limit param to prevent
> > quadratic behaviour).  If we kept the code structured as is and inserted
> > into the middle of the vector, that would lead to repeated moving of
> > elements in the vector which seems inefficient.  The structure of the
> > code would also be a little unwieldy.
> >
> > To improve on that situation, this patch introduces a helper class,
> > stp_change_builder, which implements a state machine that helps to build
> > the required changes directly in program order.  That state machine is
> > reponsible for deciding what changes need to be made in what order, and
> > the code in fuse_pair then simply follows those steps.
> >
> > Together with the fix in the previous patch for installing new defs
> > correctly in RTL-SSA, this fixes PR113070.
> >
> > We take the opportunity to rename the function decide_stp_strategy to
> > try_repurpose_store, as that seems more descriptive of what it actually
> > does, since stp_change_builder is now responsible for the overall change
> > strategy.
> >
> > Bootstrapped/regtested as a series with/without the passes enabled on
> > aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > 	PR target/113070
> > 	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > 	(decide_stp_strategy): Reanme to ...
> > 	(try_repurpose_store): ... this.
> > 	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > 	construct stp changes.  Fix up uses when inserting new stp insns.
> > ---
> >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
> >  1 file changed, 194 insertions(+), 54 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 689a8c884bd..703cfb1228c 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> >    return range;
> >  }
> >  
> > +// Class that implements a state machine for building the changes needed to form
> > +// a store pair instruction.  This allows us to easily build the changes in
> > +// program order, as required by rtl-ssa.
> > +struct stp_change_builder
> > +{
> > +  enum class state
> > +  {
> > +    FIRST,
> > +    INSERT,
> > +    FIXUP_USE,
> > +    LAST,
> > +    DONE
> > +  };
> > +
> > +  enum class action
> > +  {
> > +    TOMBSTONE,
> > +    CHANGE,
> > +    INSERT,
> > +    FIXUP_USE
> > +  };
> > +
> > +  struct change
> > +  {
> > +    action type;
> > +    insn_info *insn;
> > +  };
> > +
> > +  bool done () const { return m_state == state::DONE; }
> > +
> > +  stp_change_builder (insn_info *insns[2],
> > +		      insn_info *repurpose,
> > +		      insn_info *dest)
> > +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> 
> Just to make sure I understand: is it the case that
> 
>   *insns[0] <= *dest <= *insns[1]
> 
> ?

Yes, that is my understanding.  I thought about asserting it somewhere in
stp_change_builder, but it seemed a bit gratuitous.

> 
> > +
> > +  change get_change () const
> > +  {
> > +    switch (m_state)
> > +      {
> > +      case state::FIRST:
> > +	return {
> > +	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > +	  m_insns[0]
> > +	};
> > +      case state::LAST:
> > +	return {
> > +	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > +	  m_insns[1]
> > +	};
> > +      case state::INSERT:
> > +	return { action::INSERT, m_dest };
> > +      case state::FIXUP_USE:
> > +	return { action::FIXUP_USE, m_use->insn () };
> > +      case state::DONE:
> > +	break;
> > +      }
> > +
> > +    gcc_unreachable ();
> > +  }
> > +
> > +  // Transition to the next state.
> > +  void advance ()
> > +  {
> > +    switch (m_state)
> > +      {
> > +      case state::FIRST:
> > +	if (m_repurpose)
> > +	  m_state = state::LAST;
> > +	else
> > +	  m_state = state::INSERT;
> > +	break;
> > +      case state::INSERT:
> > +      {
> > +	def_info *def = memory_access (m_insns[0]->defs ());
> > +	while (*def->next_def ()->insn () <= *m_dest)
> > +	  def = def->next_def ();
> > +
> > +	// Now we know DEF feeds the insertion point for the new stp.
> > +	// Look for any uses of DEF that will consume the new stp.
> > +	gcc_assert (*def->insn () <= *m_dest
> > +		    && *def->next_def ()->insn () > *m_dest);
> > +
> > +	if (auto set = dyn_cast<set_info *> (def))
> 
> I think this should be an unconditional as_a.  Clobbers of memory
> shouldn't be a thing, and it's not obvious that doing nothing here
> would be the correct behaviour.

Agreed, thanks.

> 
> The patch looks good to me with that change and the one that you
> pointed out in your reply.
> 
> However, as a follow-on, we should probably also handle debug
> instructions.  The conservatively correct thing to do would be
> to reset all debug uses that occur after *m_dest, since we (rightly)
> haven't checked them for aliases before attempting the change.
> A more expensive alternative would be to check each use for aliases
> and only reset where necessary.
> 
> Unfortunately, that would apply to subsequent defs too, up to the
> later of the two original instructions.
> 
> I'm not sure how good other passes are doing this kind of update though.

Yeah, should be handled by the PR113089 fixes (as you realised in later
reviews).

Thanks a lot for the reviews!  I'll re-test the series with those
changes.

Alex

> 
> Thanks,
> Richard
> 
> > +	  for (auto use : set->nondebug_insn_uses ())
> > +	    if (*use->insn () > *m_dest)
> > +	      {
> > +		m_use = use;
> > +		break;
> > +	      }
> > +
> > +	if (m_use)
> > +	  m_state = state::FIXUP_USE;
> > +	else
> > +	  m_state = state::LAST;
> > +	break;
> > +      }
> > +      case state::FIXUP_USE:
> > +	m_use = m_use->next_nondebug_insn_use ();
> > +	if (!m_use)
> > +	  m_state = state::LAST;
> > +	break;
> > +      case state::LAST:
> > +	m_state = state::DONE;
> > +	break;
> > +      case state::DONE:
> > +	gcc_unreachable ();
> > +      }
> > +  }
> > +
> > +private:
> > +  state m_state;
> > +
> > +  // Original candidate stores.
> > +  insn_info *m_insns[2];
> > +
> > +  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
> > +  // are deleting both original insns and inserting a new insn for the stp.
> > +  insn_info *m_repurpose;
> > +
> > +  // Destionation of the stp, it will be placed immediately after m_dest.
> > +  insn_info *m_dest;
> > +
> > +  // Current nondebug use that needs updating due to stp insertion.
> > +  use_info *m_use;
> > +};
> > +
> >  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
> >  // of them (together with its def of memory) for the stp insn.  If so, return
> >  // that insn.  Otherwise, return null.
> >  static insn_info *
> > -decide_stp_strategy (insn_info *first,
> > +try_repurpose_store (insn_info *first,
> >  		     insn_info *second,
> >  		     const insn_range_info &move_range)
> >  {
> > @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >  
> >    insn_info *insns[2] = { first, second };
> >  
> > -  auto_vec<insn_change *, 4> changes (4);
> > +  auto_vec<insn_change *> changes;
> >    auto_vec<int, 2> tombstone_uids (2);
> >  
> >    rtx pats[2] = {
> > @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
> >  
> >    if (load_p)
> >      {
> > -      changes.quick_push (make_delete (first));
> > +      changes.safe_push (make_delete (first));
> >        pair_change = make_change (second);
> > -      changes.quick_push (pair_change);
> > +      changes.safe_push (pair_change);
> >  
> >        pair_change->move_range = move_range;
> >        pair_change->new_defs = merge_access_arrays (attempt,
> > @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
> >      }
> >    else
> >      {
> > -      insn_info *store_to_change = decide_stp_strategy (first, second,
> > +      using Action = stp_change_builder::action;
> > +      insn_info *store_to_change = try_repurpose_store (first, second,
> >  							move_range);
> > -
> > -      if (store_to_change && dump_file)
> > -	fprintf (dump_file, "  stp: re-purposing store %d\n",
> > -		 store_to_change->uid ());
> > -
> > +      insn_info *stp_dest = move_range.singleton ();
> > +      gcc_assert (stp_dest);
> > +      stp_change_builder builder (insns, store_to_change, stp_dest);
> >        insn_change *change;
> > -      for (int i = 0; i < 2; i++)
> > +      set_info *new_set = nullptr;
> > +      for (; !builder.done (); builder.advance ())
> >  	{
> > -	  change = make_change (insns[i]);
> > -	  if (insns[i] == store_to_change)
> > +	  auto action = builder.get_change ();
> > +	  change = (action.type == Action::INSERT)
> > +	    ? nullptr : make_change (action.insn);
> > +	  switch (action.type)
> > +	    {
> > +	    case Action::CHANGE:
> >  	    {
> >  	      set_pair_pat (change);
> >  	      change->new_uses = merge_access_arrays (attempt,
> > @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
> >  	      auto d2 = drop_memory_access (input_defs[1]);
> >  	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> >  	      gcc_assert (change->new_defs.is_valid ());
> > -	      def_info *stp_def = memory_access (store_to_change->defs ());
> > +	      def_info *stp_def = memory_access (change->insn ()->defs ());
> >  	      change->new_defs = insert_access (attempt,
> >  						stp_def,
> >  						change->new_defs);
> >  	      gcc_assert (change->new_defs.is_valid ());
> >  	      change->move_range = move_range;
> >  	      pair_change = change;
> > +	      break;
> >  	    }
> > -	  else
> > +	    case Action::TOMBSTONE:
> >  	    {
> > -	      // Note that we are turning this insn into a tombstone,
> > -	      // we need to keep track of these if we go ahead with the
> > -	      // change.
> > -	      tombstone_uids.quick_push (insns[i]->uid ());
> > -	      rtx_insn *rti = insns[i]->rtl ();
> > +	      tombstone_uids.quick_push (change->insn ()->uid ());
> > +	      rtx_insn *rti = change->insn ()->rtl ();
> >  	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
> >  	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
> >  	      change->new_uses = use_array (nullptr, 0);
> > +	      break;
> >  	    }
> > -	  gcc_assert (change->new_uses.is_valid ());
> > -	  changes.quick_push (change);
> > -	}
> > +	    case Action::INSERT:
> > +	    {
> > +	      if (dump_file)
> > +		fprintf (dump_file,
> > +			 "  stp: cannot re-purpose candidate stores\n");
> >  
> > -      if (!store_to_change)
> > -	{
> > -	  // Tricky case.  Cannot re-purpose existing insns for stp.
> > -	  // Need to insert new insn.
> > -	  if (dump_file)
> > -	    fprintf (dump_file,
> > -		     "  stp fusion: cannot re-purpose candidate stores\n");
> > -
> > -	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > -	  change = make_change (new_insn);
> > -	  change->move_range = move_range;
> > -	  change->new_uses = merge_access_arrays (attempt,
> > -						  input_uses[0],
> > -						  input_uses[1]);
> > -	  gcc_assert (change->new_uses.is_valid ());
> > -
> > -	  auto d1 = drop_memory_access (input_defs[0]);
> > -	  auto d2 = drop_memory_access (input_defs[1]);
> > -	  change->new_defs = merge_access_arrays (attempt, d1, d2);
> > -	  gcc_assert (change->new_defs.is_valid ());
> > -
> > -	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > -	  change->new_defs = insert_access (attempt, new_set,
> > -					    change->new_defs);
> > -	  gcc_assert (change->new_defs.is_valid ());
> > -	  changes.safe_insert (1, change);
> > -	  pair_change = change;
> > +	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > +	      change = make_change (new_insn);
> > +	      change->move_range = move_range;
> > +	      change->new_uses = merge_access_arrays (attempt,
> > +						      input_uses[0],
> > +						      input_uses[1]);
> > +	      gcc_assert (change->new_uses.is_valid ());
> > +
> > +	      auto d1 = drop_memory_access (input_defs[0]);
> > +	      auto d2 = drop_memory_access (input_defs[1]);
> > +	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> > +	      gcc_assert (change->new_defs.is_valid ());
> > +
> > +	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > +	      change->new_defs = insert_access (attempt, new_set,
> > +						change->new_defs);
> > +	      gcc_assert (change->new_defs.is_valid ());
> > +	      pair_change = change;
> > +	      break;
> > +	    }
> > +	    case Action::FIXUP_USE:
> > +	    {
> > +	      // This use now needs to consume memory from our stp.
> > +	      if (dump_file)
> > +		fprintf (dump_file,
> > +			 "  stp: changing i%d to use mem from new stp "
> > +			 "(after i%d)\n",
> > +			 action.insn->uid (), stp_dest->uid ());
> > +	      change->new_uses = drop_memory_access (change->new_uses);
> > +	      gcc_assert (new_set);
> > +	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
> > +						    new_set);
> > +	      change->new_uses = insert_access (attempt, new_use,
> > +						change->new_uses);
> > +	      break;
> > +	    }
> > +	    }
> > +	  changes.safe_push (change);
> >  	}
> >      }
> >  
> >    if (trailing_add)
> >      changes.quick_push (make_delete (trailing_add));
> >  
> > -  auto n_changes = changes.length ();
> > -  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> > -
> >    auto is_changing = insn_is_changing (changes);
> > -  for (unsigned i = 0; i < n_changes; i++)
> > +  for (unsigned i = 0; i < changes.length (); i++)
> >      gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
> >  
> >    // Check the pair pattern is recog'd.
  
Alex Coplan Jan. 23, 2024, 1:30 p.m. UTC | #4
On 22/01/2024 21:50, Alex Coplan wrote:
> On 22/01/2024 15:59, Richard Sandiford wrote:
> > Alex Coplan <alex.coplan@arm.com> writes:
> > > As the PR shows (specifically #c7) we are missing updating uses of mem
> > > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > > patch fixes that.
> > >
> > > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > > own alias analysis and so can re-order stores over other accesses when
> > > we deem this is safe.  If neither store can be re-purposed (moved into
> > > the required position to form the stp while respecting the RTL-SSA
> > > constraints), then we turn both the candidate stores into "tombstone"
> > > insns (logically delete them) and insert a new stp insn.
> > >
> > > As it stands, we implement the insert case separately (after dealing
> > > with the candidate stores) in fuse_pair by inserting into the middle of
> > > the vector of changes.  This is OK when we only have to insert one
> > > change, but with this fix we would need to insert the change for the new
> > > stp plus multiple changes to fix up uses of mem (note the number of
> > > fix-ups is naturally bounded by the alias limit param to prevent
> > > quadratic behaviour).  If we kept the code structured as is and inserted
> > > into the middle of the vector, that would lead to repeated moving of
> > > elements in the vector which seems inefficient.  The structure of the
> > > code would also be a little unwieldy.
> > >
> > > To improve on that situation, this patch introduces a helper class,
> > > stp_change_builder, which implements a state machine that helps to build
> > > the required changes directly in program order.  That state machine is
> > > reponsible for deciding what changes need to be made in what order, and
> > > the code in fuse_pair then simply follows those steps.
> > >
> > > Together with the fix in the previous patch for installing new defs
> > > correctly in RTL-SSA, this fixes PR113070.
> > >
> > > We take the opportunity to rename the function decide_stp_strategy to
> > > try_repurpose_store, as that seems more descriptive of what it actually
> > > does, since stp_change_builder is now responsible for the overall change
> > > strategy.
> > >
> > > Bootstrapped/regtested as a series with/without the passes enabled on
> > > aarch64-linux-gnu, OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR target/113070
> > > 	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > > 	(decide_stp_strategy): Reanme to ...
> > > 	(try_repurpose_store): ... this.
> > > 	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > > 	construct stp changes.  Fix up uses when inserting new stp insns.
> > > ---
> > >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
> > >  1 file changed, 194 insertions(+), 54 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > index 689a8c884bd..703cfb1228c 100644
> > > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> > >    return range;
> > >  }
> > >  
> > > +// Class that implements a state machine for building the changes needed to form
> > > +// a store pair instruction.  This allows us to easily build the changes in
> > > +// program order, as required by rtl-ssa.
> > > +struct stp_change_builder
> > > +{
> > > +  enum class state
> > > +  {
> > > +    FIRST,
> > > +    INSERT,
> > > +    FIXUP_USE,
> > > +    LAST,
> > > +    DONE
> > > +  };
> > > +
> > > +  enum class action
> > > +  {
> > > +    TOMBSTONE,
> > > +    CHANGE,
> > > +    INSERT,
> > > +    FIXUP_USE
> > > +  };
> > > +
> > > +  struct change
> > > +  {
> > > +    action type;
> > > +    insn_info *insn;
> > > +  };
> > > +
> > > +  bool done () const { return m_state == state::DONE; }
> > > +
> > > +  stp_change_builder (insn_info *insns[2],
> > > +		      insn_info *repurpose,
> > > +		      insn_info *dest)
> > > +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > > +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> > 
> > Just to make sure I understand: is it the case that
> > 
> >   *insns[0] <= *dest <= *insns[1]
> > 
> > ?
> 
> Yes, that is my understanding.  I thought about asserting it somewhere in
> stp_change_builder, but it seemed a bit gratuitous.
> 
> > 
> > > +
> > > +  change get_change () const
> > > +  {
> > > +    switch (m_state)
> > > +      {
> > > +      case state::FIRST:
> > > +	return {
> > > +	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +	  m_insns[0]
> > > +	};
> > > +      case state::LAST:
> > > +	return {
> > > +	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +	  m_insns[1]
> > > +	};
> > > +      case state::INSERT:
> > > +	return { action::INSERT, m_dest };
> > > +      case state::FIXUP_USE:
> > > +	return { action::FIXUP_USE, m_use->insn () };
> > > +      case state::DONE:
> > > +	break;
> > > +      }
> > > +
> > > +    gcc_unreachable ();
> > > +  }
> > > +
> > > +  // Transition to the next state.
> > > +  void advance ()
> > > +  {
> > > +    switch (m_state)
> > > +      {
> > > +      case state::FIRST:
> > > +	if (m_repurpose)
> > > +	  m_state = state::LAST;
> > > +	else
> > > +	  m_state = state::INSERT;
> > > +	break;
> > > +      case state::INSERT:
> > > +      {
> > > +	def_info *def = memory_access (m_insns[0]->defs ());
> > > +	while (*def->next_def ()->insn () <= *m_dest)
> > > +	  def = def->next_def ();
> > > +
> > > +	// Now we know DEF feeds the insertion point for the new stp.
> > > +	// Look for any uses of DEF that will consume the new stp.
> > > +	gcc_assert (*def->insn () <= *m_dest
> > > +		    && *def->next_def ()->insn () > *m_dest);
> > > +
> > > +	if (auto set = dyn_cast<set_info *> (def))
> > 
> > I think this should be an unconditional as_a.  Clobbers of memory
> > shouldn't be a thing, and it's not obvious that doing nothing here
> > would be the correct behaviour.
> 
> Agreed, thanks.
> 
> > 
> > The patch looks good to me with that change and the one that you
> > pointed out in your reply.
> > 
> > However, as a follow-on, we should probably also handle debug
> > instructions.  The conservatively correct thing to do would be
> > to reset all debug uses that occur after *m_dest, since we (rightly)
> > haven't checked them for aliases before attempting the change.
> > A more expensive alternative would be to check each use for aliases
> > and only reset where necessary.
> > 
> > Unfortunately, that would apply to subsequent defs too, up to the
> > later of the two original instructions.
> > 
> > I'm not sure how good other passes are doing this kind of update though.
> 
> Yeah, should be handled by the PR113089 fixes (as you realised in later
> reviews).
> 
> Thanks a lot for the reviews!  I'll re-test the series with those
> changes.

Testing went OK, so I've pushed the series (with the requested changes)
as:

ef86659da9d aarch64: Fix up uses of mem following stp insert [PR113070]
6dd613df590 rtl-ssa: Ensure new defs get inserted [PR113070]
fce3994d04f rtl-ssa: Support for creating new uses [PR113070]
e0374b028a6 rtl-ssa: Run finalize_new_accesses forwards [PR113070]

Thanks,
Alex

> 
> Alex
> 
> > 
> > Thanks,
> > Richard
> > 
> > > +	  for (auto use : set->nondebug_insn_uses ())
> > > +	    if (*use->insn () > *m_dest)
> > > +	      {
> > > +		m_use = use;
> > > +		break;
> > > +	      }
> > > +
> > > +	if (m_use)
> > > +	  m_state = state::FIXUP_USE;
> > > +	else
> > > +	  m_state = state::LAST;
> > > +	break;
> > > +      }
> > > +      case state::FIXUP_USE:
> > > +	m_use = m_use->next_nondebug_insn_use ();
> > > +	if (!m_use)
> > > +	  m_state = state::LAST;
> > > +	break;
> > > +      case state::LAST:
> > > +	m_state = state::DONE;
> > > +	break;
> > > +      case state::DONE:
> > > +	gcc_unreachable ();
> > > +      }
> > > +  }
> > > +
> > > +private:
> > > +  state m_state;
> > > +
> > > +  // Original candidate stores.
> > > +  insn_info *m_insns[2];
> > > +
> > > +  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
> > > +  // are deleting both original insns and inserting a new insn for the stp.
> > > +  insn_info *m_repurpose;
> > > +
> > > +  // Destionation of the stp, it will be placed immediately after m_dest.
> > > +  insn_info *m_dest;
> > > +
> > > +  // Current nondebug use that needs updating due to stp insertion.
> > > +  use_info *m_use;
> > > +};
> > > +
> > >  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
> > >  // of them (together with its def of memory) for the stp insn.  If so, return
> > >  // that insn.  Otherwise, return null.
> > >  static insn_info *
> > > -decide_stp_strategy (insn_info *first,
> > > +try_repurpose_store (insn_info *first,
> > >  		     insn_info *second,
> > >  		     const insn_range_info &move_range)
> > >  {
> > > @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  
> > >    insn_info *insns[2] = { first, second };
> > >  
> > > -  auto_vec<insn_change *, 4> changes (4);
> > > +  auto_vec<insn_change *> changes;
> > >    auto_vec<int, 2> tombstone_uids (2);
> > >  
> > >    rtx pats[2] = {
> > > @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  
> > >    if (load_p)
> > >      {
> > > -      changes.quick_push (make_delete (first));
> > > +      changes.safe_push (make_delete (first));
> > >        pair_change = make_change (second);
> > > -      changes.quick_push (pair_change);
> > > +      changes.safe_push (pair_change);
> > >  
> > >        pair_change->move_range = move_range;
> > >        pair_change->new_defs = merge_access_arrays (attempt,
> > > @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >      }
> > >    else
> > >      {
> > > -      insn_info *store_to_change = decide_stp_strategy (first, second,
> > > +      using Action = stp_change_builder::action;
> > > +      insn_info *store_to_change = try_repurpose_store (first, second,
> > >  							move_range);
> > > -
> > > -      if (store_to_change && dump_file)
> > > -	fprintf (dump_file, "  stp: re-purposing store %d\n",
> > > -		 store_to_change->uid ());
> > > -
> > > +      insn_info *stp_dest = move_range.singleton ();
> > > +      gcc_assert (stp_dest);
> > > +      stp_change_builder builder (insns, store_to_change, stp_dest);
> > >        insn_change *change;
> > > -      for (int i = 0; i < 2; i++)
> > > +      set_info *new_set = nullptr;
> > > +      for (; !builder.done (); builder.advance ())
> > >  	{
> > > -	  change = make_change (insns[i]);
> > > -	  if (insns[i] == store_to_change)
> > > +	  auto action = builder.get_change ();
> > > +	  change = (action.type == Action::INSERT)
> > > +	    ? nullptr : make_change (action.insn);
> > > +	  switch (action.type)
> > > +	    {
> > > +	    case Action::CHANGE:
> > >  	    {
> > >  	      set_pair_pat (change);
> > >  	      change->new_uses = merge_access_arrays (attempt,
> > > @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  	      auto d2 = drop_memory_access (input_defs[1]);
> > >  	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> > >  	      gcc_assert (change->new_defs.is_valid ());
> > > -	      def_info *stp_def = memory_access (store_to_change->defs ());
> > > +	      def_info *stp_def = memory_access (change->insn ()->defs ());
> > >  	      change->new_defs = insert_access (attempt,
> > >  						stp_def,
> > >  						change->new_defs);
> > >  	      gcc_assert (change->new_defs.is_valid ());
> > >  	      change->move_range = move_range;
> > >  	      pair_change = change;
> > > +	      break;
> > >  	    }
> > > -	  else
> > > +	    case Action::TOMBSTONE:
> > >  	    {
> > > -	      // Note that we are turning this insn into a tombstone,
> > > -	      // we need to keep track of these if we go ahead with the
> > > -	      // change.
> > > -	      tombstone_uids.quick_push (insns[i]->uid ());
> > > -	      rtx_insn *rti = insns[i]->rtl ();
> > > +	      tombstone_uids.quick_push (change->insn ()->uid ());
> > > +	      rtx_insn *rti = change->insn ()->rtl ();
> > >  	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
> > >  	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
> > >  	      change->new_uses = use_array (nullptr, 0);
> > > +	      break;
> > >  	    }
> > > -	  gcc_assert (change->new_uses.is_valid ());
> > > -	  changes.quick_push (change);
> > > -	}
> > > +	    case Action::INSERT:
> > > +	    {
> > > +	      if (dump_file)
> > > +		fprintf (dump_file,
> > > +			 "  stp: cannot re-purpose candidate stores\n");
> > >  
> > > -      if (!store_to_change)
> > > -	{
> > > -	  // Tricky case.  Cannot re-purpose existing insns for stp.
> > > -	  // Need to insert new insn.
> > > -	  if (dump_file)
> > > -	    fprintf (dump_file,
> > > -		     "  stp fusion: cannot re-purpose candidate stores\n");
> > > -
> > > -	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > > -	  change = make_change (new_insn);
> > > -	  change->move_range = move_range;
> > > -	  change->new_uses = merge_access_arrays (attempt,
> > > -						  input_uses[0],
> > > -						  input_uses[1]);
> > > -	  gcc_assert (change->new_uses.is_valid ());
> > > -
> > > -	  auto d1 = drop_memory_access (input_defs[0]);
> > > -	  auto d2 = drop_memory_access (input_defs[1]);
> > > -	  change->new_defs = merge_access_arrays (attempt, d1, d2);
> > > -	  gcc_assert (change->new_defs.is_valid ());
> > > -
> > > -	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > > -	  change->new_defs = insert_access (attempt, new_set,
> > > -					    change->new_defs);
> > > -	  gcc_assert (change->new_defs.is_valid ());
> > > -	  changes.safe_insert (1, change);
> > > -	  pair_change = change;
> > > +	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > > +	      change = make_change (new_insn);
> > > +	      change->move_range = move_range;
> > > +	      change->new_uses = merge_access_arrays (attempt,
> > > +						      input_uses[0],
> > > +						      input_uses[1]);
> > > +	      gcc_assert (change->new_uses.is_valid ());
> > > +
> > > +	      auto d1 = drop_memory_access (input_defs[0]);
> > > +	      auto d2 = drop_memory_access (input_defs[1]);
> > > +	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> > > +	      gcc_assert (change->new_defs.is_valid ());
> > > +
> > > +	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > > +	      change->new_defs = insert_access (attempt, new_set,
> > > +						change->new_defs);
> > > +	      gcc_assert (change->new_defs.is_valid ());
> > > +	      pair_change = change;
> > > +	      break;
> > > +	    }
> > > +	    case Action::FIXUP_USE:
> > > +	    {
> > > +	      // This use now needs to consume memory from our stp.
> > > +	      if (dump_file)
> > > +		fprintf (dump_file,
> > > +			 "  stp: changing i%d to use mem from new stp "
> > > +			 "(after i%d)\n",
> > > +			 action.insn->uid (), stp_dest->uid ());
> > > +	      change->new_uses = drop_memory_access (change->new_uses);
> > > +	      gcc_assert (new_set);
> > > +	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
> > > +						    new_set);
> > > +	      change->new_uses = insert_access (attempt, new_use,
> > > +						change->new_uses);
> > > +	      break;
> > > +	    }
> > > +	    }
> > > +	  changes.safe_push (change);
> > >  	}
> > >      }
> > >  
> > >    if (trailing_add)
> > >      changes.quick_push (make_delete (trailing_add));
> > >  
> > > -  auto n_changes = changes.length ();
> > > -  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> > > -
> > >    auto is_changing = insn_is_changing (changes);
> > > -  for (unsigned i = 0; i < n_changes; i++)
> > > +  for (unsigned i = 0; i < changes.length (); i++)
> > >      gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
> > >  
> > >    // Check the pair pattern is recog'd.
  

Patch

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 689a8c884bd..703cfb1228c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -844,11 +844,138 @@  def_upwards_move_range (def_info *def)
   return range;
 }
 
+// Class that implements a state machine for building the changes needed to form
+// a store pair instruction.  This allows us to easily build the changes in
+// program order, as required by rtl-ssa.
+struct stp_change_builder
+{
+  enum class state
+  {
+    FIRST,
+    INSERT,
+    FIXUP_USE,
+    LAST,
+    DONE
+  };
+
+  enum class action
+  {
+    TOMBSTONE,
+    CHANGE,
+    INSERT,
+    FIXUP_USE
+  };
+
+  struct change
+  {
+    action type;
+    insn_info *insn;
+  };
+
+  bool done () const { return m_state == state::DONE; }
+
+  stp_change_builder (insn_info *insns[2],
+		      insn_info *repurpose,
+		      insn_info *dest)
+    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
+      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
+
+  change get_change () const
+  {
+    switch (m_state)
+      {
+      case state::FIRST:
+	return {
+	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
+	  m_insns[0]
+	};
+      case state::LAST:
+	return {
+	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
+	  m_insns[1]
+	};
+      case state::INSERT:
+	return { action::INSERT, m_dest };
+      case state::FIXUP_USE:
+	return { action::FIXUP_USE, m_use->insn () };
+      case state::DONE:
+	break;
+      }
+
+    gcc_unreachable ();
+  }
+
+  // Transition to the next state.
+  void advance ()
+  {
+    switch (m_state)
+      {
+      case state::FIRST:
+	if (m_repurpose)
+	  m_state = state::LAST;
+	else
+	  m_state = state::INSERT;
+	break;
+      case state::INSERT:
+      {
+	def_info *def = memory_access (m_insns[0]->defs ());
+	while (*def->next_def ()->insn () <= *m_dest)
+	  def = def->next_def ();
+
+	// Now we know DEF feeds the insertion point for the new stp.
+	// Look for any uses of DEF that will consume the new stp.
+	gcc_assert (*def->insn () <= *m_dest
+		    && *def->next_def ()->insn () > *m_dest);
+
+	if (auto set = dyn_cast<set_info *> (def))
+	  for (auto use : set->nondebug_insn_uses ())
+	    if (*use->insn () > *m_dest)
+	      {
+		m_use = use;
+		break;
+	      }
+
+	if (m_use)
+	  m_state = state::FIXUP_USE;
+	else
+	  m_state = state::LAST;
+	break;
+      }
+      case state::FIXUP_USE:
+	m_use = m_use->next_nondebug_insn_use ();
+	if (!m_use)
+	  m_state = state::LAST;
+	break;
+      case state::LAST:
+	m_state = state::DONE;
+	break;
+      case state::DONE:
+	gcc_unreachable ();
+      }
+  }
+
+private:
+  state m_state;
+
+  // Original candidate stores.
+  insn_info *m_insns[2];
+
+  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
+  // are deleting both original insns and inserting a new insn for the stp.
+  insn_info *m_repurpose;
+
+  // Destionation of the stp, it will be placed immediately after m_dest.
+  insn_info *m_dest;
+
+  // Current nondebug use that needs updating due to stp insertion.
+  use_info *m_use;
+};
+
 // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
 // of them (together with its def of memory) for the stp insn.  If so, return
 // that insn.  Otherwise, return null.
 static insn_info *
-decide_stp_strategy (insn_info *first,
+try_repurpose_store (insn_info *first,
 		     insn_info *second,
 		     const insn_range_info &move_range)
 {
@@ -1253,7 +1380,7 @@  ldp_bb_info::fuse_pair (bool load_p,
 
   insn_info *insns[2] = { first, second };
 
-  auto_vec<insn_change *, 4> changes (4);
+  auto_vec<insn_change *> changes;
   auto_vec<int, 2> tombstone_uids (2);
 
   rtx pats[2] = {
@@ -1455,9 +1582,9 @@  ldp_bb_info::fuse_pair (bool load_p,
 
   if (load_p)
     {
-      changes.quick_push (make_delete (first));
+      changes.safe_push (make_delete (first));
       pair_change = make_change (second);
-      changes.quick_push (pair_change);
+      changes.safe_push (pair_change);
 
       pair_change->move_range = move_range;
       pair_change->new_defs = merge_access_arrays (attempt,
@@ -1474,18 +1601,22 @@  ldp_bb_info::fuse_pair (bool load_p,
     }
   else
     {
-      insn_info *store_to_change = decide_stp_strategy (first, second,
+      using Action = stp_change_builder::action;
+      insn_info *store_to_change = try_repurpose_store (first, second,
 							move_range);
-
-      if (store_to_change && dump_file)
-	fprintf (dump_file, "  stp: re-purposing store %d\n",
-		 store_to_change->uid ());
-
+      insn_info *stp_dest = move_range.singleton ();
+      gcc_assert (stp_dest);
+      stp_change_builder builder (insns, store_to_change, stp_dest);
       insn_change *change;
-      for (int i = 0; i < 2; i++)
+      set_info *new_set = nullptr;
+      for (; !builder.done (); builder.advance ())
 	{
-	  change = make_change (insns[i]);
-	  if (insns[i] == store_to_change)
+	  auto action = builder.get_change ();
+	  change = (action.type == Action::INSERT)
+	    ? nullptr : make_change (action.insn);
+	  switch (action.type)
+	    {
+	    case Action::CHANGE:
 	    {
 	      set_pair_pat (change);
 	      change->new_uses = merge_access_arrays (attempt,
@@ -1495,67 +1626,76 @@  ldp_bb_info::fuse_pair (bool load_p,
 	      auto d2 = drop_memory_access (input_defs[1]);
 	      change->new_defs = merge_access_arrays (attempt, d1, d2);
 	      gcc_assert (change->new_defs.is_valid ());
-	      def_info *stp_def = memory_access (store_to_change->defs ());
+	      def_info *stp_def = memory_access (change->insn ()->defs ());
 	      change->new_defs = insert_access (attempt,
 						stp_def,
 						change->new_defs);
 	      gcc_assert (change->new_defs.is_valid ());
 	      change->move_range = move_range;
 	      pair_change = change;
+	      break;
 	    }
-	  else
+	    case Action::TOMBSTONE:
 	    {
-	      // Note that we are turning this insn into a tombstone,
-	      // we need to keep track of these if we go ahead with the
-	      // change.
-	      tombstone_uids.quick_push (insns[i]->uid ());
-	      rtx_insn *rti = insns[i]->rtl ();
+	      tombstone_uids.quick_push (change->insn ()->uid ());
+	      rtx_insn *rti = change->insn ()->rtl ();
 	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
 	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
 	      change->new_uses = use_array (nullptr, 0);
+	      break;
 	    }
-	  gcc_assert (change->new_uses.is_valid ());
-	  changes.quick_push (change);
-	}
+	    case Action::INSERT:
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "  stp: cannot re-purpose candidate stores\n");
 
-      if (!store_to_change)
-	{
-	  // Tricky case.  Cannot re-purpose existing insns for stp.
-	  // Need to insert new insn.
-	  if (dump_file)
-	    fprintf (dump_file,
-		     "  stp fusion: cannot re-purpose candidate stores\n");
-
-	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
-	  change = make_change (new_insn);
-	  change->move_range = move_range;
-	  change->new_uses = merge_access_arrays (attempt,
-						  input_uses[0],
-						  input_uses[1]);
-	  gcc_assert (change->new_uses.is_valid ());
-
-	  auto d1 = drop_memory_access (input_defs[0]);
-	  auto d2 = drop_memory_access (input_defs[1]);
-	  change->new_defs = merge_access_arrays (attempt, d1, d2);
-	  gcc_assert (change->new_defs.is_valid ());
-
-	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
-	  change->new_defs = insert_access (attempt, new_set,
-					    change->new_defs);
-	  gcc_assert (change->new_defs.is_valid ());
-	  changes.safe_insert (1, change);
-	  pair_change = change;
+	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
+	      change = make_change (new_insn);
+	      change->move_range = move_range;
+	      change->new_uses = merge_access_arrays (attempt,
+						      input_uses[0],
+						      input_uses[1]);
+	      gcc_assert (change->new_uses.is_valid ());
+
+	      auto d1 = drop_memory_access (input_defs[0]);
+	      auto d2 = drop_memory_access (input_defs[1]);
+	      change->new_defs = merge_access_arrays (attempt, d1, d2);
+	      gcc_assert (change->new_defs.is_valid ());
+
+	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
+	      change->new_defs = insert_access (attempt, new_set,
+						change->new_defs);
+	      gcc_assert (change->new_defs.is_valid ());
+	      pair_change = change;
+	      break;
+	    }
+	    case Action::FIXUP_USE:
+	    {
+	      // This use now needs to consume memory from our stp.
+	      if (dump_file)
+		fprintf (dump_file,
+			 "  stp: changing i%d to use mem from new stp "
+			 "(after i%d)\n",
+			 action.insn->uid (), stp_dest->uid ());
+	      change->new_uses = drop_memory_access (change->new_uses);
+	      gcc_assert (new_set);
+	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
+						    new_set);
+	      change->new_uses = insert_access (attempt, new_use,
+						change->new_uses);
+	      break;
+	    }
+	    }
+	  changes.safe_push (change);
 	}
     }
 
   if (trailing_add)
     changes.quick_push (make_delete (trailing_add));
 
-  auto n_changes = changes.length ();
-  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
-
   auto is_changing = insn_is_changing (changes);
-  for (unsigned i = 0; i < n_changes; i++)
+  for (unsigned i = 0; i < changes.length (); i++)
     gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
 
   // Check the pair pattern is recog'd.