[RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

Message ID 3313269b-5444-9142-b8eb-1f59e67083cd@suse.de
State New
Series [RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

Commit Message

Richard Biener Oct. 12, 2021, 10:24 a.m. UTC
  This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
which we can use to wrap a reference tree, recording the ao_ref
associated with it.  That comes in handy when trying to optimize
the constant factor involved with alias stmt walking (or alias
queries in general), where there are parts that are linear in the
reference expression complexity, namely get_ref_base_and_extent,
which usually shows up high on profiles.
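For illustration, the constant factor in question is the status quo in
stmt_may_clobber_ref_p_1 (simplified from the hunk further below),
where every alias query against a store rebuilds the ao_ref from
scratch:

  tree lhs = gimple_assign_lhs (stmt);
  if (TREE_CODE (lhs) != SSA_NAME)
    {
      ao_ref r;
      /* Re-derives base/offset/size via get_ref_base_and_extent,
         linear in the depth of the reference expression.  */
      ao_ref_init (&r, lhs);
      return refs_may_alias_p_1 (ref, &r, tbaa_p);
    }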

The idea was to make a wrapping TREE_AOREFWRAP mostly transparent
to users by making gimple_assign_{lhs,rhs1} strip it and provide
special accessors that get at the embedded ao_ref as well as
doing the wrapping when it's not already there.

The following patch is minimal, just enough to make tree-ssa.exp=ssa-fre-*
not ICE and to make the testcases from PR28071 and PR39326 compile
successfully at -O1 (both testcases show a moderately high
load on alias stmt walking, around 25% resp. 34%).  With the
patch, which makes use of the cache only from stmt_may_clobber_ref_p
for now, the compile time improves by 7% resp. 19%, which means
overall the idea might be worth pursuing.

I did run into more "issues" with the extra TREE_AOREFWRAP appearing
than anticipated (well, no, not really...).  So, given it is a
kind of hack already, I'm now thinking of making it even more
transparent: instead of wrapping the refs with another tree,
re-allocate the outermost handled_component_p (and only those),
placing the ao_ref after the tree and indicating its presence
by TREE_ASM_WRITTEN (or any other available bit).

As with TREE_AOREFWRAP the key is to invalidate the ao_ref
(strip TREE_AOREFWRAP or unset TREE_ASM_WRITTEN) when its
information becomes stale (for example by unsharing) or invalid
(by mangling the reference into something semantically different) or
no longer precise (by propagating constants into the reference).

I did consider simply using an (optional) on-the-side hash-map
and allocpool to be initialized by passes and queried by the
oracle.  The downside is that we're optimizing a "constant" factor
of the oracle query but a hash-map lookup isn't exactly
cache-friendly.  Likewise, caching across pass boundaries sounded
potentially important (OK, we have at most N ao_refs, with N linear in
the program size - but as said, we're optimizing a constant factor).
Existing on-the-side caching includes the SCEV cache for example.
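For concreteness, such an on-the-side variant could look roughly like
the following sketch (all names here are illustrative, not from any
posted patch):

  /* Per-pass cache mapping reference trees to ao_refs, backed by an
     allocpool; initialized by a pass and queried by the oracle.  */
  static object_allocator<ao_ref> ao_ref_pool ("ao_ref cache");
  static hash_map<tree, ao_ref *> *ao_ref_cache;

  static ao_ref *
  cached_ao_ref (tree ref)
  {
    bool existed;
    ao_ref *&slot = ao_ref_cache->get_or_insert (ref, &existed);
    if (!existed)
      {
        slot = ao_ref_pool.allocate ();
        ao_ref_init (slot, ref);
      }
    return slot;
  }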

So take this as a proof-of-concept showing the possible gain;
I do think going with either the TREE_ASM_WRITTEN or the on-the-side
table solution is going to have fewer issues all around the compiler.

Comments?

Thanks,
Richard.

2021-10-12  Richard Biener  <rguenther@suse.de>

	* treestruct.def (TS_AOREFWRAP): New.
	* tree.def (TREE_AOREFWRAP): Likewise.
	* gimple.h (gimple_assign_lhs): Look through TREE_AOREFWRAP.
	(gimple_assign_rhs1): Likewise.
	(gimple_assign_lhs_with_ao_ref): New.
	(gimple_assign_rhs1_with_ao_ref): Likewise.
	* tree-ssa-alias.c (stmt_may_clobber_ref_p_1): Use the
	ao_ref embedded in the stmt's LHS if possible.
	<... and more hacks ...>
---
 gcc/gimple.c               | 62 +++++++++++++++++++++++++++++++++++++-
 gcc/gimple.h               | 12 ++++++--
 gcc/tree-core.h            |  8 +++++
 gcc/tree-inline.c          |  4 +++
 gcc/tree-ssa-alias.c       | 11 +++++--
 gcc/tree-ssa-loop-im.c     |  5 ++-
 gcc/tree-ssa-loop-ivopts.c |  3 ++
 gcc/tree-ssa-loop.c        |  1 +
 gcc/tree-ssa-operands.c    |  1 +
 gcc/tree-streamer.c        |  1 +
 gcc/tree.c                 | 15 +++++++--
 gcc/tree.def               |  4 +++
 gcc/treestruct.def         |  1 +
 13 files changed, 119 insertions(+), 9 deletions(-)
  

Comments

Michael Matz Oct. 13, 2021, 2:57 p.m. UTC | #1
Hello,

[this is the fourth attempt to write a comment/review/opinion for this 
ao_ref-in-tcc_reference, please accept some possible incoherence]

On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:

> This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> which we can use to wrap a reference tree, recording the ao_ref
> associated with it.  That comes in handy when trying to optimize
> the constant factor involved with alias stmt walking (or alias
> queries in general), where there are parts that are linear in the
> reference expression complexity, namely get_ref_base_and_extent,
> which usually shows up high on profiles.

So, generally I like storing things in the IL that are impossible to 
(re-)discover.  Remembering things that are merely slowish to rediscover 
is less clear: the increase in IL memory use and the potentially 
necessary pointer chasing might just trade one clearly measurable slow 
point (the rediscovery computation) for many slow points all over the 
place (the pointer chasing/cache effects).  Here ...

> The following patch is minimal, just enough to make tree-ssa.exp=ssa-fre-*
> not ICE and to make the testcases from PR28071 and PR39326 compile
> successfully at -O1 (both testcases show a moderately high
> load on alias stmt walking, around 25% resp. 34%).  With the
> patch, which makes use of the cache only from stmt_may_clobber_ref_p
> for now, the compile time improves by 7% resp. 19%, which means
> overall the idea might be worth pursuing.

... you seem to have a point, though.  Also, I am of the opinion that our 
gimple memrefs could be even fatter (and also encode things like 
multi-dimensional accesses, either right from the source code or 
discovered by index analysis), and your idea goes in that direction.  
So, yay for encoding memref properties into the IL, even though here it's 
only a cache.  You solved the necessary invalidation already.  Perhaps 
only partly, that will be seen once exposed to the wild.

So, the only question seems to be how to encode it: either by ...

> transparent: instead of wrapping the refs with another tree,

... this (wrapping) ...

> re-allocate the outermost handled_component_p (and only those),

... or that (aggregation).  There is a third possibility if you want this 
only in the gimple world (which is the case): encode it not in the trees 
but in the gimple statements.  This sort of works easily for everything 
except calls.  I will not consider this variant, nor the side table 
implementation.

While writing this email I switched between more liking one or the other, 
multiple times.  So, I'll write down some basic facts/requirements:

1) You basically want to add stuff to an existing structure:
(a) by wrapping: to work seamlessly the outer tree should have similar 
    enough properties to the inner tree (e.g. also be tcc_reference) to be 
    used interchangeably in most code, except that which needs to look at 
    the added stuff.
(b) by aggregating the stuff into the existing structure itself: if you 
    need both structs (with and without stuff) the pure thing to do is to 
    actually create two structs, once with, once without stuff.
2) the added stuff is optional
3) we have multiple things (all tcc_reference) to which to add stuff
4) all tcc_reference are tree_exp, which has a variable number of operands,
   which constrains what we can do naturally (e.g. we can't add stuff 
   after tree_exp, except by constraining the number of operands)

Considering this it seems that aggregation is worse: you basically double 
the number of structure types (at least conceptually, if you go with your 
bit-idea).  So, some idea of wrapping seems more natural.

(I think your idea of aggregation but going with a bit flag to indicate if 
this tcc_reference is or isn't annotated, and therefore has things 
allocated after the variable number of operands, is a terrible hack)

There is another possibility doing something like your bit-flag 
aggregation but with less hackery: if ao_ref were a tree it could be 
a normal operand of a tree_exp (and hence tcc_reference), just that the 
number of operands would then vary depending on whether it's annotated or not.

Making ao_ref into a tree would also enable the use of ANNOTATE_EXPR for a 
generic wrapping tree.  (Currently it's used only in very specific cases, 
so ANNOTATE_EXPR handling would need to be extended all over, and as it's 
not a tcc_reference it would probably rather mean to introduce a new 
ANNOTATE_REF).

Anyway, with this:

struct tree_ref_annotation {
  struct tree_base base;
  struct ao_ref ao_ref;
};

DEFTREECODE(TREE_MEM_ANNO, "mem_anno", tcc_exceptional, 0);

you could then add

DEFTREECODE(MEM_REF_A, "mem_ref_a", tcc_reference, 3);

where TREE_OPERAND(memref, 2) would then be a TREE_MEM_ANNO.  If we were 
to add one operand slot to each tcc_reference we could even do without new 
tree codes: the existence of an ao_ref would simply be indicated by 
TREE_OPERAND(ref, position) being non-NULL.  It would of course enlarge 
each tcc_reference by one pointer, but maybe we could move more things 
into the above ref_annotation from somewhere else (points-to sets? scev 
info?) and then trade the memory saved there against this pointer, which 
would then be used more often.  (I think the eight bytes one has to pay for 
wrapping the ao_ref into a tree by adding tree_base might be quite 
acceptable, or, if ao_ref were a tree itself (not just wrapped into 
one), then e.g. the ao_ref.volatile_p flag could be grabbed from tree_base 
and free up the 8 bytes again)

So, at _this_ write-through of the email I think I like the above idea 
best: make ao_ref be a tree (at least its storage, because it currently 
is a one-member-function class), make ao_ref.volatile_p be 
tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
tcc_reference tree.
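Spelled out as an accessor this would be something like the following
(hypothetical; note TREE_CODE_LENGTH wants the tree code, not the tree):

  #define TREE_AO_REF(NODE) \
    TREE_OPERAND ((NODE), TREE_CODE_LENGTH (TREE_CODE (NODE)) - 1)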

Hmm, food for thought :)


Ciao,
Michael.
  
Richard Biener Oct. 14, 2021, 1:13 p.m. UTC | #2
On Wed, 13 Oct 2021, Michael Matz wrote:

> Hello,
> 
> [this is the fourth attempt to write a comment/review/opinion for this 
> ao_ref-in-tcc_reference, please accept some possible incoherence]
> 
> On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:
> 
> > This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> > which we can use to wrap a reference tree, recording the ao_ref
> > associated with it.  That comes in handy when trying to optimize
> > the constant factor involved with alias stmt walking (or alias
> > queries in general), where there are parts that are linear in the
> > reference expression complexity, namely get_ref_base_and_extent,
> > which usually shows up high on profiles.
> 
> So, generally I like storing things in the IL that are impossible to 
> (re-)discover.  Remembering things that are merely slowish to rediscover 
> is less clear: the increase in IL memory use and the potentially 
> necessary pointer chasing might just trade one clearly measurable slow 
> point (the rediscovery computation) for many slow points all over the 
> place (the pointer chasing/cache effects).  Here ...

Note putting the cache in the IL rather than on-the-side was meant to
improve this - notably the approach embedding the ao_ref within
the outermost ref allocation (the TREE_ASM_WRITTEN thing) would
be just a pointer add away from memory we access anyway.  It also
cuts the cost of the cache query itself down to a bit test on data
already in cache, rather than a hashtable lookup (plus chasing the
hashtable pointer).

But yes, embedding any sort of cache into the IL is questionable.

> > The following patch is minimal, just enough to make tree-ssa.exp=ssa-fre-*
> > not ICE and to make the testcases from PR28071 and PR39326 compile
> > successfully at -O1 (both testcases show a moderately high
> > load on alias stmt walking, around 25% resp. 34%).  With the
> > patch, which makes use of the cache only from stmt_may_clobber_ref_p
> > for now, the compile time improves by 7% resp. 19%, which means
> > overall the idea might be worth pursuing.
> 
> ... you seem to have a point, though.  Also, I am of the opinion that our 
> gimple memrefs could be even fatter (and also encode things like 
> multi-dimensional accesses, either right from the source code or 
> discovered by index analysis), and your idea goes in that direction.

Agreed on the issue with our reference tree representation; I'm not sure
the cache really goes in the direction of fixing it, though.

> So, yay for encoding memref properties into the IL, even though here it's 
> only a cache.  You solved the necessary invalidation already.  Perhaps 
> only partly, that will be seen once exposed to the wild.
> 
> So, the only question seems to be how to encode it: either by ...
> 
> > transparent: instead of wrapping the refs with another tree,
> 
> ... this (wrapping) ...
> 
> > re-allocate the outermost handled_component_p (and only those),
> 
> ... or that (aggregation).  There is a third possibility if you want this 
> only in the gimple world (which is the case): encode it not in the trees 
> but in the gimple statements.  This sort of works easily for everything 
> except calls.  I will not consider this variant, nor the side table 
> implementation.

Yeah, I was not considering using the GIMPLE stmt itself because
we can have two references (for aggregate copies) or N (for calls and asms).
I'm still considering the side table though.

> 
> While writing this email I switched between more liking one or the other, 
> multiple times.  So, I'll write down some basic facts/requirements:
> 
> 1) You basically want to add stuff to an existing structure:
> (a) by wrapping: to work seamlessly the outer tree should have similar 
>     enough properties to the inner tree (e.g. also be tcc_reference) to be 
>     used interchangeably in most code, except that which needs to look at 
>     the added stuff.
> (b) by aggregating the stuff into the existing structure itself: if you 
>     need both structs (with and without stuff) the pure thing to do is to 
>     actually create two structs, once with, once without stuff.
> 2) the added stuff is optional
> 3) we have multiple things (all tcc_reference) to which to add stuff
> 4) all tcc_reference are tree_exp, which has a variable number of operands,
>    which constrains what we can do naturally (e.g. we can't add stuff 
>    after tree_exp, except by constraining the number of operands)
> 
> Considering this it seems that aggregation is worse: you basically double 
> the number of structure types (at least conceptually, if you go with your 
> bit-idea).  So, some idea of wrapping seems more natural.
> 
> (I think your idea of aggregation but going with a bit flag to indicate if 
> this tcc_reference is or isn't annotated, and therefore has things 
> allocated after the variable number of operands, is a terrible hack)
> 
> There is another possibility doing something like your bit-flag 
> aggregation but with less hackery: if ao_ref were a tree it could be 
> a normal operand of a tree_exp (and hence tcc_reference), just that the 
> number of operands would then vary depending on whether it's annotated or not.
> 
> Making ao_ref into a tree would also enable the use of ANNOTATE_EXPR for a 
> generic wrapping tree.  (Currently it's used only in very specific cases, 
> so ANNOTATE_EXPR handling would need to be extended all over, and as it's 
> not a tcc_reference it would probably rather mean to introduce a new 
> ANNOTATE_REF).
> 
> Anyway, with this:
> 
> struct tree_ref_annotation {
>   struct tree_base base;
>   struct ao_ref ao_ref;
> };
> 
> DEFTREECODE(TREE_MEM_ANNO, "mem_anno", tcc_exceptional, 0);
> 
> you could then add
> 
> DEFTREECODE(MEM_REF_A, "mem_ref_a", tcc_reference, 3);
> 
> where TREE_OPERAND(memref, 2) would then be a TREE_MEM_ANNO.  If we were 
> to add one operand slot to each tcc_reference we could even do without new 
> tree codes: the existence of an ao_ref would simply be indicated by 
> TREE_OPERAND(ref, position) being non-NULL.  It would of course enlarge 
> each tcc_reference by one pointer, but maybe we could move more things 
> into the above ref_annotation from somewhere else (points-to sets? scev 
> info?) and then trade the memory saved there against this pointer, which 
> would then be used more often.  (I think the eight bytes one has to pay for 
> wrapping the ao_ref into a tree by adding tree_base might be quite 
> acceptable, or, if ao_ref were a tree itself (not just wrapped into 
> one), then e.g. the ao_ref.volatile_p flag could be grabbed from tree_base 
> and free up the 8 bytes again)
> 
> So, at _this_ write-through of the email I think I like the above idea 
> best: make ao_ref be a tree (at least its storage, because it currently 
> is a one-member-function class), make ao_ref.volatile_p be 
> tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
> sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
> 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
> TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
> tcc_reference tree.

Hmm.  I'm not sure that's really something I like - it's especially
quite some heavy lifting while at the same time lacking true boldness
as to changing the representation of memory refs ;)

> Hmm, food for thought :)

That said - I've prototyped the TREE_ASM_WRITTEN way now because it's
even simpler than the original TREE_AOREFWRAP approach, see below.

Note that I'm not embedding it into the tree structure, I'm merely
using the same allocation to store two objects, the outermost ref
and the ao_ref associated with it.  Quote:

+  size_t length = tree_code_size (TREE_CODE (lhs));
+  if (!TREE_ASM_WRITTEN (lhs))
+    {
+      tree alt_lhs
+       = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
+      memcpy (alt_lhs, lhs, length);
+      TREE_ASM_WRITTEN (alt_lhs) = 1;
+      *ref = new ((char *)alt_lhs + length) ao_ref;
+      ao_ref_init (*ref, alt_lhs);
+      stmt->op[0] = alt_lhs;


Richard.



From 97afd5e6bb32a0985a274ffe0cd26c8f935042ec Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Thu, 14 Oct 2021 10:39:34 +0200
Subject: [PATCH] Introduce ao_ref cache in the IL
To: gcc-patches@gcc.gnu.org

This prototypes the alternate implementation suggested in
<ref> using TREE_ASM_WRITTEN on tcc_reference trees to denote
an adjacent ao_ref.  A complication I ran into when prototyping
is that we cannot re-allocate a component which has an SSA use,
since there's no easy way to update the use pointer in the
immediate-use structure.  So the prototype simply excuses
itself from caching anything on outermost components with an SSA
operand.
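For reference, the immediate-use links record the address of the
operand slot directly, which is why memcpy'ing the reference to a new
allocation leaves them dangling (abridged from tree-core.h):

  struct ssa_use_operand_t {
    struct ssa_use_operand_t *prev;
    struct ssa_use_operand_t *next;
    ...
    tree *use;   /* Address of the operand slot using the SSA name.  */
  };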

The patch misses strategic unsetting of TREE_ASM_WRITTEN whenever
the cache becomes stale (in both the correctness and the optimality sense)
without the outermost component being re-allocated.  As an optimization
we could use a second bit to indicate the ao_ref storage is present but
unused.

Compared to the TREE_AOREFWRAP variant this survives building the
stage1 target libraries, crashing "only" in IPA-CP when building
libgomp ...

The compile-time numbers are not as impressive (likely due to the
SSA operand issue), improving by 7% (same as the other patch) and 10%
(less than the 19% seen in the other patch).  That means trying
to handle the SSA use adjustment might be worthwhile.

Note that both patches only make light use of the approach and
specifically will not help the alias walks of DCE and DSE.

2021-10-14  Richard Biener  <rguenther@suse.de>

	* gimple.h (gimple_assign_lhs_with_ao_ref): New.
	* gimple.c (gimple_assign_lhs_with_ao_ref): Likewise.
	* tree-ssa-alias.c (stmt_may_clobber_ref_p_1): Use the
	ao_ref embedded in the stmt's LHS if possible.
	* tree.c (copy_node): Unset TREE_ASM_WRITTEN on tcc_reference
	trees.
---
 gcc/gimple.c         | 52 ++++++++++++++++++++++++++++++++++++++++++++
 gcc/gimple.h         |  1 +
 gcc/tree-ssa-alias.c | 11 +++++++---
 gcc/tree.c           |  3 +++
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index cc7a88e822b..ceaa3ea3556 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1758,6 +1758,58 @@ gimple_set_bb (gimple *stmt, basic_block bb)
 }
 
 
+/* Return the LHS of assign STMT and initialize *REF to the ao_ref cached
+   adjacent to it, allocating the cache if it is not present yet.
+   When caching is not profitable initialize *REF to NULL.  */
+
+tree
+gimple_assign_lhs_with_ao_ref (gassign *stmt, ao_ref **ref)
+{
+  tree lhs = stmt->op[0];
+  if (!REFERENCE_CLASS_P (lhs))
+    {
+      *ref = NULL;
+      return lhs;
+    }
+  /* ???  When we replace a component with SSA uses we have to
+     adjust the use pointer in the ssa_use_operand_t structure.
+     For now simply avoid this situation, doing an update_stmt here
+     would be super-ugly.
+     REALPART_EXPR, IMAGPART_EXPR, BIT_FIELD_REF and VIEW_CONVERT_EXPR
+     are always fine, MEM_REF and TARGET_MEM_REF are hardly profitable
+     to cache.  */
+  if (TREE_CODE (lhs) == MEM_REF
+      || TREE_CODE (lhs) == TARGET_MEM_REF
+      || (TREE_CODE (lhs) == COMPONENT_REF
+	  && TREE_OPERAND (lhs, 2) != NULL_TREE)
+      || ((TREE_CODE (lhs) == ARRAY_REF
+	   || TREE_CODE (lhs) == ARRAY_RANGE_REF)
+	  && (TREE_CODE (TREE_OPERAND (lhs, 1)) == SSA_NAME
+	      || (TREE_OPERAND (lhs, 2)
+		  && TREE_CODE (TREE_OPERAND (lhs, 2)) == SSA_NAME)
+	      || (TREE_OPERAND (lhs, 3)
+		  && TREE_CODE (TREE_OPERAND (lhs, 3)) == SSA_NAME))))
+    {
+      *ref = NULL;
+      return lhs;
+    }
+
+  size_t length = tree_code_size (TREE_CODE (lhs));
+  if (!TREE_ASM_WRITTEN (lhs))
+    {
+      tree alt_lhs
+	= ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
+      memcpy (alt_lhs, lhs, length);
+      TREE_ASM_WRITTEN (alt_lhs) = 1;
+      *ref = new ((char *)alt_lhs + length) ao_ref;
+      ao_ref_init (*ref, alt_lhs);
+      stmt->op[0] = alt_lhs;
+      return alt_lhs;
+    }
+  *ref = reinterpret_cast <ao_ref *> ((char *)lhs + length);
+  return lhs;
+}
+
 /* Modify the RHS of the assignment pointed-to by GSI using the
    operands in the expression tree EXPR.
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 303623b3ced..5b18045e128 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -2654,6 +2654,7 @@ gimple_assign_rhs1 (const gimple *gs)
   return gimple_assign_rhs1 (ass);
 }
 
+extern tree gimple_assign_lhs_with_ao_ref (gassign *, ao_ref **);
 
 /* Return a pointer to the first operand on the RHS of assignment
    statement GS.  */
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index ce667ff32b9..670d03748a8 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3144,12 +3144,17 @@ stmt_may_clobber_ref_p_1 (gimple *stmt, ao_ref *ref, bool tbaa_p)
     }
   else if (gimple_assign_single_p (stmt))
     {
-      tree lhs = gimple_assign_lhs (stmt);
+      ao_ref *rp;
+      tree lhs = gimple_assign_lhs_with_ao_ref (as_a <gassign *> (stmt), &rp);
       if (TREE_CODE (lhs) != SSA_NAME)
 	{
 	  ao_ref r;
-	  ao_ref_init (&r, lhs);
-	  return refs_may_alias_p_1 (ref, &r, tbaa_p);
+	  if (!rp)
+	    {
+	      ao_ref_init (&r, lhs);
+	      rp = &r;
+	    }
+	  return refs_may_alias_p_1 (ref, rp, tbaa_p);
 	}
     }
   else if (gimple_code (stmt) == GIMPLE_ASM)
diff --git a/gcc/tree.c b/gcc/tree.c
index 7bfd64160f4..c0d43859235 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1390,6 +1390,9 @@ copy_node (tree node MEM_STAT_DECL)
 	  TYPE_CACHED_VALUES (t) = NULL_TREE;
 	}
     }
+  else if (TREE_CODE_CLASS (code) == tcc_reference)
+    /* We did not and do not want to copy the adjacent ao_ref.  */
+    TREE_ASM_WRITTEN (t) = 0;
     else if (code == TARGET_OPTION_NODE)
       {
 	TREE_TARGET_OPTION (t) = ggc_alloc<struct cl_target_option>();
  
Michael Matz Oct. 14, 2021, 1:29 p.m. UTC | #3
Hello,

On Thu, 14 Oct 2021, Richard Biener wrote:

> > So, at _this_ write-through of the email I think I like the above idea 
> > best: make ao_ref be a tree (at least its storage, because it currently 
> > is a one-member-function class), make ao_ref.volatile_p be 
> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
> > tcc_reference tree.
> 
> Hmm.  I'm not sure that's really something I like - it's especially
> quite some heavy lifting while at the same time lacking true boldness
> as to changing the representation of memory refs ;)

Well, it would at least enable such changes later in an orderly fashion.

> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
> even simpler than the original TREE_AOREFWRAP approach, see below.
> 
> Note that I'm not embedding it into the tree structure, I'm merely
> using the same allocation to store two objects, the outermost ref
> and the ao_ref associated with it.  Quote:
> 
> +  size_t length = tree_code_size (TREE_CODE (lhs));
> +  if (!TREE_ASM_WRITTEN (lhs))
> +    {
> +      tree alt_lhs
> +       = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
> +      memcpy (alt_lhs, lhs, length);
> +      TREE_ASM_WRITTEN (alt_lhs) = 1;
> +      *ref = new ((char *)alt_lhs + length) ao_ref;

You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
yeah, for a hack that works.  If you really want to go that way you need 
good comments about this hack.  It's really somewhat worrisome that the 
size of the allocation depends on a bit in tree_base.
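One way to address the alignment concern could be to round the tree
size up before placing the cache - an illustration only, not part of
the posted patch, assuming GCC's ROUND_UP helper:

  size_t length = ROUND_UP (tree_code_size (TREE_CODE (lhs)),
			    alignof (ao_ref));
  tree alt_lhs
    = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));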

(It's a really cute hack that works as a micro optimization, the question 
is, do we really need to go there already, are all other less hacky 
approaches not bringing similar improvements?  The cuter the hacks the 
less often they pay off in the long run of production software :) )


Ciao,
Michael.
  
Richard Sandiford Oct. 18, 2021, 9:58 a.m. UTC | #4
Michael Matz via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hello,
>
> On Thu, 14 Oct 2021, Richard Biener wrote:
>
>> > So, at _this_ write-through of the email I think I like the above idea 
>> > best: make ao_ref be a tree (at least its storage, because it currently 
>> > is a one-member-function class), make ao_ref.volatile_p be 
>> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
>> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
>> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
>> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
>> > tcc_reference tree.
>> 
>> Hmm.  I'm not sure that's really something I like - it's especially
>> quite some heavy lifting while at the same time lacking true boldness
>> as to changing the representation of memory refs ;)
>
> Well, it would at least enable such changes later in an orderly fashion.
>
>> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
>> even simpler than the original TREE_AOREFWRAP approach, see below.
>> 
>> Note that I'm not embedding it into the tree structure, I'm merely
>> using the same allocation to store two objects, the outermost ref
>> and the ao_ref associated with it.  Quote:
>> 
>> +  size_t length = tree_code_size (TREE_CODE (lhs));
>> +  if (!TREE_ASM_WRITTEN (lhs))
>> +    {
>> +      tree alt_lhs
>> +       = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
>> +      memcpy (alt_lhs, lhs, length);
>> +      TREE_ASM_WRITTEN (alt_lhs) = 1;
>> +      *ref = new ((char *)alt_lhs + length) ao_ref;
>
> You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
> yeah, for a hack that works.  If you really want to go that way you need 
> good comments about this hack.  It's really somewhat worrisome that the 
> size of the allocation depends on a bit in tree_base.
>
> (It's a really cute hack that works as a micro optimization, the question 
> is, do we really need to go there already, are all other less hacky 
> approaches not bringing similar improvements?  The cuter the hacks the 
> less often they pay off in the long run of production software :) )

FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
for block_symbol, I like the approach of concatenating/combining structures
based on flags.  The main tree and rtl types have too much baggage and
so I think there are some things that are better represented outside
of them.

I suppose cselib VALUE rtxes are also similar, although they're more
of a special case, since cselib data doesn't survive between passes.

Thanks,
Richard
  
Michael Matz Oct. 18, 2021, 1:14 p.m. UTC | #5
Hello,

On Mon, 18 Oct 2021, Richard Sandiford wrote:

> > (It's a really cute hack that works as a micro optimization, the question 
> > is, do we really need to go there already, are all other less hacky 
> > approaches not bringing similar improvements?  The cuter the hacks the 
> > less often they pay off in the long run of production software :) )
> 
> FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> for block_symbol, I like the approach of concatenating/combining structures
> based on flags.

The problem is that if you unset the flag you can't free the (now useless) 
storage.  What's worse is that you can't even reuse it anymore, because 
you lost the knowledge that it exists (except if you want to use another 
flag to note that).  It's of course obvious, but it helps to spell that 
out if we want to argue about ...

> The main tree and rtl types have too much baggage and

... baggage.  What you actually gain by associating different info pieces 
by address (e.g. concatenating allocations) is that you don't need to refer 
to one from the other; that's the space you save, not anything inherent 
in the structures (which still have the members they would have 
anyway).  So, you basically trade one pointer (or index), which could 
be optional, for address association and inflexibility (with the 
impossibility of managing both pieces individually: you can't free the 
second piece, and you can't add the second piece post-allocation).  It 
might be a good trade-off sometimes, but in the abstract it's not a good 
design.

Regarding trees and space: to make something a tree you need 8 bytes and 
get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
see that as much baggage.  We could reduce it further by splitting the 
arbitrary union and the tree_code+flags parts.  Especially for things 
referred to from tree_exp it makes sense to try making them trees 
themselves.

> so I think there are some things that are better represented outside
> of them.
> 
> I suppose cselib VALUE rtxes are also similar, although they're more
> of a special case, since cselib data doesn't survive between passes.


Ciao,
Michael.
  
Richard Biener Oct. 18, 2021, 1:30 p.m. UTC | #6
On Mon, 18 Oct 2021, Michael Matz wrote:

> Hello,
> 
> On Mon, 18 Oct 2021, Richard Sandiford wrote:
> 
> > > (It's a really cute hack that works as a micro optimization, the question 
> > > is, do we really need to go there already, are all other less hacky 
> > > approaches not bringing similar improvements?  The cuter the hacks the 
> > > less often they pay off in the long run of production software :) )
> > 
> > FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> > for block_symbol, I like the approach of concatenating/combining structures
> > based on flags.
> 
> The problem is that if you unset the flag you can't free the (now useless) 
> storage.  What's worse is that you can't even reuse it anymore, because 
> you lost the knowledge that it exists (except if you want to use another 
> flag to note that).

Yes, I suspect in the end I'd use two bits to optimize this case.
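As an illustration of such a two-bit scheme (the second flag below is
a placeholder for whatever base bit turns out to be free, not an
audited choice):

  /* Bit one (TREE_ASM_WRITTEN in the prototype): trailing ao_ref
     storage exists.  Bit two (placeholder): its contents are valid.  */
  #define AO_REF_STORAGE_P(NODE)  TREE_ASM_WRITTEN (NODE)
  #define AO_REF_VALID_P(NODE)    TREE_PRIVATE (NODE)

  /* Invalidation can then keep the storage around for reuse.  */
  if (AO_REF_STORAGE_P (ref))
    AO_REF_VALID_P (ref) = 0;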

> It's of course obvious, but it helps to spell that 
> out if we want to argue about ...
> 
> > The main tree and rtl types have too much baggage and
> 
> ... baggage.  What you actually gain by associating different info pieces 
> by address (e.g. concatenating allocations) is that you don't need to refer 
> to one from the other; that's the space you save, not anything inherent 
> in the structures (which still have the members they would have 
> anyway).  So, you basically trade one pointer (or index), which could 
> be optional, for address association and inflexibility (with the 
> impossibility of managing both pieces individually: you can't free the 
> second piece, and you can't add the second piece post-allocation).  It 
> might be a good trade-off sometimes, but in the abstract it's not a good 
> design.
> 
> Regarding trees and space: to make something a tree you need 8 bytes and 
> get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
> see that as much baggage.  We could reduce it further by splitting the 
> arbitrary union and the tree_code+flags parts.  Especially for things 
> referred to from tree_exp it makes sense to try making them trees 
> themselves.

So the main issue is that I consider none of the discussed approaches
nice (or well-designed), so I went for the one that appears to be
least intrusive (the concatenating and bit-indication).

That said, I'm probably going to codify the on-the-side
(optional) hashtable variant as well, which is at least well-designed
but might have disadvantages in the larger constant overhead, in
the difficulties in principle of carrying info across passes, and in
a necessarily more explicit invalidation API.  Note that all
prototypes miss the verification part (that the info is not stale
and reasonably up-to-date).
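A hypothetical sketch of what such verification could look like under
flag_checking (illustrative only, assuming a cached *REF for LHS):

  if (flag_checking && *ref)
    {
      ao_ref fresh;
      ao_ref_init (&fresh, lhs);
      /* The cached ao_ref must agree with a freshly computed one.  */
      gcc_assert (operand_equal_p (ao_ref_base (&fresh),
				   ao_ref_base (*ref), 0)
		  && known_eq (fresh.offset, (*ref)->offset)
		  && known_eq (fresh.size, (*ref)->size));
    }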

The real answer might of course be to invent the "proper" MEM_REF
tree that has fast access to ao_ref-style info as well as being
able to encode the important parts of the access path.

Richard.
  

Patch

diff --git a/gcc/gimple.c b/gcc/gimple.c
index cc7a88e822b..070304b64bf 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1758,6 +1758,60 @@  gimple_set_bb (gimple *stmt, basic_block bb)
 }
 
 
+/* Return rhs1 of assign STMT and initialize *REF to the ao_ref stored
+   in its wrapping TREE_AOREFWRAP, wrapping rhs1 if it is not yet wrapped.
+   When wrapping is not profitable initialize *REF to NULL.  */
+
+tree
+gimple_assign_rhs1_with_ao_ref (gassign *stmt, ao_ref **ref)
+{
+  tree rhs1 = stmt->op[1];
+  if (!REFERENCE_CLASS_P (rhs1))
+    {
+      *ref = NULL;
+      return rhs1;
+    }
+  if (TREE_CODE (rhs1) != TREE_AOREFWRAP)
+    {
+      rhs1 = build1 (TREE_AOREFWRAP, TREE_TYPE (rhs1), rhs1);
+      static_assert (sizeof (ao_ref) == sizeof (rhs1->aorefwrap.ref_storage),
+		     "wrong aorefwrap storage size");
+      *ref = new (&rhs1->aorefwrap.ref_storage) ao_ref;
+      ao_ref_init (*ref, TREE_OPERAND (rhs1, 0));
+      stmt->op[1] = rhs1;
+      return TREE_OPERAND (rhs1, 0);
+    }
+  *ref = reinterpret_cast <ao_ref *> (&rhs1->aorefwrap.ref_storage);
+  return TREE_OPERAND (rhs1, 0);
+}
+
+/* Return the LHS of assign STMT and initialize *REF to the ao_ref stored
+   in its wrapping TREE_AOREFWRAP, wrapping the LHS if it is not yet wrapped.
+   When wrapping is not profitable initialize *REF to NULL.  */
+
+tree
+gimple_assign_lhs_with_ao_ref (gassign *stmt, ao_ref **ref)
+{
+  tree lhs = stmt->op[0];
+  if (!REFERENCE_CLASS_P (lhs))
+    {
+      *ref = NULL;
+      return lhs;
+    }
+  if (TREE_CODE (lhs) != TREE_AOREFWRAP)
+    {
+      lhs = build1 (TREE_AOREFWRAP, TREE_TYPE (lhs), lhs);
+      static_assert (sizeof (ao_ref) == sizeof (lhs->aorefwrap.ref_storage),
+		     "wrong aorefwrap storage size");
+      *ref = new (&lhs->aorefwrap.ref_storage) ao_ref;
+      ao_ref_init (*ref, TREE_OPERAND (lhs, 0));
+      stmt->op[0] = lhs;
+      return TREE_OPERAND (lhs, 0);
+    }
+  *ref = reinterpret_cast <ao_ref *> (&lhs->aorefwrap.ref_storage);
+  return TREE_OPERAND (lhs, 0);
+}
+
 /* Modify the RHS of the assignment pointed-to by GSI using the
    operands in the expression tree EXPR.
 
@@ -2094,7 +2148,13 @@  gimple_copy (gimple *stmt)
 
   /* Make copy of operands.  */
   for (i = 0; i < num_ops; i++)
-    gimple_set_op (copy, i, unshare_expr (gimple_op (stmt, i)));
+    {
+      tree op = gimple_op (stmt, i);
+      /* When we unshare OP the references in the ao_ref get stale.  */
+      if (op && TREE_CODE (op) == TREE_AOREFWRAP)
+	op = TREE_OPERAND (op, 0);
+      gimple_set_op (copy, i, unshare_expr (op));
+    }
 
   if (gimple_has_mem_ops (stmt))
     {
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 303623b3ced..e3fcce5ab60 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -2593,7 +2593,10 @@  get_gimple_rhs_class (enum tree_code code)
 static inline tree
 gimple_assign_lhs (const gassign *gs)
 {
-  return gs->op[0];
+  tree lhs = gs->op[0];
+  if (TREE_CODE (lhs) == TREE_AOREFWRAP)
+    lhs = TREE_OPERAND (lhs, 0);
+  return lhs;
 }
 
 static inline tree
@@ -2644,7 +2647,10 @@  gimple_assign_set_lhs (gimple *gs, tree lhs)
 static inline tree
 gimple_assign_rhs1 (const gassign *gs)
 {
-  return gs->op[1];
+  tree rhs1 = gs->op[1];
+  if (TREE_CODE (rhs1) == TREE_AOREFWRAP)
+    rhs1 = TREE_OPERAND (rhs1, 0);
+  return rhs1;
 }
 
 static inline tree
@@ -2654,6 +2660,8 @@  gimple_assign_rhs1 (const gimple *gs)
   return gimple_assign_rhs1 (ass);
 }
 
+extern tree gimple_assign_lhs_with_ao_ref (gassign *, ao_ref **);
+extern tree gimple_assign_rhs1_with_ao_ref (gassign *, ao_ref **);
 
 /* Return a pointer to the first operand on the RHS of assignment
    statement GS.  */
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index d3d2a8d812f..437888181a5 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1551,6 +1551,13 @@  struct GTY(()) tree_exp {
     operands[1];
 };
 
+struct GTY(()) tree_aorefwrap {
+  /* We rely on exp.operands having one declared element.  */
+  struct tree_exp exp;
+  /* We could move the definition of ao_ref here or to coretypes.h.  */
+  unsigned char ref_storage[/* sizeof (ao_ref) */ 2 * sizeof (tree) + 3 * sizeof (poly_int64) + 2 * sizeof (alias_set_type) + sizeof (tree)/*bool*/];
+};
+
 /* Immediate use linking structure.  This structure is used for maintaining
    a doubly linked list of uses of an SSA_NAME.  */
 struct GTY(()) ssa_use_operand_t {
@@ -2058,6 +2065,7 @@  union GTY ((ptr_alias (union lang_tree_node),
   struct tree_list GTY ((tag ("TS_LIST"))) list;
   struct tree_vec GTY ((tag ("TS_VEC"))) vec;
   struct tree_exp GTY ((tag ("TS_EXP"))) exp;
+  struct tree_aorefwrap GTY ((tag ("TS_AOREFWRAP"))) aorefwrap;
   struct tree_ssa_name GTY ((tag ("TS_SSA_NAME"))) ssa_name;
   struct tree_block GTY ((tag ("TS_BLOCK"))) block;
   struct tree_binfo GTY ((tag ("TS_BINFO"))) binfo;
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index e292a144967..7b4e701b2d5 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -1029,6 +1029,10 @@  remap_gimple_op_r (tree *tp, int *walk_subtrees, void *data)
   bool is_lhs = wi_p->is_lhs;
   wi_p->is_lhs = false;
 
+  /* Instead of possibly re-mapping ao_ref ref/base simply drop the cache.  */
+  if (TREE_CODE (*tp) == TREE_AOREFWRAP)
+    *tp = TREE_OPERAND (*tp, 0);
+
   if (TREE_CODE (*tp) == SSA_NAME)
     {
       *tp = remap_ssa_name (*tp, id);
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index ce667ff32b9..670d03748a8 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3144,12 +3144,17 @@  stmt_may_clobber_ref_p_1 (gimple *stmt, ao_ref *ref, bool tbaa_p)
     }
   else if (gimple_assign_single_p (stmt))
     {
-      tree lhs = gimple_assign_lhs (stmt);
+      ao_ref *rp;
+      tree lhs = gimple_assign_lhs_with_ao_ref (as_a <gassign *> (stmt), &rp);
       if (TREE_CODE (lhs) != SSA_NAME)
 	{
 	  ao_ref r;
-	  ao_ref_init (&r, lhs);
-	  return refs_may_alias_p_1 (ref, &r, tbaa_p);
+	  if (!rp)
+	    {
+	      ao_ref_init (&r, lhs);
+	      rp = &r;
+	    }
+	  return refs_may_alias_p_1 (ref, rp, tbaa_p);
 	}
     }
   else if (gimple_code (stmt) == GIMPLE_ASM)
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 4b187c2cdaf..ae940576e19 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1500,6 +1500,9 @@  gather_mem_refs_stmt (class loop *loop, gimple *stmt)
 	 make sure we can canonicalize the ref in the hashtable if
 	 non-operand_equal_p refs are found.  For the lookup we mark
 	 the case we want strict equality with aor.max_size == -1.  */
+      tree *orig_mem = mem;
+      if (TREE_CODE (*mem) == TREE_AOREFWRAP)
+	mem = &TREE_OPERAND (*mem, 0);
       ao_ref aor;
       ao_ref_init (&aor, *mem);
       ao_ref_base (&aor);
@@ -1603,7 +1606,7 @@  gather_mem_refs_stmt (class loop *loop, gimple *stmt)
 	    }
 	}
 
-      record_mem_ref_loc (ref, stmt, mem);
+      record_mem_ref_loc (ref, stmt, orig_mem);
     }
   if (is_stored)
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4a498abe3b0..579cdce2d23 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2302,6 +2302,9 @@  find_interesting_uses_address (struct ivopts_data *data, gimple *stmt,
   if (gimple_has_volatile_ops (stmt))
     goto fail;
 
+  if (TREE_CODE (base) == TREE_AOREFWRAP)
+    base = TREE_OPERAND (base, 0);
+
   /* Ignore bitfields for now.  Not really something terribly complicated
      to handle.  TODO.  */
   if (TREE_CODE (base) == BIT_FIELD_REF)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 1bbf2f1fb2c..10ba24c5f07 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -600,6 +600,7 @@  for_each_index (tree *addr_p, bool (*cbck) (tree, tree *, void *), void *data)
 	case VIEW_CONVERT_EXPR:
 	case REALPART_EXPR:
 	case IMAGPART_EXPR:
+	case TREE_AOREFWRAP:
 	  nxt = &TREE_OPERAND (*addr_p, 0);
 	  break;
 
diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
index ebf7eea3b04..4c7910518c4 100644
--- a/gcc/tree-ssa-operands.c
+++ b/gcc/tree-ssa-operands.c
@@ -826,6 +826,7 @@  operands_scanner::get_expr_operands (tree *expr_p, int flags)
     case COMPONENT_REF:
     case REALPART_EXPR:
     case IMAGPART_EXPR:
+    case TREE_AOREFWRAP:
       {
 	if (!(flags & opf_no_vops)
 	    && TREE_THIS_VOLATILE (expr))
diff --git a/gcc/tree-streamer.c b/gcc/tree-streamer.c
index 5a8406ffb3d..e23844bdfef 100644
--- a/gcc/tree-streamer.c
+++ b/gcc/tree-streamer.c
@@ -81,6 +81,7 @@  streamer_check_handled_ts_structures (void)
   handled_p[TS_LIST] = true;
   handled_p[TS_VEC] = true;
   handled_p[TS_EXP] = true;
+  handled_p[TS_AOREFWRAP] = true;
   handled_p[TS_SSA_NAME] = true;
   handled_p[TS_BLOCK] = true;
   handled_p[TS_BINFO] = true;
diff --git a/gcc/tree.c b/gcc/tree.c
index 7bfd64160f4..c6e82962c5c 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -480,10 +480,13 @@  tree_node_structure_for_code (enum tree_code code)
 
     case tcc_type:		return TS_TYPE_NON_COMMON;
 
+    case tcc_reference:
+      if (code == TREE_AOREFWRAP)
+	return TS_AOREFWRAP;
+      /* Fallthru.  */
     case tcc_binary:
     case tcc_comparison:
     case tcc_expression:
-    case tcc_reference:
     case tcc_statement:
     case tcc_unary:
     case tcc_vl_exp:		return TS_EXP;
@@ -624,6 +627,10 @@  initialize_tree_contains_struct (void)
 	  MARK_TS_DECL_COMMON (code);
 	  break;
 
+	case TS_AOREFWRAP:
+	  MARK_TS_EXP (code);
+	  break;
+
 	default:
 	  gcc_unreachable ();
 	}
@@ -998,6 +1005,9 @@  tree_code_size (enum tree_code code)
 	}
 
     case tcc_reference:   /* a reference */
+      if (code == TREE_AOREFWRAP)
+	return sizeof (struct tree_aorefwrap);
+      /* Fallthru. */
     case tcc_expression:  /* an expression */
     case tcc_statement:   /* an expression with side effects */
     case tcc_comparison:  /* a comparison expression */
@@ -4887,7 +4897,8 @@  build0 (enum tree_code code, tree tt MEM_STAT_DECL)
 tree
 build1 (enum tree_code code, tree type, tree node MEM_STAT_DECL)
 {
-  int length = sizeof (struct tree_exp);
+  /* ???  This was all tree_exp.  */
+  size_t length = tree_code_size (code);
   tree t;
 
   record_node_allocation_statistics (code, length);
diff --git a/gcc/tree.def b/gcc/tree.def
index e27bc3e2b1f..d09c77e7e4a 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -461,6 +461,10 @@  DEFTREECODE (IMAGPART_EXPR, "imagpart_expr", tcc_reference, 1)
    generating insns.  */
 DEFTREECODE (VIEW_CONVERT_EXPR, "view_convert_expr", tcc_reference, 1)
 
+/* Outermost wrapper around a reference caching an ao_ref for it.  The
+   only operand is the reference itself.  */
+DEFTREECODE (TREE_AOREFWRAP, "aoref_wrap", tcc_reference, 1)
+
 /* C unary `*'.  One operand, an expression for a pointer.  */
 DEFTREECODE (INDIRECT_REF, "indirect_ref", tcc_reference, 1)
 
diff --git a/gcc/treestruct.def b/gcc/treestruct.def
index 822fba17068..aab962a913b 100644
--- a/gcc/treestruct.def
+++ b/gcc/treestruct.def
@@ -61,6 +61,7 @@  DEFTREESTRUCT(TS_TYPE_NON_COMMON, "type non-common")
 DEFTREESTRUCT(TS_LIST, "list")
 DEFTREESTRUCT(TS_VEC, "vec")
 DEFTREESTRUCT(TS_EXP, "exp")
+DEFTREESTRUCT(TS_AOREFWRAP, "aorefwrap")
 DEFTREESTRUCT(TS_SSA_NAME, "ssa name")
 DEFTREESTRUCT(TS_BLOCK, "block")
 DEFTREESTRUCT(TS_BINFO, "binfo")