[RFC,7/9] hwasan: add support for generating MTE instructions for memory tagging

Message ID 20241107213937.362703-8-indu.bhagat@oracle.com
State New
Series Add -fsanitize=memtag

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 warning Skipped upon request
linaro-tcwg-bot/tcwg_gcc_build--master-arm warning Skipped upon request

Commit Message

Indu Bhagat Nov. 7, 2024, 9:39 p.m. UTC
  Memory tagging is used for detecting memory safety bugs.  On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
 - CPU: MTE instructions for efficiently tagging and untagging memory.
 - Memory: New memory type, Normal Tagged Memory, added to the Arm
   Architecture.

The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts.  So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:

  - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
    SANITIZE_HWASAN is true.
  - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () is
    true and stack variables are to be sanitized.
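
The way these predicates combine can be modeled outside GCC with a small
standalone sketch.  SanitizerConfig and its fields are hypothetical
stand-ins for the option and param state, not GCC data structures; only
the predicate names and the way they are combined are taken from this
patch, and hwasan_sanitize_stack_p () is assumed to mirror the MEMTAG
variant.

```cpp
#include <cassert>

// Hypothetical stand-in for the option and param state; not a GCC type.
struct SanitizerConfig {
  bool hwasan = false;                  // models -fsanitize=hwaddress
  bool memtag = false;                  // models -fsanitize=memtag
  bool hwasan_instrument_stack = true;  // models param_hwasan_instrument_stack
  bool memtag_instrument_stack = true;  // models param_memtag_instrument_stack
};

// Assumed to mirror the MEMTAG variant below.
inline bool hwasan_sanitize_stack_p(const SanitizerConfig &c) {
  return c.hwasan && c.hwasan_instrument_stack;
}

inline bool memtag_sanitize_stack_p(const SanitizerConfig &c) {
  return c.memtag && c.memtag_instrument_stack;
}

// True when either hardware-assisted sanitizer is enabled.
inline bool hwassist_sanitize_p(const SanitizerConfig &c) {
  return c.hwasan || c.memtag;
}

// True when either sanitizer is enabled and is tagging the stack.
inline bool hwassist_sanitize_stack_p(const SanitizerConfig &c) {
  return hwasan_sanitize_stack_p(c) || memtag_sanitize_stack_p(c);
}

// Usage: MEMTAG alone is enough to take the shared code paths.
inline void check_memtag_only() {
  SanitizerConfig c;
  c.memtag = true;
  assert(hwassist_sanitize_p(c));
  assert(hwassist_sanitize_stack_p(c));
  assert(!hwasan_sanitize_stack_p(c));
}
```

Call sites that behave identically for both sanitizers can then test the
hwassist_* predicates, leaving memtag_sanitize_p () for the places where
the two diverge.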

MEMTAG and HWASAN do differ, however, hence the need to conditionalize
on memtag_sanitize_p () in the relevant places.  E.g.,

  - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
    to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
    memory.  A similar approach is taken in handle_builtin_alloca,
    where target hooks are used instead of the gimple transformations.

  - Add a new internal function HWASAN_ALLOCA_POISON to handle
    dynamically allocated stack memory when the MEMTAG sanitizer is
    enabled.  At expansion, this in turn allows invoking target hooks
    to increment the tag and then using the generated tag to tag the
    dynamically allocated memory.
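
The address arithmetic behind the alloca handling can be sketched in
isolation.  This is a minimal model, assuming the 16-byte granule and
the 4-bit logical tag in bits 56-59 described in the asan.cc comment
added by this patch; the helper names are hypothetical, and the real
tagging is done by the target hooks at expand time, not by code like
this.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, taken from the comment this patch adds to asan.cc.
constexpr std::uint64_t kGranuleSize = 16;  // TARGET_MEMTAG_GRANULE_SIZE
constexpr unsigned kTagShift = 56;          // logical tag lives in bits 56-59
constexpr std::uint64_t kTagMask = std::uint64_t{0xF} << kTagShift;

// Round an alloca size up so the object ends on a granule boundary
// (hypothetical helper; models "new_size" in handle_builtin_alloca).
constexpr std::uint64_t round_up_to_granule(std::uint64_t size) {
  return (size + kGranuleSize - 1) & ~(kGranuleSize - 1);
}

// Place a 4-bit tag into the top byte of an untagged address.
constexpr std::uint64_t set_logical_tag(std::uint64_t addr, unsigned tag) {
  return (addr & ~kTagMask)
         | (static_cast<std::uint64_t>(tag & 0xF) << kTagShift);
}

// Read the 4-bit logical tag back out of an address.
constexpr unsigned get_logical_tag(std::uint64_t addr) {
  return static_cast<unsigned>((addr & kTagMask) >> kTagShift);
}

// Usage: a 17-byte alloca occupies two granules.
static_assert(round_up_to_granule(17) == 32, "two granules");
```

Rounding the size up is what keeps a tag granule from being shared
between the alloca object and its neighbours.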

TBD:
 - Not sure if we really need param_memtag_instrument_mem_intrinsics
   explicitly.
 - Conditionalizing using hwassist_sanitize_p (), memtag_sanitize_p (),
   etc. looks unappetizing in some cases.  Not sure if there is a better
   way.  Is this generally the right thing to do, or are there some
   desirable refactorings?
 - Adding decl to hwasan_stack_var: double check whether this is
   necessary.  See how we update the RTL for decl in
   expand_one_stack_var_at, and then use the RTL for decl in
   hwasan_emit_prologue.
 - In hwasan_frame_base (), see if checking for memtag_sanitize_p () for
   force_reg etc is really necessary.  Revisit to see what gives, fix or
   add documentation.
 - Error out if the user specifies a stack alloc alignment that is not a
   multiple of 16?

gcc/ChangeLog:

	* asan.cc (struct hwasan_stack_var):
	(handle_builtin_stack_restore): Accommodate MEMTAG sanitizer.
	(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
	(get_mem_refs_of_builtin_call): Include MEMTAG along with
	HWASAN.
	(memtag_sanitize_stack_p): New definition.
	(memtag_sanitize_allocas_p): Likewise.
	(memtag_memintrin): Likewise.
	(hwassist_sanitize_p): Likewise.
	(hwassist_sanitize_stack_p): Likewise.
	(report_error_func): Include MEMTAG along with HWASAN.
	(build_check_stmt): Likewise.
	(instrument_derefs): MEMTAG too does not deal with globals yet.
	(instrument_builtin_call):
	(maybe_instrument_call): Include MEMTAG along with HWASAN.
	(asan_expand_mark_ifn): Likewise.
	(asan_expand_check_ifn): Likewise.
	(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
	(asan_instrument):
	(hwasan_frame_base):
	(hwasan_record_stack_var):
	(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
	(hwasan_emit_untag_frame): Likewise.
	* asan.h (hwasan_record_stack_var):
	(memtag_sanitize_stack_p): New declaration.
	(memtag_sanitize_allocas_p): Likewise.
	(hwassist_sanitize_p): Likewise.
	(hwassist_sanitize_stack_p): Likewise.
	(asan_sanitize_use_after_scope): Include MEMTAG along with
	HWASAN.
	* cfgexpand.cc (align_local_variable): Likewise.
	(expand_one_stack_var_at): Likewise.
	(expand_stack_vars): Likewise.
	(expand_one_stack_var_1): Likewise.
	(init_vars_expansion): Likewise.
	(expand_used_vars): Likewise.
	(pass_expand::execute): Likewise.
	* gimplify.cc (asan_poison_variable): Likewise.
	* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
	(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
	sanitizer.
	(expand_HWASAN_MARK): Likewise.
	* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
	* params.opt: Document new param. FIXME.
	* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
	HWASAN.
---
 gcc/asan.cc         | 236 +++++++++++++++++++++++++++++++++-----------
 gcc/asan.h          |   9 +-
 gcc/cfgexpand.cc    |  37 +++----
 gcc/gimplify.cc     |   5 +-
 gcc/internal-fn.cc  |  69 +++++++++++--
 gcc/internal-fn.def |   1 +
 gcc/params.opt      |   4 +
 gcc/sanopt.cc       |   2 +-
 8 files changed, 275 insertions(+), 88 deletions(-)
  

Patch

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 95a9009011f7..92c16c67c7e5 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -299,6 +299,7 @@  static GTY(()) rtx_insn *hwasan_frame_base_init_seq = NULL;
    tagged_base).  */
 struct hwasan_stack_var
 {
+  tree decl;
   rtx untagged_base;
   rtx tagged_base;
   poly_int64 nearest_offset;
@@ -707,14 +708,15 @@  static void
 handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
 {
   if (!iter
-      || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()))
+      || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()
+	   || memtag_sanitize_allocas_p ()))
     return;
 
   tree restored_stack = gimple_call_arg (call, 0);
 
   gimple *g;
 
-  if (hwasan_sanitize_allocas_p ())
+  if (hwasan_sanitize_allocas_p () || memtag_sanitize_allocas_p ())
     {
       enum internal_fn fn = IFN_HWASAN_ALLOCA_UNPOISON;
       /* There is only one piece of information `expand_HWASAN_ALLOCA_UNPOISON`
@@ -763,7 +765,8 @@  static void
 handle_builtin_alloca (gcall *call, gimple_stmt_iterator *iter)
 {
   if (!iter
-      || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()))
+      || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()
+	   || memtag_sanitize_allocas_p ()))
     return;
 
   gassign *g;
@@ -787,22 +790,30 @@  handle_builtin_alloca (gcall *call, gimple_stmt_iterator *iter)
       e = find_fallthru_edge (gsi_bb (*iter)->succs);
     }
 
-  if (hwasan_sanitize_allocas_p ())
+  if (hwasan_sanitize_allocas_p () || memtag_sanitize_allocas_p ())
     {
       gimple_seq stmts = NULL;
       location_t loc = gimple_location (gsi_stmt (*iter));
       /*
-	 HWASAN needs a different expansion.
+	 HWASAN and MEMTAG need a different expansion.
 
 	 addr = __builtin_alloca (size, align);
 
-	 should be replaced by
+	 in case of HWASAN, should be replaced by
 
 	 new_size = size rounded up to HWASAN_TAG_GRANULE_SIZE byte alignment;
 	 untagged_addr = __builtin_alloca (new_size, align);
 	 tag = __hwasan_choose_alloca_tag ();
 	 addr = ifn_HWASAN_SET_TAG (untagged_addr, tag);
 	 __hwasan_tag_memory (untagged_addr, tag, new_size);
+
+	 in case of MEMTAG, should be replaced by
+
+	 new_size = size rounded up to HWASAN_TAG_GRANULE_SIZE byte alignment;
+	 untagged_addr = __builtin_alloca (new_size, align);
+	 addr = ifn_HWASAN_ALLOCA_POISON (untagged_addr, new_size);
+
+	 where a new tag is chosen when HWASAN_ALLOCA_POISON is expanded.
 	*/
       /* Ensure alignment at least HWASAN_TAG_GRANULE_SIZE bytes so we start on
 	 a tag granule.  */
@@ -819,23 +830,30 @@  handle_builtin_alloca (gcall *call, gimple_stmt_iterator *iter)
 			as_combined_fn (BUILT_IN_ALLOCA_WITH_ALIGN), ptr_type,
 			new_size, build_int_cst (size_type_node, align));
 
-      /* Choose the tag.
-	 Here we use an internal function so we can choose the tag at expand
-	 time.  We need the decision to be made after stack variables have been
-	 assigned their tag (i.e. once the hwasan_frame_tag_offset variable has
-	 been set to one after the last stack variables tag).  */
-      tree tag = gimple_build (&stmts, loc, CFN_HWASAN_CHOOSE_TAG,
-			       unsigned_char_type_node);
+      tree addr;
 
-      /* Add tag to pointer.  */
-      tree addr
-	= gimple_build (&stmts, loc, CFN_HWASAN_SET_TAG, ptr_type,
-			untagged_addr, tag);
-
-      /* Tag shadow memory.
-	 NOTE: require using `untagged_addr` here for libhwasan API.  */
-      gimple_build (&stmts, loc, as_combined_fn (BUILT_IN_HWASAN_TAG_MEM),
-		    void_type_node, untagged_addr, tag, new_size);
+      if (memtag_sanitize_p ())
+	addr = gimple_build (&stmts, loc, CFN_HWASAN_ALLOCA_POISON, ptr_type,
+			     untagged_addr, new_size);
+      else
+	{
+	  /* Choose the tag.
+	     Here we use an internal function so we can choose the tag at expand
+	     time.  We need the decision to be made after stack variables have been
+	     assigned their tag (i.e. once the hwasan_frame_tag_offset variable has
+	     been set to one after the last stack variables tag).  */
+	  tree tag = gimple_build (&stmts, loc, CFN_HWASAN_CHOOSE_TAG,
+				   unsigned_char_type_node);
+
+	  /* Add tag to pointer.  */
+	  addr = gimple_build (&stmts, loc, CFN_HWASAN_SET_TAG, ptr_type,
+			       untagged_addr, tag);
+
+	  /* Tag shadow memory.
+	     NOTE: require using `untagged_addr` here for libhwasan API.  */
+	  gimple_build (&stmts, loc, as_combined_fn (BUILT_IN_HWASAN_TAG_MEM),
+			void_type_node, untagged_addr, tag, new_size);
+	}
 
       /* Insert the built up code sequence into the original instruction stream
 	 the iterator points to.  */
@@ -1049,7 +1067,7 @@  get_mem_refs_of_builtin_call (gcall *call,
 	 for now we choose to just ignore `strlen` calls.
 	 This decision was simply made because that means the special case is
 	 limited to this one case of this one function.  */
-      if (hwasan_sanitize_p ())
+      if (hwassist_sanitize_p ())
 	return false;
       source0 = gimple_call_arg (call, 0);
       len = gimple_call_lhs (call);
@@ -1832,9 +1850,39 @@  hwasan_memintrin (void)
   return (hwasan_sanitize_p () && param_hwasan_instrument_mem_intrinsics);
 }
 
-/* MEMoryTAGging sanitizer (memtag) uses a hardware based capability known as
-   memory tagging to detect memory safety vulnerabilities.  Similar to hwasan,
-   it is also a probabilistic method.  */
+/* MEMoryTAGging sanitizer (MEMTAG) uses a hardware based capability known as
+   memory tagging to detect memory safety vulnerabilities.  Similar to HWASAN,
+   it is also a probabilistic method.
+
+   MEMTAG relies on the optional extension in armv8.5a, known as MTE (Memory
+   Tagging Extension).  The extension is available in AARCH64 only and
+   introduces two types of tags:
+     - Logical Address Tag - bits 56-59 (TARGET_MEMTAG_TAG_SIZE) of the virtual
+       address.
+     - Allocation Tag - 4 bits for each tag granule (TARGET_MEMTAG_GRANULE_SIZE
+       set to 16 bytes), stored separately.
+   Load / store instructions raise an exception if tags differ, thereby
+   providing a faster way (than HWASAN) to detect memory safety issues.
+   Further, new instructions are available in MTE to manipulate (generate,
+   update address with) tags.  Load / store instructions with SP base register
+   and immediate offset do not check tags.
+
+   PS: Currently, MEMTAG sanitizer is capable of stack (variable / memory)
+   tagging only.
+
+   In general, detecting stack-related memory bugs requires the compiler to:
+     - ensure that each tag granule is only used by one variable at a time.
+       This includes alloca.
+     - Tag/Color: put tags into each stack variable pointer.
+     - Untag: the function epilogue will retag the memory.
+
+   MEMTAG sanitizer is based off the HWASAN sanitizer implementation
+   internally.  Similar to HWASAN:
+     - Assigning an independently random tag to each variable is carried out by
+       keeping a tagged base pointer.  A tagged base pointer allows addressing
+       variables with (addr offset, tag offset).
+     - TBD
+   */
 
 /* Returns whether we are tagging pointers and checking those tags on memory
    access.  */
@@ -1844,6 +1892,42 @@  memtag_sanitize_p ()
   return false;
 }
 
+/* Are we tagging the stack?  */
+bool
+memtag_sanitize_stack_p ()
+{
+  return (memtag_sanitize_p () && param_memtag_instrument_stack);
+}
+
+/* Are we tagging alloca objects?  */
+bool
+memtag_sanitize_allocas_p (void)
+{
+  return (memtag_sanitize_stack_p () && param_memtag_instrument_allocas);
+}
+
+/* Are we tagging mem intrinsics?  */
+bool
+memtag_memintrin (void)
+{
+  return (memtag_sanitize_p () && param_memtag_instrument_mem_intrinsics);
+}
+
+/* Returns whether we are tagging pointers and checking those tags on memory
+   access.  */
+bool
+hwassist_sanitize_p ()
+{
+  return (hwasan_sanitize_p () || memtag_sanitize_p ());
+}
+
+/* Are we tagging stack objects for hwasan or memtag?  */
+bool
+hwassist_sanitize_stack_p ()
+{
+  return (hwasan_sanitize_stack_p () || memtag_sanitize_stack_p ());
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
    directly, epilogue sequence returned.  BASE is the register holding the
    stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -2367,7 +2451,7 @@  static tree
 report_error_func (bool is_store, bool recover_p, HOST_WIDE_INT size_in_bytes,
 		   int *nargs)
 {
-  gcc_assert (!hwasan_sanitize_p ());
+  gcc_assert (!hwassist_sanitize_p ());
 
   static enum built_in_function report[2][2][6]
     = { { { BUILT_IN_ASAN_REPORT_LOAD1, BUILT_IN_ASAN_REPORT_LOAD2,
@@ -2703,7 +2787,7 @@  build_check_stmt (location_t loc, tree base, tree len,
   if (is_scalar_access)
     flags |= ASAN_CHECK_SCALAR_ACCESS;
 
-  enum internal_fn fn = hwasan_sanitize_p ()
+  enum internal_fn fn = hwassist_sanitize_p ()
     ? IFN_HWASAN_CHECK
     : IFN_ASAN_CHECK;
 
@@ -2803,7 +2887,7 @@  instrument_derefs (gimple_stmt_iterator *iter, tree t,
 	 access is inside a global variable, then there's no point adding
 	 instrumentation to check the access.  N.b. hwasan currently never
 	 sanitizes globals.  */
-      if ((hwasan_sanitize_p () || !param_asan_globals)
+      if ((hwassist_sanitize_p () || !param_asan_globals)
 	  && is_global_var (inner))
         return;
       if (!TREE_STATIC (inner))
@@ -2902,7 +2986,8 @@  instrument_mem_region_access (tree base, tree len,
 static bool
 instrument_builtin_call (gimple_stmt_iterator *iter)
 {
-  if (!(asan_memintrin () || hwasan_memintrin ()))
+  if (!(asan_memintrin () || hwasan_memintrin ()
+	|| memtag_memintrin ()))
     return false;
 
   bool iter_advanced_p = false;
@@ -3056,7 +3141,7 @@  maybe_instrument_call (gimple_stmt_iterator *iter)
 	 `longjmp`, thread exit, and exceptions in a different way.  These
 	 problems must be handled externally to the compiler, e.g. in the
 	 language runtime.  */
-      if (! hwasan_sanitize_p ())
+      if (! hwassist_sanitize_p ())
 	{
 	  tree decl = builtin_decl_implicit (BUILT_IN_ASAN_HANDLE_NO_RETURN);
 	  gimple *g = gimple_build_call (decl, 0);
@@ -3809,7 +3894,7 @@  asan_expand_mark_ifn (gimple_stmt_iterator *iter)
 
   gcc_checking_assert (TREE_CODE (decl) == VAR_DECL);
 
-  if (hwasan_sanitize_p ())
+  if (hwassist_sanitize_p ())
     {
       gcc_assert (param_hwasan_instrument_stack);
       gimple_seq stmts = NULL;
@@ -3907,7 +3992,7 @@  asan_expand_mark_ifn (gimple_stmt_iterator *iter)
 bool
 asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
 {
-  gcc_assert (!hwasan_sanitize_p ());
+  gcc_assert (!hwassist_sanitize_p ());
   gimple *g = gsi_stmt (*iter);
   location_t loc = gimple_location (g);
   bool recover_p;
@@ -4182,7 +4267,7 @@  asan_expand_poison_ifn (gimple_stmt_iterator *iter,
       int nargs;
       bool store_p = gimple_call_internal_p (use, IFN_ASAN_POISON_USE);
       gcall *call;
-      if (hwasan_sanitize_p ())
+      if (hwassist_sanitize_p ())
 	{
 	  tree fun = builtin_decl_implicit (BUILT_IN_HWASAN_TAG_MISMATCH4);
 	  /* NOTE: hwasan has no __hwasan_report_* functions like asan does.
@@ -4283,7 +4368,7 @@  asan_expand_poison_ifn (gimple_stmt_iterator *iter,
 static unsigned int
 asan_instrument (void)
 {
-  if (hwasan_sanitize_p ())
+  if (hwassist_sanitize_p ())
     {
       initialize_sanitizer_builtins ();
       transform_statements ();
@@ -4412,10 +4497,15 @@  hwasan_frame_base ()
   if (! hwasan_frame_base_ptr)
     {
       start_sequence ();
-      hwasan_frame_base_ptr
-	= force_reg (Pmode,
-		     targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
-						       NULL_RTX));
+      if (memtag_sanitize_p ())
+	hwasan_frame_base_ptr
+	  = targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
+					      NULL_RTX);
+      else
+	hwasan_frame_base_ptr
+	  = force_reg (Pmode,
+		       targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
+							 NULL_RTX));
       hwasan_frame_base_init_seq = get_insns ();
       end_sequence ();
     }
@@ -4470,10 +4560,11 @@  hwasan_maybe_emit_frame_base_init ()
    We record the `untagged_base` since the functions in the hwasan library we
    use to tag memory take pointers without a tag.  */
 void
-hwasan_record_stack_var (rtx untagged_base, rtx tagged_base,
+hwasan_record_stack_var (tree decl, rtx untagged_base, rtx tagged_base,
 			 poly_int64 nearest_offset, poly_int64 farthest_offset)
 {
   hwasan_stack_var cur_var;
+  cur_var.decl = decl;
   cur_var.untagged_base = untagged_base;
   cur_var.tagged_base = tagged_base;
   cur_var.nearest_offset = nearest_offset;
@@ -4625,19 +4716,34 @@  hwasan_emit_prologue ()
       gcc_assert (multiple_p (bot, HWASAN_TAG_GRANULE_SIZE));
       gcc_assert (multiple_p (size, HWASAN_TAG_GRANULE_SIZE));
 
-      rtx fn = init_one_libfunc ("__hwasan_tag_memory");
-      rtx base_tag = targetm.memtag.extract_tag (cur.tagged_base, NULL_RTX);
-      rtx tag = plus_constant (QImode, base_tag, cur.tag_offset);
-      tag = hwasan_truncate_to_tag_size (tag, NULL_RTX);
-
-      rtx bottom = convert_memory_address (ptr_mode,
-					   plus_constant (Pmode,
-							  cur.untagged_base,
-							  bot));
-      emit_library_call (fn, LCT_NORMAL, VOIDmode,
-			 bottom, ptr_mode,
-			 tag, QImode,
-			 gen_int_mode (size, ptr_mode), ptr_mode);
+      if (memtag_sanitize_p ())
+	{
+	  rtx x;
+	  if (!HAS_RTL_P (cur.decl))
+	    x = targetm.memtag.add_tag (cur.tagged_base, size, cur.tag_offset);
+	  else
+	    x = XEXP (DECL_RTL (cur.decl), 0);
+
+	  targetm.memtag.tag_memory (x, gen_int_mode (size, ptr_mode), x);
+	}
+      else
+	{
+	  rtx fn = init_one_libfunc ("__hwasan_tag_memory");
+
+	  rtx bottom = convert_memory_address (ptr_mode,
+					       plus_constant (Pmode,
+							      cur.untagged_base,
+							      bot));
+
+	  rtx base_tag = targetm.memtag.extract_tag (cur.tagged_base, NULL_RTX);
+	  rtx tag = plus_constant (QImode, base_tag, cur.tag_offset);
+	  tag = hwasan_truncate_to_tag_size (tag, NULL_RTX);
+
+	  emit_library_call (fn, LCT_NORMAL, VOIDmode,
+			     bottom, ptr_mode,
+			     tag, QImode,
+			     gen_int_mode (size, ptr_mode), ptr_mode);
+	}
     }
   /* Clear the stack vars, we've emitted the prologue for them all now.  */
   hwasan_tagged_stack_vars.truncate (0);
@@ -4678,11 +4784,25 @@  hwasan_emit_untag_frame (rtx dynamic, rtx vars)
 				      NULL_RTX, /* unsignedp = */0,
 				      OPTAB_DIRECT);
 
-  rtx fn = init_one_libfunc ("__hwasan_tag_memory");
-  emit_library_call (fn, LCT_NORMAL, VOIDmode,
-		     bot_rtx, ptr_mode,
-		     HWASAN_STACK_BACKGROUND, QImode,
-		     size_rtx, ptr_mode);
+  if (memtag_sanitize_p ())
+    {
+      /* FIXME - not sure if this is OK to do.  */
+      if (!cfun->calls_alloca)
+	{
+	  HOST_WIDE_INT size = frame_offset.to_constant ();
+	  size_rtx = gen_int_mode (size, ptr_mode);
+	}
+
+      targetm.memtag.tag_memory (top_rtx, size_rtx, virtual_stack_vars_rtx);
+    }
+  else
+    {
+      rtx fn = init_one_libfunc ("__hwasan_tag_memory");
+      emit_library_call (fn, LCT_NORMAL, VOIDmode,
+			 bot_rtx, ptr_mode,
+			 HWASAN_STACK_BACKGROUND, QImode,
+			 size_rtx, ptr_mode);
+    }
 
   do_pending_stack_adjust ();
   rtx_insn *insns = get_insns ();
diff --git a/gcc/asan.h b/gcc/asan.h
index c34b1d304288..d169a769f780 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -36,7 +36,7 @@  extern bool asan_expand_poison_ifn (gimple_stmt_iterator *, bool *,
 extern rtx asan_memfn_rtl (tree);
 
 extern void hwasan_record_frame_init ();
-extern void hwasan_record_stack_var (rtx, rtx, poly_int64, poly_int64);
+extern void hwasan_record_stack_var (tree, rtx, rtx, poly_int64, poly_int64);
 extern void hwasan_emit_prologue ();
 extern rtx_insn *hwasan_emit_untag_frame (rtx, rtx);
 extern rtx hwasan_get_frame_extent ();
@@ -55,6 +55,11 @@  extern bool hwasan_expand_mark_ifn (gimple_stmt_iterator *);
 extern bool gate_hwasan (void);
 
 extern bool memtag_sanitize_p (void);
+extern bool memtag_sanitize_stack_p (void);
+extern bool memtag_sanitize_allocas_p (void);
+
+bool hwassist_sanitize_p (void);
+bool hwassist_sanitize_stack_p (void);
 
 extern gimple_stmt_iterator create_cond_insert_point
      (gimple_stmt_iterator *, bool, bool, bool, basic_block *, basic_block *);
@@ -224,7 +229,7 @@  inline bool
 asan_sanitize_use_after_scope (void)
 {
   return (flag_sanitize_address_use_after_scope
-	  && (asan_sanitize_stack_p () || hwasan_sanitize_stack_p ()));
+	  && (asan_sanitize_stack_p () || hwassist_sanitize_stack_p ()));
 }
 
 /* Return true if DECL should be guarded on the stack.  */
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index f3a33ff9a07d..20fb3b048f00 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -380,7 +380,7 @@  align_local_variable (tree decl, bool really_expand)
   else
     align = LOCAL_DECL_ALIGNMENT (decl);
 
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     align = MAX (align, (unsigned) HWASAN_TAG_GRANULE_SIZE * BITS_PER_UNIT);
 
   if (TREE_CODE (decl) != SSA_NAME && really_expand)
@@ -1082,7 +1082,7 @@  expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
   /* If this fails, we've overflowed the stack frame.  Error nicely?  */
   gcc_assert (known_eq (offset, trunc_int_for_mode (offset, Pmode)));
 
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     x = targetm.memtag.add_tag (base, offset,
 				hwasan_current_frame_tag ());
   else
@@ -1217,14 +1217,14 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
       if (pred && !pred (i))
 	continue;
 
-      base = (hwasan_sanitize_stack_p ()
+      base = (hwassist_sanitize_stack_p ()
 	      ? hwasan_frame_base ()
 	      : virtual_stack_vars_rtx);
       alignb = stack_vars[i].alignb;
       if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
 	{
 	  poly_int64 hwasan_orig_offset;
-	  if (hwasan_sanitize_stack_p ())
+	  if (hwassist_sanitize_stack_p ())
 	    {
 	      /* There must be no tag granule "shared" between different
 		 objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
@@ -1323,7 +1323,7 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
 	      offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 	      base_align = crtl->max_used_stack_slot_alignment;
 
-	      if (hwasan_sanitize_stack_p ())
+	      if (hwassist_sanitize_stack_p ())
 		{
 		  /* Align again since the point of this alignment is to handle
 		     the "end" of the object (i.e. smallest address after the
@@ -1337,7 +1337,8 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
 		     allocated for this particular variable while `offset`
 		     describes the address that this variable starts at.  */
 		  align_frame_offset (HWASAN_TAG_GRANULE_SIZE);
-		  hwasan_record_stack_var (virtual_stack_vars_rtx, base,
+		  hwasan_record_stack_var (stack_vars[i].decl,
+					   virtual_stack_vars_rtx, base,
 					   hwasan_orig_offset, frame_offset);
 		}
 	    }
@@ -1368,7 +1369,7 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
 	  large_alloc = aligned_upper_bound (large_alloc, alignb);
 	  offset = large_alloc;
 	  large_alloc += stack_vars[i].size;
-	  if (hwasan_sanitize_stack_p ())
+	  if (hwassist_sanitize_stack_p ())
 	    {
 	      /* An object with a large alignment requirement means that the
 		 alignment requirement is greater than the required alignment
@@ -1384,7 +1385,8 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
 		 then use positive offsets from that.  Hence the farthest
 		 offset is `align_again` and the nearest offset from the base
 		 is `offset`.  */
-	      hwasan_record_stack_var (large_untagged_base, large_base,
+	      hwasan_record_stack_var (stack_vars[i].decl,
+				       large_untagged_base, large_base,
 				       offset, align_again);
 	    }
 
@@ -1399,7 +1401,7 @@  expand_stack_vars (bool (*pred) (unsigned), class stack_vars_data *data)
 	  expand_one_stack_var_at (stack_vars[j].decl,
 				   base, base_align, offset);
 	}
-      if (hwasan_sanitize_stack_p ())
+      if (hwassist_sanitize_stack_p ())
 	hwasan_increment_frame_tag ();
     }
 
@@ -1492,7 +1494,7 @@  expand_one_stack_var_1 (tree var)
   gcc_assert (byte_align * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT);
 
   rtx base;
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     {
       /* Allocate zero bytes to align the stack.  */
       poly_int64 hwasan_orig_offset
@@ -1508,7 +1510,7 @@  expand_one_stack_var_1 (tree var)
 	 the "furthest" offset from the base delimiting the current stack
 	 object.  `frame_offset` will always delimit the extent that the frame.
 	 */
-      hwasan_record_stack_var (virtual_stack_vars_rtx, base,
+      hwasan_record_stack_var (var, virtual_stack_vars_rtx, base,
 			       hwasan_orig_offset, frame_offset);
     }
   else
@@ -1520,7 +1522,7 @@  expand_one_stack_var_1 (tree var)
   expand_one_stack_var_at (var, base,
 			   crtl->max_used_stack_slot_alignment, offset);
 
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     hwasan_increment_frame_tag ();
 }
 
@@ -2124,7 +2126,7 @@  init_vars_expansion (void)
   /* Initialize local stack smashing state.  */
   has_protected_decls = false;
   has_short_buffer = false;
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     hwasan_record_frame_init ();
 }
 
@@ -2450,13 +2452,14 @@  expand_used_vars (bitmap forced_stack_vars)
       expand_stack_vars (NULL, &data);
     }
 
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     hwasan_emit_prologue ();
   if (asan_sanitize_allocas_p () && cfun->calls_alloca)
     var_end_seq = asan_emit_allocas_unpoison (virtual_stack_dynamic_rtx,
 					      virtual_stack_vars_rtx,
 					      var_end_seq);
-  else if (hwasan_sanitize_allocas_p () && cfun->calls_alloca)
+  else if ((hwasan_sanitize_allocas_p () || memtag_sanitize_p ())
+	   && cfun->calls_alloca)
     /* When using out-of-line instrumentation we only want to emit one function
        call for clearing the tags in a region of shadow stack.  When there are
        alloca calls in this frame we want to emit a call using the
@@ -2464,7 +2467,7 @@  expand_used_vars (bitmap forced_stack_vars)
        rtx we created in expand_stack_vars.  */
     var_end_seq = hwasan_emit_untag_frame (virtual_stack_dynamic_rtx,
 					   virtual_stack_vars_rtx);
-  else if (hwasan_sanitize_stack_p ())
+  else if (hwassist_sanitize_stack_p ())
     /* If no variables were stored on the stack, `hwasan_get_frame_extent`
        will return NULL_RTX and hence `hwasan_emit_untag_frame` will return
        NULL (i.e. an empty sequence).  */
@@ -6945,7 +6948,7 @@  pass_expand::execute (function *fun)
       emit_insn_after (var_ret_seq, after);
     }
 
-  if (hwasan_sanitize_stack_p ())
+  if (hwassist_sanitize_stack_p ())
     hwasan_maybe_emit_frame_base_init ();
 
   /* Zap the tree EH table.  */
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 827941b24db2..e50056a5b869 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -1280,9 +1280,10 @@  asan_poison_variable (tree decl, bool poison, gimple_stmt_iterator *it,
 
   /* It's necessary to have all stack variables aligned to ASAN granularity
      bytes.  */
-  gcc_assert (!hwasan_sanitize_p () || hwasan_sanitize_stack_p ());
+  gcc_assert (!hwassist_sanitize_p () || hwassist_sanitize_stack_p ());
   unsigned shadow_granularity
-    = hwasan_sanitize_p () ? HWASAN_TAG_GRANULE_SIZE : ASAN_SHADOW_GRANULARITY;
+    = (hwassist_sanitize_p ()
+       ? HWASAN_TAG_GRANULE_SIZE : ASAN_SHADOW_GRANULARITY);
   if (DECL_ALIGN_UNIT (decl) <= shadow_granularity)
     SET_DECL_ALIGN (decl, BITS_PER_UNIT * shadow_granularity);
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0ee5f5bc7c55..19d149e8c2bb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -720,6 +720,39 @@  expand_HWASAN_CHECK (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* For hwasan stack tagging:
+   Tag memory which is dynamically allocated.  */
+static void
+expand_HWASAN_ALLOCA_POISON (internal_fn, gcall *gc)
+{
+  gcc_assert (ptr_mode == Pmode);
+  tree g_target = gimple_call_lhs (gc);
+  tree g_ptr = gimple_call_arg (gc, 0);
+  tree g_size = gimple_call_arg (gc, 1);
+
+  rtx ptr = expand_normal (g_ptr);
+  rtx tag = targetm.memtag.add_tag (hwasan_frame_base (), 0,
+				    hwasan_current_frame_tag ());
+  rtx size = expand_normal (g_size);
+  rtx target = expand_normal (g_target);
+  machine_mode mode = GET_MODE (target);
+
+  if (memtag_sanitize_p ())
+    {
+      /* Really need to put the tag into the `target` RTX.  */
+      if (tag != target)
+	{
+	  gcc_assert (GET_MODE (tag) == mode);
+	  emit_move_insn (target, tag);
+	}
+      /* Tag the memory.  */
+      emit_insn (targetm.memtag.tag_memory (ptr, size, tag));
+      hwasan_increment_frame_tag ();
+    }
+  else
+    gcc_unreachable ();
+}
+
 /* For hwasan stack tagging:
    Clear tags on the dynamically allocated space.
    For use after an object dynamically allocated on the stack goes out of
@@ -731,14 +764,27 @@  expand_HWASAN_ALLOCA_UNPOISON (internal_fn, gcall *gc)
   tree restored_position = gimple_call_arg (gc, 0);
   rtx restored_rtx = expand_expr (restored_position, NULL_RTX, VOIDmode,
 				  EXPAND_NORMAL);
-  rtx func = init_one_libfunc ("__hwasan_tag_memory");
   rtx off = expand_simple_binop (Pmode, MINUS, restored_rtx,
 				 stack_pointer_rtx, NULL_RTX, 0,
 				 OPTAB_WIDEN);
-  emit_library_call_value (func, NULL_RTX, LCT_NORMAL, VOIDmode,
-			   virtual_stack_dynamic_rtx, Pmode,
-			   HWASAN_STACK_BACKGROUND, QImode,
-			   off, Pmode);
+
+  if (memtag_sanitize_p ())
+    {
+      emit_insn (targetm.memtag.tag_memory (virtual_stack_dynamic_rtx,
+					    off,
+					    virtual_stack_dynamic_rtx));
+    }
+  else
+    {
+      rtx func = init_one_libfunc ("__hwasan_tag_memory");
+      rtx off = expand_simple_binop (Pmode, MINUS, restored_rtx,
+				     stack_pointer_rtx, NULL_RTX, 0,
+				     OPTAB_WIDEN);
+      emit_library_call_value (func, NULL_RTX, LCT_NORMAL, VOIDmode,
+			       virtual_stack_dynamic_rtx, Pmode,
+			       HWASAN_STACK_BACKGROUND, QImode,
+			       off, Pmode);
+    }
 }
 
 /* For hwasan stack tagging:
@@ -792,9 +838,16 @@  expand_HWASAN_MARK (internal_fn, gcall *gc)
   tree len = gimple_call_arg (gc, 2);
   rtx r_len = expand_normal (len);
 
-  rtx func = init_one_libfunc ("__hwasan_tag_memory");
-  emit_library_call (func, LCT_NORMAL, VOIDmode, address, Pmode,
-		     tag, QImode, r_len, Pmode);
+  if (memtag_sanitize_p ())
+    {
+      emit_insn (targetm.memtag.tag_memory (address, r_len, tag));
+    }
+  else
+    {
+      rtx func = init_one_libfunc ("__hwasan_tag_memory");
+      emit_library_call (func, LCT_NORMAL, VOIDmode, address, Pmode,
+			 tag, QImode, r_len, Pmode);
+    }
 }
 
 /* For hwasan stack tagging:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3d0efc0f2c3..628df957adbc 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -489,6 +489,7 @@  DEF_INTERNAL_FN (UBSAN_PTR, ECF_LEAF | ECF_NOTHROW, ". R . ")
 DEF_INTERNAL_FN (UBSAN_OBJECT_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, ECF_NORETURN, NULL)
 DEF_INTERNAL_FN (BUILTIN_EXPECT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (HWASAN_ALLOCA_POISON, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (HWASAN_ALLOCA_UNPOISON, ECF_LEAF | ECF_NOTHROW, ". R ")
 DEF_INTERNAL_FN (HWASAN_CHOOSE_TAG, ECF_LEAF | ECF_NOTHROW, ". ")
 DEF_INTERNAL_FN (HWASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW,
diff --git a/gcc/params.opt b/gcc/params.opt
index a87df398a742..9e2c0d677441 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -98,6 +98,10 @@  When sanitizing using MTE instructions, add checks for all stack automatics.
 Target Joined UInteger Var(param_memtag_instrument_allocas) Init(1) IntegerRange(0, 1) Param
 When sanitizing using MTE instructions, add checks for all stack allocas.
 
+-param=memtag-instrument-mem-intrinsics=
+Common Joined UInteger Var(param_memtag_instrument_mem_intrinsics) Init(1) IntegerRange(0, 1) Param Optimization
+When sanitizing using MTE instructions, include builtin functions.
+
 -param=avg-loop-niter=
 Common Joined UInteger Var(param_avg_loop_niter) Init(10) IntegerRange(1, 65536) Param Optimization
 Average number of iterations of a loop.
diff --git a/gcc/sanopt.cc b/gcc/sanopt.cc
index 0d79a0271f73..da0580dd2f84 100644
--- a/gcc/sanopt.cc
+++ b/gcc/sanopt.cc
@@ -1327,7 +1327,7 @@  pass_sanopt::execute (function *fun)
       sanitize_asan_mark_poison ();
     }
 
-  if (asan_sanitize_stack_p () || hwasan_sanitize_stack_p ())
+  if (asan_sanitize_stack_p () || hwassist_sanitize_stack_p ())
     sanitize_rewrite_addressable_params (fun);
 
   bool use_calls = param_asan_instrumentation_with_call_threshold < INT_MAX