[2/2] c++/coroutines: handle (new-)extended alignment [PR104177]

Message ID 20240918210202.192478-3-arsen@aarsen.me
State New
Series Support for coroutine frames with new-extended alignment

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed

Commit Message

Arsen Arsenović Sept. 18, 2024, 8:36 p.m. UTC
  This patch implements support for frames and promises with new-extended
alignment.

There are two kinds of alignment to worry about here:
- Promise alignment, which causes "internal" padding inside the frame
  struct.  This is a problem because the (yet to be formalized, but
  agreed upon) coroutine ABI requires us to place the resume and
  destroy pointers right before the promise of a coroutine, so that
  they are at offsets -2*sizeof(void*) and -sizeof(void*) bytes,
  respectively, relative to the promise.  To this end, we might need to
  insert padding at the start of the frame type in order to ensure that
  these pointers are aligned correctly.
- Frame alignment (either as a result of the above or some other field in
  the frame), which is the alignment of the entire finished frame type.
  This alignment currently (up to the standardization of P2014) requires
  us to over-allocate and then round the result.
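
As a rough illustration of the first point, the sketch below hand-writes a
frame-like struct.  Promise, Frame and the member names are hypothetical
stand-ins (the real frame is synthesized by the compiler), and it assumes
function pointers have the same size as void*.  It only checks that leading
padding keeps both pointers adjacent to an over-aligned promise.

```cpp
#include <cstddef>

// Hypothetical promise type whose alignment exceeds that of a pointer.
struct alignas(64) Promise { int value; };

// Hand-written stand-in for a coroutine frame; all names are illustrative.
struct Frame {
  // Leading padding sized so the two pointers end exactly where the
  // promise begins.
  char padding[alignof(Promise) - 2 * sizeof(void*)];
  void (*resume_fn)(void*);
  void (*destroy_fn)(void*);
  Promise promise;
  int local;
};

constexpr std::size_t promise_offset() { return offsetof(Frame, promise); }
constexpr std::size_t resume_offset() { return offsetof(Frame, resume_fn); }
constexpr std::size_t destroy_offset() { return offsetof(Frame, destroy_fn); }

// Assumption of this sketch: function pointers are pointer-sized.
static_assert(sizeof(void (*)(void*)) == sizeof(void*),
              "sketch assumes pointer-sized function pointers");
// The promise is correctly aligned within the frame.
static_assert(promise_offset() % alignof(Promise) == 0,
              "promise must be aligned");
// The destroy pointer sits one pointer before the promise ...
static_assert(promise_offset() - destroy_offset() == sizeof(void*),
              "destroy pointer must directly precede the promise");
// ... and the resume pointer two pointers before it.
static_assert(promise_offset() - resume_offset() == 2 * sizeof(void*),
              "resume pointer must precede the destroy pointer");
// The over-aligned promise raises the alignment of the whole frame.
static_assert(alignof(Frame) == alignof(Promise),
              "frame inherits the promise alignment");
```

The last assertion is the second point in miniature: once the promise is
over-aligned, so is the whole frame, and the allocation has to honour that.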

In addition, this patch also alters __builtin_coro_promise, the builtin
that turns a promise pointer into a _M_fr_ptr of a coroutine handle.
This builtin took an additional alignment parameter it used to
compensate for the padding that would occur between the resume/destroy
pointers and the promise.  This is no longer necessary, as these
pointers now directly precede the promise.
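
A minimal sketch of the resulting arithmetic, assuming pointer-sized function
pointers.  The helper names are hypothetical; the builtin itself is lowered
inside the compiler and is not implemented like this.

```cpp
#include <cstddef>

// Hypothetical helper: promise pointer -> handle address ("resume pointer").
// The offset is always two pointers, whatever the promise alignment is.
inline void* promise_to_resume_ptr(void* promise) {
  return static_cast<char*>(promise) - 2 * sizeof(void*);
}

// Hypothetical helper: the inverse direction.
inline void* resume_ptr_to_promise(void* resume_ptr) {
  return static_cast<char*>(resume_ptr) + 2 * sizeof(void*);
}
```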

As a result of the resume pointer no longer being at the start of the
coroutine frame, the actor and destroy functions are changed.  They now
take &frame->_Coro_resume_fn as the argument ("resume pointer"), and
need to adjust it to be the start of the frame before use.
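
The adjustment amounts to offsetof arithmetic.  The sketch below uses a
hypothetical hand-written Frame with illustrative member names, and assumes
pointer-sized function pointers; it is not the code the compiler emits.

```cpp
#include <cstddef>

// Hypothetical over-aligned promise and frame; names are illustrative.
struct alignas(64) Promise { int value; };

struct Frame {
  char padding[alignof(Promise) - 2 * sizeof(void*)];
  void (*resume_fn)(void*);
  void (*destroy_fn)(void*);
  Promise promise;
};

// Resume pointer -> frame pointer: step back over the leading members.
inline Frame* resume_to_frame(void* resume_ptr) {
  return reinterpret_cast<Frame*>(static_cast<char*>(resume_ptr)
                                  - offsetof(Frame, resume_fn));
}

// Frame pointer -> resume pointer: the address of the resume_fn member.
inline void* frame_to_resume(Frame* frame) {
  return reinterpret_cast<char*>(frame) + offsetof(Frame, resume_fn);
}
```

When the resume pointer member is at offset zero, both directions reduce to a
plain cast.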

Each kind of alignment above is handled as follows:

- For promise alignment, we emit extra members before the resume
  pointer: an "allocation pointer", if there's room for it, and a char[]
  of appropriate size to ensure there's no room between the resume
  pointer and the padding, and
- For frame alignment, we allocate extra memory in order to perform
  alignment on our own and store the result in the allocation pointer,
  which is after the frame if it could not be placed as part of the
  process above.
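
The frame alignment case can be sketched in plain C++ as below.  The function
names are hypothetical, and the sketch only covers the variant where the
allocation pointer is stored after the frame.  It over-allocates by one
pointer plus (alignment - 1) bytes, rounds the result up, and remembers the
original pointer for deallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical helper: returns storage for FRAME_SIZE bytes aligned to ALIGN
// (a power of two), with the pointer returned by operator new stored
// immediately after the frame.
inline void* allocate_rounded(std::size_t frame_size, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  // Room for the frame, the stored pointer, and the worst-case rounding.
  std::size_t alloc_size = frame_size + sizeof(void*) + (align - 1);
  void* raw = ::operator new(alloc_size);
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  auto mask = static_cast<std::uintptr_t>(align - 1);
  // Round the address up to the next multiple of ALIGN.
  char* frame = reinterpret_cast<char*>((addr + mask) & ~mask);
  // Store the unaligned pointer right after the frame.
  std::memcpy(frame + frame_size, &raw, sizeof raw);
  return frame;
}

// Hypothetical helper: recovers the original pointer and frees it.
inline void free_rounded(void* frame, std::size_t frame_size) {
  void* raw;
  std::memcpy(&raw, static_cast<char*>(frame) + frame_size, sizeof raw);
  ::operator delete(raw);
}
```

The member variant differs only in where the original pointer lives: in a
slot inside the frame rather than in the bytes following it.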

The location and necessity of the allocation pointer are dictated by
coroutine_info::allocptr_expr and coroutine_info::alloc_store_pointer.
The former provides an lvalue for the allocation pointer based on the
frame pointer iff rounding is to be performed (nullptr otherwise), and
the latter decides whether to allocate extra space for the allocation
pointer (if stored after the frame).  This gives us the flexibility to
later add support for P2014.

PR c++/104177 - coroutine frame is not being allocated with the correct alignment

	PR c++/104177

gcc/ChangeLog:

	* coroutine-passes.cc (lower_coro_builtin): Remove the second
	argument to __builtin_coro_promise.

gcc/cp/ChangeLog:

	* coroutines.cc (coro_maybe_dump_transformed_functions): Add
	frame_type as an explicit argument rather than deriving it from
	the actor argument list.
	(build_resume_to_frame_expr): New function.  Computes an
	expression that converts a resume pointer (the pointer passed to
	__builtin_coro_{resume,destroy}) into a frame pointer, or
	vice-versa.
	(struct coroutine_info): Add members allocptr_expr and
	alloc_store_pointer.  The former specifies the approach by which
	we locate storage for the allocation that we round from to get the
	aligned frame pointer, the latter specifies whether that
	approach requires allocating extra space.
	(coro_allocptr_id): New.  Populated with the identifier
	"_Coro_allocptr".
	(coro_padding_id): New.  Populated with the identifier
	"_Coro_padding".
	(coro_frame_ptr_id): New.  Populated with the identifier
	"frame_ptr".
	(coro_init_identifiers): Populate the above.
	(build_coroutine_frame_delete_expr): Require the ramp fndecl as
	an argument, to locate coroutine_info.
	(build_actor_fn): Rename frame_size argument to alloc_size.
	Pass the sole argument to the actor (resume_ptr) through
	build_resume_to_frame_expr in order to obtain the frame_ptr.
	Declare it as a new local variable.
	(build_destroy_fn): Pass the sole argument to destroy function
	through build_resume_to_frame_expr, use the op argument of
	cp_build_modify_expr rather than manually building a
	BIT_IOR_EXPR.  Update actor call to use the resume_ptr argument.
	(coro_build_actor_or_destroy_function): Change declarations to
	have a sole void* resume_ptr argument.  Do not take a
	coro_frame_ptr argument.
	(build_coroutine_frame_alloc_expr): Rename frame_size to
	alloc_size and take it by reference.  If alloc_store_pointer,
	extend that size to account for extra storage needed for the
	allocptr.  If allocptr_expr, extend it further to support
	rounding the allocation result.  After calling operator new, if
	need be, align the resulting pointer and store the original for
	later deletion.
	(allocptr_expr_member): New function.  Implements allocptr_expr
	for the case where we store the allocptr as a _Coro_allocptr
	member in the frame.
	(allocptr_expr_after_frame): New function.  Implements
	allocptr_expr for the case where we couldn't store the allocptr
	as a member of the frame, so we must place it after the frame.
	(cp_coroutine_transform::build_ramp_function): Update
	build_coro_frame_{alloc,delete}_expr calls as needed, pass
	argument to the actor through build_resume_to_frame_expr (with
	BACKWARDS_P set).
	(cp_coroutine_transform::apply_transforms): Update
	coro_build_actor_or_destroy_function.
	(cp_coroutine_transform::finish_transforms): Update calls to
	build_actor_fn so that it receives an unshared alloc_size.
	Update coro_maybe_dump_transformed_functions call.
	* coroutines.h (cp_coroutine_transform): Rename frame_size to
	alloc_size.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/frame-alignment-4.C: New test.
	* g++.dg/coroutines/torture/frame-alignment-1.C: New test.
	* g++.dg/coroutines/torture/frame-alignment-2.C: New test.
	* g++.dg/coroutines/torture/frame-alignment-3.C: New test.
---
 gcc/coroutine-passes.cc                       |   8 +-
 gcc/cp/coroutines.cc                          | 430 +++++++++++++++---
 gcc/cp/coroutines.h                           |   2 +-
 .../g++.dg/coroutines/frame-alignment-4.C     |  35 ++
 .../coroutines/torture/frame-alignment-1.C    |  73 +++
 .../coroutines/torture/frame-alignment-2.C    | 101 ++++
 .../coroutines/torture/frame-alignment-3.C    |  87 ++++
 7 files changed, 671 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/frame-alignment-4.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-1.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-2.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-3.C
  

Patch

diff --git a/gcc/coroutine-passes.cc b/gcc/coroutine-passes.cc
index 0f8e24f8d551..c3e5d53b5a9b 100644
--- a/gcc/coroutine-passes.cc
+++ b/gcc/coroutine-passes.cc
@@ -105,18 +105,12 @@  lower_coro_builtin (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	   that is true when we are converting from a promise ptr to a
 	   frame pointer, and false for the inverse.  */
 	tree ptr = gimple_call_arg (stmt, 0);
-	tree align_t = gimple_call_arg (stmt, 1);
 	tree from = gimple_call_arg (stmt, 2);
-	gcc_checking_assert (TREE_CODE (align_t) == INTEGER_CST);
 	gcc_checking_assert (TREE_CODE (from) == INTEGER_CST);
 	bool dir = wi::to_wide (from) != 0;
-	HOST_WIDE_INT promise_align = TREE_INT_CST_LOW (align_t);
 	HOST_WIDE_INT psize =
 	  TREE_INT_CST_LOW (TYPE_SIZE_UNIT (ptr_type_node));
-	HOST_WIDE_INT align = TYPE_ALIGN_UNIT (ptr_type_node);
-	align = MAX (align, promise_align);
-	psize *= 2; /* Start with two pointers.  */
-	psize = ROUND_UP (psize, align);
+	psize *= 2;
 	HOST_WIDE_INT offs = dir ? -psize : psize;
 	tree repl = build2 (POINTER_PLUS_EXPR, ptr_type_node, ptr,
 			    size_int (offs));
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 416202ed3e88..d3e88543b8b4 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gcc-rich-location.h"
 #include "hash-map.h"
 #include "coroutines.h"
+#include "gimplify.h"
 
 /* ================= Debug. ================= */
 
@@ -108,7 +109,8 @@  coro_maybe_dump_ramp (tree ramp)
    pretty-print their contents to the lang-coro dump.  */
 
 static void
-coro_maybe_dump_transformed_functions (tree actor, tree destroy)
+coro_maybe_dump_transformed_functions (tree actor, tree destroy,
+				       tree frame_type)
 {
   if (!dmp_str)
     return;
@@ -124,10 +126,9 @@  coro_maybe_dump_transformed_functions (tree actor, tree destroy)
       return;
     }
 
-  tree frame = TREE_TYPE (TREE_TYPE (DECL_ARGUMENTS (actor)));
   pp_string (&pp, "Frame type:");
   pp_newline (&pp);
-  dump_record_fields (&pp, frame);
+  dump_record_fields (&pp, frame_type);
   pp_newline_and_flush (&pp);
 
   pp_string (&pp, "Actor/resumer:");
@@ -144,6 +145,9 @@  coro_maybe_dump_transformed_functions (tree actor, tree destroy)
 /* ================= END Debug. ================= */
 
 static bool coro_promise_type_found_p (tree, location_t);
+static tree build_resume_to_frame_expr (location_t loc, tree fp_type,
+					tree orig_ptr, bool backwards_p);
+
 
 /* GCC C++ coroutines implementation.
 
@@ -209,6 +213,21 @@  struct GTY((for_user)) coroutine_info
   /* Temporary variable number assigned by get_awaitable_var.  */
   int awaitable_number = 0;
 
+  /* Function that produces an lvalue expression for the storage for the
+     allocated pointer, if the frame type is overaligned.
+
+     FRAME_PTR is the actual pointer to the frame, as produced by
+     build_resume_to_frame_expr.  LOC is the location of the expr.
+
+     The result is an lvalue of type void*.
+
+     This value is nullptr iff there's no allocation adjustment to account for
+     alignment.  */
+  tree (*allocptr_expr) (location_t loc, tree frame_ptr) = nullptr;
+  /* True iff the allocator needs to pad the allocated storage in order to store
+     a pointer.  */
+  bool alloc_store_pointer = false;
+
   /* Flags to avoid repeated errors for per-function issues.  */
   bool coro_ret_type_error_emitted;
   bool coro_promise_error_emitted;
@@ -332,6 +351,8 @@  static GTY(()) tree coro_await_resume_identifier;
 
 /* Accessors for the coroutine frame state used by the implementation.  */
 
+static GTY(()) tree coro_allocptr_id;
+static GTY(()) tree coro_padding_id;
 static GTY(()) tree coro_resume_fn_id;
 static GTY(()) tree coro_destroy_fn_id;
 static GTY(()) tree coro_promise_id;
@@ -340,6 +361,7 @@  static GTY(()) tree coro_resume_index_id;
 static GTY(()) tree coro_self_handle_id;
 static GTY(()) tree coro_actor_continue_id;
 static GTY(()) tree coro_frame_i_a_r_c_id;
+static GTY(()) tree coro_frame_ptr_id;
 
 /* Create the identifiers used by the coroutines library interfaces and
    the implementation frame state.  */
@@ -369,11 +391,14 @@  coro_init_identifiers ()
   coro_await_resume_identifier = get_identifier ("await_resume");
 
   /* Coroutine state frame field accessors.  */
+  coro_allocptr_id = get_identifier ("_Coro_allocptr");
+  coro_padding_id = get_identifier ("_Coro_padding");
   coro_resume_fn_id = get_identifier ("_Coro_resume_fn");
   coro_destroy_fn_id = get_identifier ("_Coro_destroy_fn");
   coro_promise_id = get_identifier ("_Coro_promise");
   coro_frame_needs_free_id = get_identifier ("_Coro_frame_needs_free");
   coro_frame_i_a_r_c_id = get_identifier ("_Coro_initial_await_resume_called");
+  coro_frame_ptr_id = get_identifier ("frame_ptr");
   coro_resume_index_id = get_identifier ("_Coro_resume_index");
   coro_self_handle_id = get_identifier ("_Coro_self_handle");
   coro_actor_continue_id = get_identifier ("_Coro_actor_continue");
@@ -2328,7 +2353,7 @@  transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
    argument.  */
 
 static tree
-build_coroutine_frame_delete_expr (tree, tree, tree, location_t);
+build_coroutine_frame_delete_expr (tree, tree, tree, location_t, tree);
 
 /* The actor transform.  */
 
@@ -2337,7 +2362,7 @@  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 		tree orig, hash_map<tree, local_var_info> *local_var_uses,
 		hash_map<tree, suspend_point_info> *suspend_points,
 		vec<tree> *param_dtor_list,
-		tree resume_idx_var, unsigned body_count, tree frame_size,
+		tree resume_idx_var, unsigned body_count, tree alloc_size,
 		bool inline_p)
 {
   verify_stmt_tree (fnbody);
@@ -2345,8 +2370,15 @@  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
   tree promise_type = get_coroutine_promise_type (orig);
   tree promise_proxy = get_coroutine_promise_proxy (orig);
 
-  /* One param, the coro frame pointer.  */
-  tree actor_fp = DECL_ARGUMENTS (actor);
+  /* One param, the resume pointer.  */
+  tree resume_ptr = DECL_ARGUMENTS (actor);
+  gcc_checking_assert (!DECL_CHAIN (resume_ptr));
+
+  tree frame_ptr_type = build_pointer_type (coro_frame_type);
+  tree rtf_expr = build_resume_to_frame_expr (loc, frame_ptr_type, resume_ptr,
+					      false);
+  tree actor_fp = coro_build_artificial_var (loc, coro_frame_ptr_id,
+					     frame_ptr_type, actor, rtf_expr);
 
   bool spf = start_preparsed_function (actor, NULL_TREE, SF_PRE_PARSED);
   gcc_checking_assert (spf);
@@ -2361,6 +2393,7 @@  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 						 void_coro_handle_type, actor,
 						 NULL_TREE);
 
+  DECL_CHAIN (continuation) = actor_fp;
   BIND_EXPR_VARS (actor_bind) = continuation;
   BLOCK_VARS (top_block) = BIND_EXPR_VARS (actor_bind) ;
 
@@ -2382,6 +2415,8 @@  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
     = create_named_label_with_ctx (loc, "actor.begin", actor);
   tree actor_frame = build1_loc (loc, INDIRECT_REF, coro_frame_type, actor_fp);
 
+  /* Declare the frame pointer.  */
+  add_decl_expr (actor_fp);
   /* Declare the continuation handle.  */
   add_decl_expr (continuation);
 
@@ -2543,8 +2578,8 @@  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 
   /* Build the frame DTOR.  */
   tree del_coro_fr
-    = build_coroutine_frame_delete_expr (actor_fp, frame_size,
-					 promise_type, loc);
+    = build_coroutine_frame_delete_expr (actor_fp, alloc_size,
+					 promise_type, loc, orig);
   finish_expr_stmt (del_coro_fr);
   finish_then_clause (need_free_if);
   tree scope = IF_SCOPE (need_free_if);
@@ -2620,15 +2655,20 @@  build_destroy_fn (location_t loc, tree coro_frame_type, tree destroy,
 		  tree actor, bool inline_p)
 {
   /* One param, the coro frame pointer.  */
-  tree destr_fp = DECL_ARGUMENTS (destroy);
+  tree resume_ptr = DECL_ARGUMENTS (destroy);
+  gcc_checking_assert (!DECL_CHAIN (resume_ptr));
+  tree frame_ptr_type = build_pointer_type (coro_frame_type);
+
+  bool spf = start_preparsed_function (destroy, NULL_TREE, SF_PRE_PARSED);
+  gcc_checking_assert (spf);
+  tree dstr_stmt = begin_function_body ();
+
+  tree destr_fp = build_resume_to_frame_expr (loc, frame_ptr_type, resume_ptr,
+					      false);
   gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (destr_fp))
 		       && same_type_p (coro_frame_type,
 				       TREE_TYPE (TREE_TYPE (destr_fp))));
 
-  bool spf = start_preparsed_function (destroy, NULL_TREE, SF_PRE_PARSED);
-  gcc_checking_assert (spf);
-  tree dstr_stmt = begin_function_body ();
-
   tree destr_frame
     = cp_build_indirect_ref (loc, destr_fp, RO_UNARY_STAR,
 			     tf_warning_or_error);
@@ -2637,15 +2677,13 @@  build_destroy_fn (location_t loc, tree coro_frame_type, tree destroy,
 					   false, tf_warning_or_error);
 
   /* _resume_at |= 1 */
-  tree dstr_idx
-    = build2_loc (loc, BIT_IOR_EXPR, short_unsigned_type_node, rat,
-		  build_int_cst (short_unsigned_type_node, 1));
-  tree r = cp_build_modify_expr (loc, rat, NOP_EXPR, dstr_idx,
+  tree r = cp_build_modify_expr (loc, rat, BIT_IOR_EXPR,
+				 build_int_cst (short_unsigned_type_node, 1),
 				 tf_warning_or_error);
   finish_expr_stmt (r);
 
   /* So .. call the actor ..  */
-  finish_expr_stmt (build_call_expr_loc (loc, actor, 1, destr_fp));
+  finish_expr_stmt (build_call_expr_loc (loc, actor, 1, resume_ptr));
 
   /* done. */
   finish_return_stmt (NULL_TREE);
@@ -4142,8 +4180,7 @@  register_local_var_uses (tree *stmt, int *do_subtree, void *d)
    ACTOR_P is true, otherwise the destroy. */
 
 static tree
-coro_build_actor_or_destroy_function (tree orig, tree fn_type,
-				      tree coro_frame_ptr, bool actor_p)
+coro_build_actor_or_destroy_function (tree orig, tree fn_type, bool actor_p)
 {
   location_t loc = DECL_SOURCE_LOCATION (orig);
   tree fn
@@ -4159,11 +4196,11 @@  coro_build_actor_or_destroy_function (tree orig, tree fn_type,
   DECL_ARTIFICIAL (fn) = true;
   DECL_INITIAL (fn) = error_mark_node;
 
-  tree id = get_identifier ("frame_ptr");
-  tree fp = build_lang_decl (PARM_DECL, id, coro_frame_ptr);
+  tree id = get_identifier ("resume_ptr");
+  tree fp = build_lang_decl (PARM_DECL, id, ptr_type_node);
   DECL_ARTIFICIAL (fp) = true;
   DECL_CONTEXT (fp) = fn;
-  DECL_ARG_TYPE (fp) = type_passed_as (coro_frame_ptr);
+  DECL_ARG_TYPE (fp) = type_passed_as (ptr_type_node);
   DECL_ARGUMENTS (fn) = fp;
 
   /* Copy selected attributes from the original function.  */
@@ -4536,7 +4573,8 @@  static tree
 build_coroutine_frame_alloc_expr (tree promise_type, tree orig_fn_decl,
 				  location_t fn_start, tree grooaf,
 				  hash_map<tree, param_info> *param_uses,
-				  tree frame_size)
+				  tree& alloc_size,
+				  tree frame_type)
 {
   /* Allocate the frame, this has several possibilities:
      [dcl.fct.def.coroutine] / 9 (part 1)
@@ -4547,6 +4585,27 @@  build_coroutine_frame_alloc_expr (tree promise_type, tree orig_fn_decl,
   tree new_fn_call = NULL_TREE;
   tree dummy_promise
     = build_dummy_object (get_coroutine_promise_type (orig_fn_decl));
+  auto coro_info = get_coroutine_info (orig_fn_decl);
+
+  if (coro_info->alloc_store_pointer)
+    {
+      /* If we're padding storage but won't use the padded storage, there's no
+	 point in padding it.  */
+      gcc_checking_assert (coro_info->allocptr_expr);
+      /* We need to pad the allocation to store the allocated pointer.  */
+      alloc_size = fold_build2_loc (fn_start, PLUS_EXPR, TREE_TYPE (alloc_size),
+				    alloc_size, TYPE_SIZE_UNIT (ptr_type_node));
+    }
+
+  if (coro_info->allocptr_expr)
+    {
+      /* We'll be doing rounding.  */
+      tree extra_size = build_int_cst (TREE_TYPE (alloc_size),
+				       TYPE_ALIGN_UNIT (frame_type) - 1);
+      alloc_size = fold_build2_loc (fn_start, PLUS_EXPR, TREE_TYPE (alloc_size),
+				    alloc_size, extra_size);
+    }
+
 
   if (TYPE_HAS_NEW_OPERATOR (promise_type))
     {
@@ -4559,7 +4618,7 @@  build_coroutine_frame_alloc_expr (tree promise_type, tree orig_fn_decl,
 	requested, and has type std::size_t.  The lvalues p1...pn are the
 	succeeding arguments..  */
       vec<tree, va_gc> *args = make_tree_vector ();
-      vec_safe_push (args, frame_size); /* Space needed.  */
+      vec_safe_push (args, alloc_size); /* Space needed.  */
 
       for (tree arg = DECL_ARGUMENTS (orig_fn_decl); arg != NULL;
 	   arg = DECL_CHAIN (arg))
@@ -4591,7 +4650,7 @@  build_coroutine_frame_alloc_expr (tree promise_type, tree orig_fn_decl,
 	    If no viable function is found, overload resolution is performed
 	    again on a function call created by passing just the amount of
 	    space required as an argument of type std::size_t.  */
-	  args = make_tree_vector_single (frame_size); /* Space needed.  */
+	  args = make_tree_vector_single (alloc_size); /* Space needed.  */
 	  new_fn_call = build_new_method_call (dummy_promise, fns, &args,
 					  NULL_TREE, LOOKUP_NORMAL, &func,
 					  tf_none);
@@ -4654,34 +4713,109 @@  build_coroutine_frame_alloc_expr (tree promise_type, tree orig_fn_decl,
 
       /* If we get to this point, we must succeed in looking up the global
 	 operator new for the params provided.  Since we are not setting
-	 size_check or cookie, we expect frame_size to be unaltered.  */
+	 size_check or cookie, we expect alloc_size to be unaltered.  */
       tree cookie = NULL;
-      new_fn_call = build_operator_new_call (nwname, &args, &frame_size,
+      new_fn_call = build_operator_new_call (nwname, &args, &alloc_size,
 					     &cookie, /*align_arg=*/NULL,
 					     /*size_check=*/NULL, /*fn=*/NULL,
 					     tf_warning_or_error);
       release_tree_vector (args);
     }
-  return new_fn_call;
+
+  if (auto allocptr_expr = coro_info->allocptr_expr)
+    {
+      /* Need to do extra handling to ensure the op new result is aligned
+	 properly.  */
+      auto align = TYPE_ALIGN_UNIT (frame_type);
+      tree aligning_alloc_expr = begin_stmt_expr ();
+      tree aaexpr_comp = begin_compound_stmt (BCS_STMT_EXPR);
+
+      tree charstar = build_pointer_type (char_type_node);
+      tree aligned_addr = (coro_build_artificial_var
+			   (fn_start, NULL_TREE, charstar, orig_fn_decl,
+			    NULL_TREE));
+
+      tree unaligned_addr = (coro_build_artificial_var
+			     (fn_start, NULL_TREE, ptr_type_node,
+			      orig_fn_decl, new_fn_call));
+
+      /* Declare the allocated address, and call the allocator.  */
+      pushdecl (unaligned_addr);
+      add_decl_expr (unaligned_addr);
+
+      /* Declare the resultant address.  */
+      pushdecl (aligned_addr);
+      add_decl_expr (aligned_addr);
+
+      auto binop = [&] (enum tree_code cc, tree op1, tree op2)
+      {
+	return cp_build_binary_op (fn_start, cc, op1, op2,
+				   tf_warning_or_error);
+      };
+
+      {
+	/* Compute the aligned value and store it into ALIGNED_ADDR.  */
+	tree alignment_cmask = build_int_cst (uintptr_type_node, align - 1);
+	tree alignment_mask = cp_build_unary_op (BIT_NOT_EXPR, alignment_cmask,
+						 true, tf_warning_or_error);
+	tree align_val = binop (PLUS_EXPR,
+				convert (uintptr_type_node, unaligned_addr),
+				alignment_cmask);
+	align_val = binop (BIT_AND_EXPR, align_val, alignment_mask);
+	align_val = convert (charstar, align_val);
+	align_val = cp_build_modify_expr (fn_start, aligned_addr, NOP_EXPR,
+					  align_val, tf_warning_or_error);
+	finish_expr_stmt (align_val);
+      }
+
+      {
+	/* Okay, now that we've aligned the address we received into
+	   ALIGNED_ADDR, we need to store UNALIGNED_ADDR right after the frame,
+	   for later reading.  */
+	tree frame_ptr = convert (build_pointer_type (frame_type),
+				  aligned_addr);
+	tree addr_stor = allocptr_expr (fn_start, frame_ptr);
+	addr_stor = cp_build_modify_expr (fn_start, addr_stor, NOP_EXPR,
+					  unaligned_addr, tf_warning_or_error);
+	finish_expr_stmt (addr_stor);
+      }
+
+      /* Our result is the aligned address.  */
+      finish_stmt_expr_expr (aligned_addr, aligning_alloc_expr);
+
+      finish_compound_stmt (aaexpr_comp);
+      aligning_alloc_expr = finish_stmt_expr (aligning_alloc_expr, false);
+      return aligning_alloc_expr;
+    }
+  else
+    /* No extra processing required.  */
+    return new_fn_call;
 }
 
 /* Build an expression to delete the coroutine state frame.  */
 
 static tree
-build_coroutine_frame_delete_expr (tree coro_fp, tree frame_size,
-				   tree promise_type, location_t loc)
+build_coroutine_frame_delete_expr (tree coro_fp, tree alloc_size,
+				   tree promise_type, location_t loc,
+				   tree fndecl)
 {
+  tree alloc_ptr = coro_fp;
+
+  if (auto allocptr_expr = get_coroutine_info (fndecl)->allocptr_expr)
+    /* Get the real allocation address.  */
+    alloc_ptr = allocptr_expr (loc, alloc_ptr);
+
   /* Cast the frame pointer to a pointer to promise so that the build op
      delete call will search the promise.  */
   tree pptr_type = build_pointer_type (promise_type);
-  tree frame_arg = build1_loc (loc, CONVERT_EXPR, pptr_type, coro_fp);
+  tree frame_arg = build1_loc (loc, CONVERT_EXPR, pptr_type, alloc_ptr);
   /* [dcl.fct.def.coroutine] / 12 sentence 3:
      If both a usual deallocation function with only a pointer parameter and
      a usual deallocation function with both a pointer parameter and a size
      parameter are found, then the selected deallocation function shall be the
      one with two parameters.  */
   tree del_coro_fr
-    = build_coroutine_op_delete_call (DELETE_EXPR, frame_arg, frame_size,
+    = build_coroutine_op_delete_call (DELETE_EXPR, frame_arg, alloc_size,
 				      /*global_p=*/false,  /*placement=*/NULL,
 				      /*alloc_fn=*/NULL, tf_warning_or_error);
   if (!del_coro_fr || del_coro_fr == error_mark_node)
@@ -4689,6 +4823,98 @@  build_coroutine_frame_delete_expr (tree coro_fp, tree frame_size,
   return del_coro_fr;
 }
 
+/* Function that converts the "resume" pointer passed to
+   __builtin_coro_destroy/resume into its frame pointer, or vice-versa.
+
+   FP_TYPE is the frame pointer type.  It will be the result type of the
+   expression, unless BACKWARDS_P is true.
+
+   BACKWARDS_P specifies whether we convert a frame pointer into a resume
+   pointer (true) or a resume pointer into a frame pointer (false).
+
+   ORIG_PTR is either the resume or frame pointer, the latter iff BACKWARDS_P.
+
+   LOC is the location to use for the expression.
+
+   The result will be a void* iff BACKWARDS_P, otherwise FP_TYPE.  */
+static tree
+build_resume_to_frame_expr (location_t loc, tree fp_type, tree orig_ptr,
+			    bool backwards_p)
+{
+  gcc_assert (POINTER_TYPE_P (fp_type)
+	      && TREE_CODE (TREE_TYPE (fp_type)) == RECORD_TYPE
+	      && (same_type_p (TREE_TYPE (orig_ptr),
+			       (backwards_p ? fp_type
+				: ptr_type_node))));
+  tree resumeptr = lookup_member (TREE_TYPE (fp_type), coro_resume_fn_id,
+				  /*protect=*/1, /*want_type=*/0, tf_none);
+  gcc_assert (resumeptr && TREE_CODE (resumeptr) == FIELD_DECL);
+  if (compare_tree_int (bit_position (resumeptr), 0) == 0)
+    /* Simple case: just cast.  */
+    return convert (backwards_p ? ptr_type_node : fp_type, orig_ptr);
+
+  /* Otherwise, we need to do pointer math.  The expression we're building,
+     depending on BACKWARDS_P, is:
+
+     if BACKWARDS_P: (void*) ((char*) (orig_ptr)
+                              + offsetof (Frame, _Coro_resume_fn))
+     else:          (Frame*) ((char*) (orig_ptr)
+                              - offsetof (Frame, _Coro_resume_fn))  */
+  tree resume_off = byte_position (resumeptr);
+  tree charstar = build_pointer_type (char_type_node);
+  tree res = convert (charstar, orig_ptr);
+  res = cp_build_binary_op (loc, backwards_p ? PLUS_EXPR : MINUS_EXPR, res,
+                           resume_off, tf_none);
+  res = convert (backwards_p ? ptr_type_node : fp_type, res);
+  return res;
+}
+
+/* coroutine_info->allocptr_expr implementation for storing the allocation
+   pointer as a member.  */
+
+static tree
+allocptr_expr_member (location_t loc, tree frame_ptr)
+{
+  tree frame_ptr_type = TREE_TYPE (frame_ptr);
+  gcc_checking_assert (POINTER_TYPE_P (frame_ptr_type)
+		       && TREE_CODE (TREE_TYPE (frame_ptr_type)) == RECORD_TYPE);
+  tree frame_type = TREE_TYPE (frame_ptr_type);
+  tree allocptr = lookup_member (frame_type, coro_allocptr_id,
+				 /*protect=*/1, /*want_type=*/0,
+				 tf_warning_or_error);
+  gcc_checking_assert (allocptr && TREE_CODE (allocptr) == FIELD_DECL);
+  tree r = build_fold_indirect_ref_loc (loc, frame_ptr);
+  r = build3_loc (loc, COMPONENT_REF, TREE_TYPE (allocptr),
+		  r, allocptr, NULL_TREE);
+  gcc_checking_assert (lvalue_p (r));
+  return r;
+}
+
+/* coroutine_info->allocptr_expr implementation for storing the allocation
+   pointer following the frame type.  Used when there isn't enough room in the
+   frame type.  */
+
+static tree
+allocptr_expr_after_frame (location_t loc, tree frame_ptr)
+{
+  tree frame_ptr_type = TREE_TYPE (frame_ptr);
+  gcc_checking_assert (POINTER_TYPE_P (frame_ptr_type)
+		       && TREE_CODE (TREE_TYPE (frame_ptr_type)) == RECORD_TYPE);
+  tree frame_type = TREE_TYPE (frame_ptr_type);
+  tree charstar = build_pointer_type (char_type_node);
+  tree ptrptr_type = build_pointer_type (ptr_type_node);
+  /* We need unit addition here.  */
+  frame_ptr = convert (charstar, frame_ptr);
+
+  tree size = TYPE_SIZE_UNIT (frame_type);
+  tree addr_stor = cp_build_binary_op (loc, PLUS_EXPR, frame_ptr, size,
+                                      tf_error);
+  addr_stor = convert (ptrptr_type, addr_stor);
+  addr_stor = build_fold_indirect_ref_loc (loc, addr_stor);
+  gcc_checking_assert (lvalue_p (addr_stor));
+  return addr_stor;
+}
+
 /* Build the ramp function.
    Here we take the original function definition which has now had its body
    removed, and use it as the declaration of the ramp which both replaces the
@@ -4702,8 +4928,10 @@  build_coroutine_frame_delete_expr (tree coro_fp, tree frame_size,
 bool
 cp_coroutine_transform::build_ramp_function ()
 {
+  auto coro_info = get_coroutine_info (orig_fn_decl);
   gcc_checking_assert (current_binding_level
-		       && current_binding_level->kind == sk_function_parms);
+		       && current_binding_level->kind == sk_function_parms
+		       && coro_info);
 
   /* This is completely synthetic code, if we find an issue then we have not
      much chance to point at the most useful place in the user's code.  In
@@ -4751,8 +4979,6 @@  cp_coroutine_transform::build_ramp_function ()
   /* Check early for usable allocator/deallocator, without which we cannot
      build a useful ramp; early exit if they are not available or usable.  */
 
-  frame_size = TYPE_SIZE_UNIT (frame_type);
-
   /* Make a var to represent the frame pointer early.  Initialize to zero so
      that we can pass it to the IFN_CO_FRAME (to give that access to the frame
      type).  */
@@ -4760,9 +4986,20 @@  cp_coroutine_transform::build_ramp_function ()
 					    frame_ptr_type, orig_fn_decl,
 					    NULL_TREE);
 
+
+  /* The CO_FRAME internal function is a mechanism to allow the middle end
+     to adjust the allocation in response to optimizations.  We provide the
+     current conservative estimate of the frame size (as per the current
+     computed layout).  */
+
+  alloc_size = build_call_expr_internal_loc (loc, IFN_CO_FRAME, size_type_node,
+					     2,
+					     TYPE_SIZE_UNIT (frame_type),
+					     build_zero_cst (frame_ptr_type));
   tree new_fn_call
     = build_coroutine_frame_alloc_expr (promise_type, orig_fn_decl, fn_start,
-					grooaf, &param_uses, frame_size);
+					grooaf, &param_uses, alloc_size,
+					frame_type);
 
   /* We must have a useable allocator to proceed.  */
   if (!new_fn_call || new_fn_call == error_mark_node)
@@ -4770,8 +5007,8 @@  cp_coroutine_transform::build_ramp_function ()
 
   /* Likewise, we need the DTOR to delete the frame.  */
   tree delete_frame_call
-    = build_coroutine_frame_delete_expr (coro_fp, frame_size, promise_type,
-					 fn_start);
+    = build_coroutine_frame_delete_expr (coro_fp, alloc_size, promise_type,
+					 fn_start, orig_fn_decl);
   if (!delete_frame_call || delete_frame_call == error_mark_node)
     return false;
 
@@ -4847,17 +5084,6 @@  cp_coroutine_transform::build_ramp_function ()
   add_decl_expr (coro_gro_live);
 
   /* Build the frame.  */
-
-  /* The CO_FRAME internal function is a mechanism to allow the middle end
-     to adjust the allocation in response to optimizations.  We provide the
-     current conservative estimate of the frame size (as per the current)
-     computed layout.  */
-
-  tree resizeable
-    = build_call_expr_internal_loc (loc, IFN_CO_FRAME, size_type_node, 2,
-				    frame_size,
-				    build_zero_cst (frame_ptr_type));
-  CALL_EXPR_ARG (new_fn_call, 0) = resizeable;
   tree allocated = build1 (CONVERT_EXPR, frame_ptr_type, new_fn_call);
   tree r = cp_build_init_expr (coro_fp, allocated);
   finish_expr_stmt (r);
@@ -5145,7 +5371,9 @@  cp_coroutine_transform::build_ramp_function ()
     }
 
   /* Start the coroutine body.  */
-  r = build_call_expr_loc (fn_start, resumer, 1, coro_fp);
+  r = build_call_expr_loc (fn_start, resumer, 1,
+			   build_resume_to_frame_expr (fn_start, frame_ptr_type,
+						       coro_fp, true));
   finish_expr_stmt (r);
 
   /* The ramp is done, we just need the return statement, which we build from
@@ -5308,6 +5536,8 @@  cp_coroutine_transform::~cp_coroutine_transform ()
  declare a dummy coro frame.
  struct _R_frame {
   using handle_type = coro::coroutine_handle<coro1::promise_type>;
+  void* _Coro_allocptr; optional, if there's room
+  char _Coro_padding[]; optional, if needed
   void (*_Coro_resume_fn)(_R_frame *);
   void (*_Coro_destroy_fn)(_R_frame *);
   coro1::promise_type _Coro_promise;
@@ -5347,10 +5577,10 @@  cp_coroutine_transform::apply_transforms ()
      see these.  */
   resumer
     = coro_build_actor_or_destroy_function (orig_fn_decl, act_des_fn_type,
-					    frame_ptr_type, true);
+					    true);
   destroyer
     = coro_build_actor_or_destroy_function (orig_fn_decl, act_des_fn_type,
-					    frame_ptr_type, false);
+					    false);
 
   /* Transform the function body as per [dcl.fct.def.coroutine] / 5.  */
   wrap_original_function_body ();
@@ -5360,19 +5590,104 @@  cp_coroutine_transform::apply_transforms ()
   cp_walk_tree (&coroutine_body, await_statement_walker, &body_aw_points, NULL);
   await_count = body_aw_points.await_number;
 
-  /* Determine the fields for the coroutine state.  */
   tree field_list = NULL_TREE;
+
+  /* Determine if we need some padding before our promise.  This padding is
+     added in order to support the __builtin_coro_{resume,destroy} assumption
+     of the resume/destroy function pointers being right before the promise in
+     the coroutine frame.  */
+  auto coro_info = get_coroutine_info (orig_fn_decl);
+  auto ptr_size = tree_to_uhwi (TYPE_SIZE_UNIT (ptr_type_node));
+  tree promise_type = get_coroutine_promise_type (orig_fn_decl);
+  auto padding_size = TYPE_ALIGN_UNIT (promise_type);
+
+  /* We have two or three pointers to place before the promise type.  Two are
+     mandatory, the _Coro_resume_fn and _Coro_destroy_fn, the third one is
+     called _Coro_allocptr and is only wanted if it fits.  In addition, the
+     resume and destroy function pointers must precede the _Coro_promise
+     directly, so we may need to insert extra padding to make that happen.  */
+  if (padding_size >= 3 * ptr_size)
+    {
+      /* Enough room for all three pointers, and then some.  Inject first the
+	 alloc pointer.  */
+      tree allocptr = build_lang_decl (FIELD_DECL, coro_allocptr_id,
+				       ptr_type_node);
+      DECL_ARTIFICIAL (allocptr) = true;
+      DECL_CHAIN (allocptr) = field_list;
+      field_list = allocptr;
+
+      /* Tell the (de)allocation code where to store the allocated pointer.  */
+      coro_info->allocptr_expr = allocptr_expr_member;
+
+      /* We've taken up some of the room needed for padding.  */
+      padding_size -= ptr_size;
+    }
+  if (padding_size > 2 * ptr_size)
+    {
+      padding_size -= 2 * ptr_size;
+      /* We need to insert some padding to keep the pointers in the right place
+	 before the promise.  */
+      tree padding_type = build_array_of_n_type (char_type_node, padding_size);
+      tree padding = build_lang_decl (FIELD_DECL, coro_padding_id,
+				      padding_type);
+      DECL_ARTIFICIAL (padding) = true;
+      DECL_CHAIN (padding) = field_list;
+      field_list = padding;
+    }
+
+  /* Determine the fields for the coroutine state.  */
   local_vars_frame_data local_vars_data (&field_list, &local_var_uses);
   cp_walk_tree (&coroutine_body, register_local_var_uses, &local_vars_data, NULL);
 
   /* Conservative computation of the coroutine frame content.  */
   frame_type = begin_class_definition (frame_type);
+
   TYPE_FIELDS (frame_type) = field_list;
   TYPE_BINFO (frame_type) = make_tree_binfo (0);
   BINFO_OFFSET (TYPE_BINFO (frame_type)) = size_zero_node;
   BINFO_TYPE (TYPE_BINFO (frame_type)) = frame_type;
   frame_type = finish_struct (frame_type, NULL_TREE);
 
+  if (CHECKING_P)
+    {
+      /* Verify the ABI-relevant bits of the layout of the produced frame
+	 struct.  */
+      auto frame_offset = [&] (tree memb)
+      {
+	tree member = lookup_member (frame_type, memb, 0, false,
+				     tf_none, nullptr);
+	gcc_assert (member && TREE_CODE (member) == FIELD_DECL);
+	auto pos = int_bit_position (member);
+	gcc_assert (pos >= 0);
+	return static_cast <unsigned HOST_WIDE_INT> (pos);
+      };
+      auto coro_promise_off = frame_offset (coro_promise_id);
+      auto coro_resumer_off = frame_offset (coro_resume_fn_id);
+      auto coro_destroyer_off = frame_offset (coro_destroy_fn_id);
+      auto ptr_size_bit = tree_to_uhwi (TYPE_SIZE (ptr_type_node));
+      gcc_assert (coro_promise_off == coro_resumer_off + 2 * ptr_size_bit);
+      gcc_assert (coro_promise_off == coro_destroyer_off + ptr_size_bit);
+    }
+
+  /* Handle frame overalignment.  'malloc' and 'operator new' only guarantee us
+     MALLOC_ABI_ALIGNMENT alignment.  */
+  auto frame_alignment = TYPE_ALIGN (frame_type);
+  if (frame_alignment <= MALLOC_ABI_ALIGNMENT)
+    {
+      /* Reset allocptr_expr if it was set above.  We don't need to store the
+	 allocation pointer.  */
+      coro_info->allocptr_expr = nullptr;
+      coro_info->alloc_store_pointer = false;
+    }
+  else if (!coro_info->allocptr_expr)
+    {
+      /* This wasn't set above, meaning we must store the allocation pointer
+	 past the frame, as there was no room for it before the promise.  */
+      coro_info->allocptr_expr = allocptr_expr_after_frame;
+      coro_info->alloc_store_pointer = true;
+      gcc_checking_assert (TYPE_ALIGN (ptr_type_node) <= frame_alignment);
+    }
+
   valid_coroutine = build_ramp_function ();
   coro_maybe_dump_ramp (orig_fn_decl);
 }
@@ -5389,12 +5704,13 @@  cp_coroutine_transform::finish_transforms ()
   current_function_decl = resumer;
   build_actor_fn (fn_start, frame_type, resumer, coroutine_body, orig_fn_decl,
 		  &local_var_uses, &suspend_points, &param_dtor_list,
-		  resume_idx_var, await_count, frame_size, inline_p);
+		  resume_idx_var, await_count, unshare_expr (alloc_size),
+		  inline_p);
 
   current_function_decl = destroyer;
   build_destroy_fn (fn_start, frame_type, destroyer, resumer, inline_p);
 
-  coro_maybe_dump_transformed_functions (resumer, destroyer);
+  coro_maybe_dump_transformed_functions (resumer, destroyer, frame_type);
 }
 
 #include "gt-cp-coroutines.h"
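
Reviewer aid, not part of the patch: the frame-head layout rule from
apply_transforms () can be checked in isolation. The sketch below is a
standalone model of that arithmetic only. FrameHead and plan_frame_head
are hypothetical names invented for this illustration, sizes are in
bytes, and the promise alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the fields placed before _Coro_promise.
struct FrameHead
{
  bool has_allocptr;          // room for _Coro_allocptr?
  std::size_t padding;        // size of _Coro_padding, 0 if absent
  std::size_t promise_offset; // resulting offset of _Coro_promise
};

// Mirrors the two 'if' blocks in the hunk above: take one pointer for the
// allocptr when the promise alignment leaves room for three pointers, then
// pad so that resume/destroy sit directly before the promise.
constexpr FrameHead
plan_frame_head (std::size_t promise_align, std::size_t ptr_size)
{
  std::size_t padding_size = promise_align;
  bool has_allocptr = false;
  if (padding_size >= 3 * ptr_size)
    {
      has_allocptr = true;
      padding_size -= ptr_size;
    }
  std::size_t padding = 0;
  if (padding_size > 2 * ptr_size)
    padding = padding_size - 2 * ptr_size;
  std::size_t offset
    = (has_allocptr ? ptr_size : 0) + padding + 2 * ptr_size;
  return { has_allocptr, padding, offset };
}

// The promise ends up at a multiple of its own alignment in each case.
static_assert (plan_frame_head (16, 8).promise_offset == 16, "no padding");
static_assert (plan_frame_head (32, 8).promise_offset == 32, "allocptr + pad");
static_assert (plan_frame_head (64, 8).promise_offset == 64, "allocptr + pad");
static_assert (plan_frame_head (16, 4).promise_offset == 16, "32-bit target");
```

With 8-byte pointers, a 32-byte-aligned promise yields an allocptr, 8
bytes of padding, and the two function pointers at offsets 16 and 24,
which is the relationship the CHECKING_P asserts verify on the real
frame type.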
diff --git a/gcc/cp/coroutines.h b/gcc/cp/coroutines.h
index d13bea0f302b..f2d0088a8f9e 100644
--- a/gcc/cp/coroutines.h
+++ b/gcc/cp/coroutines.h
@@ -121,7 +121,7 @@  private:
   hash_map<tree, suspend_point_info> suspend_points;
   hash_map<tree, local_var_info> local_var_uses;
   vec<tree> param_dtor_list = vNULL;
-  tree frame_size = NULL_TREE;
+  tree alloc_size = NULL_TREE;
   unsigned int await_count = 0;
 
   bool inline_p = false;
diff --git a/gcc/testsuite/g++.dg/coroutines/frame-alignment-4.C b/gcc/testsuite/g++.dg/coroutines/frame-alignment-4.C
new file mode 100644
index 000000000000..fab3adfdbf99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/frame-alignment-4.C
@@ -0,0 +1,35 @@ 
+// Verify that frames are given allocptrs if there's sufficient room.
+// { dg-additional-options "-Wp,-fdump-lang-coro" }
+#include <coroutine>
+#include <algorithm>
+#include <cstddef>
+
+namespace {
+  constexpr std::size_t max (std::size_t a, std::size_t b)
+  { return std::max (a, b); }
+}
+
+struct task
+{
+  /* If the promise alignment is 4*sizeof(void*), then there is room to fit an
+     allocptr.  If that's somehow less than the alignment 'new' provides, we
+     need to align to more than that in order to force using the allocptr.  */
+  struct alignas(max (2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__, 4 * sizeof (void*))) promise_type
+  {
+    task get_return_object () { return {}; }
+    void unhandled_exception () noexcept {}
+    std::suspend_never initial_suspend () { return {}; }
+    std::suspend_never final_suspend () noexcept { return {}; }
+    void return_void () {}
+  };
+};
+
+task
+foo ()
+{ co_return; }
+
+// { dg-final { scan-lang-dump "\n_Z.*\\.Frame\\s*\\{\[^\}\]*_Coro_allocptr\[^\}\]*\\}" coro } }
+// Check that we're actually using it
+// { dg-final { scan-lang-dump "_Coro_allocptr\\s*=" coro } }
+// ... and that we're freeing it (twice, once in ramp and once in actor)
+// { dg-final { scan-lang-dump-times "operator\\s+delete\\s*\\(\[^\n\]+_Coro_allocptr" 2 coro } }
diff --git a/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-1.C b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-1.C
new file mode 100644
index 000000000000..11ca9ee77c13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-1.C
@@ -0,0 +1,73 @@ 
+// { dg-do run }
+#include <cstdint>
+#include <cassert>
+#include <exception>
+#include <coroutine>
+#include <unordered_map>
+
+/* This test checks that our synthesized coroutine allocation correctly aligns
+   the promise type inside the coroutine frame.  It does so by altering
+   operator new() to provide minimum possible alignment and then checking
+   whether, despite that, promise_type has a well-aligned 'this'.  Since it
+   was convenient, it also verifies that we deallocate all addresses we
+   allocate.  */
+
+static std::unordered_map<void*, std::ptrdiff_t> off_map;
+static std::unordered_map<void*, std::size_t> sz_map;
+
+#pragma GCC diagnostic ignored "-Wpointer-arith"
+
+static constexpr auto overalignment = 2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__;
+
+struct task
+{
+  struct alignas(overalignment) promise_type
+  {
+    promise_type ()
+    {
+      if (((std::uintptr_t)this) % overalignment)
+	std::terminate ();
+    }
+
+    task get_return_object () noexcept { return {}; }
+    void unhandled_exception () noexcept {}
+    std::suspend_never initial_suspend () { return {}; }
+    std::suspend_never final_suspend () noexcept { return {}; }
+    void return_void () {}
+
+    void
+    operator delete (void* ptr, std::size_t sz)
+    {
+      auto off = off_map.at (ptr);
+      off_map.erase (ptr);
+      ::operator delete (ptr - off, sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+    }
+
+    void*
+    operator new (std::size_t sz)
+    {
+      auto x = ::operator new (sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+      std::ptrdiff_t off = 0;
+
+      if (((std::uintptr_t)x) % overalignment == 0)
+	x += (off = __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+
+      off_map.emplace (x, off);
+      return x;
+    }
+  };
+};
+
+task
+foo ()
+{
+  co_return;
+}
+
+int
+main ()
+{
+  foo ();
+
+  assert (off_map.empty ());
+}
diff --git a/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-2.C b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-2.C
new file mode 100644
index 000000000000..8d99c5137737
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-2.C
@@ -0,0 +1,101 @@ 
+// { dg-do run }
+// { dg-additional-options "-Wp,-fdump-lang-coro" }
+#include <cstdint>
+#include <cassert>
+#include <exception>
+#include <coroutine>
+#include <unordered_map>
+
+/* This test checks that our synthesized coroutine allocation correctly aligns
+   overaligned variables stored in the coroutine frame.  It does so by
+   altering operator new() to provide minimum possible alignment and then
+   checking whether, despite that, an automatic storage duration variable
+   moved into the frame is correctly aligned.  */
+
+static std::unordered_map<void*, std::ptrdiff_t> off_map;
+static std::unordered_map<void*, std::size_t> osz_map;
+
+#pragma GCC diagnostic ignored "-Wpointer-arith"
+
+static constexpr auto overalignment = 2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__;
+
+struct task
+{
+  struct promise_type
+  {
+    promise_type()
+    {}
+
+    task get_return_object () noexcept {
+      return {std::coroutine_handle<promise_type>::from_promise (*this)};
+    }
+    void unhandled_exception () noexcept {}
+    std::suspend_never initial_suspend () { return {}; }
+    std::suspend_never final_suspend () noexcept { return {}; }
+    void return_void () {}
+
+    void
+    operator delete (void* ptr, std::size_t sz)
+    {
+      auto off = off_map.at (ptr);
+      auto osz = osz_map.at (ptr);
+      assert (osz == sz);
+      off_map.erase (ptr);
+      ::operator delete (ptr - off, sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+    }
+
+    void*
+    operator new (std::size_t sz)
+    {
+      auto x = ::operator new (sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+      std::ptrdiff_t off = 0;
+
+      if (((std::uintptr_t)x) % overalignment == 0)
+	x += (off = __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+
+      off_map.emplace (x, off);
+      osz_map.emplace (x, sz);
+      return x;
+    }
+  };
+  std::coroutine_handle<promise_type> ch;
+};
+
+static bool ov_allocd = false;
+
+struct alignas (overalignment) overaligned
+{
+  overaligned()
+  {
+    auto ithis = reinterpret_cast<std::uintptr_t> (this);
+    if (ithis % overalignment)
+      std::terminate ();
+    ov_allocd = true;
+  }
+};
+
+task
+foo ()
+{
+  overaligned ov_var;
+  co_await std::suspend_always{};
+  co_return;
+}
+
+int
+main ()
+{
+  auto x = foo ();
+  x.ch ();
+
+  assert (off_map.empty ());
+
+  /* Make sure the overaligned variable was actually constructed.  */
+  assert (ov_allocd);
+}
+
+// Ensure that we're actually testing something.
+// { dg-final { scan-lang-dump "\n_Z.*\\.Frame\\s*\\{\[^\}\]*ov_var\[^\}\]*\\}" coro } }
+// ... and that we didn't somehow gain enough room in the padding to use the
+// _Coro_allocptr allocation pointer storage
+// { dg-final { scan-lang-dump-not "\n_Z.*\\.Frame\\s*\\{\[^\}\]*_Coro_allocptr\[^\}\]*\\}" coro } }
diff --git a/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-3.C b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-3.C
new file mode 100644
index 000000000000..4b662753e029
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/torture/frame-alignment-3.C
@@ -0,0 +1,87 @@ 
+// { dg-do run }
+// { dg-additional-options "-Wp,-fdump-lang-coro" }
+#include <cstdint>
+#include <cassert>
+#include <exception>
+#include <coroutine>
+#include <unordered_map>
+
+/* This test checks that our synthesized coroutine allocation correctly handles
+   the padding at the start of frames.  */
+
+static std::unordered_map<void*, std::ptrdiff_t> off_map;
+static std::unordered_map<void*, std::size_t> sz_map;
+
+#pragma GCC diagnostic ignored "-Wpointer-arith"
+
+static constexpr auto overalignment = 2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__;
+
+struct task
+{
+  struct alignas(overalignment) promise_type
+  {
+    promise_type ()
+    {
+      if (((std::uintptr_t)this) % overalignment)
+	std::terminate ();
+    }
+
+    task get_return_object () noexcept {
+      return { std::coroutine_handle<promise_type>::from_promise (*this) };
+    }
+    void unhandled_exception () noexcept {}
+    std::suspend_never initial_suspend () { return {}; }
+    std::suspend_never final_suspend () noexcept { return {}; }
+    void return_void () {}
+
+    void
+    operator delete (void* ptr, std::size_t sz)
+    {
+      auto off = off_map.at (ptr);
+      off_map.erase (ptr);
+      ::operator delete (ptr - off, sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+    }
+
+    void*
+    operator new (std::size_t sz)
+    {
+      auto x = ::operator new (sz + __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+      std::ptrdiff_t off = 0;
+
+      if (((std::uintptr_t)x) % overalignment == 0)
+	x += (off = __STDCPP_DEFAULT_NEW_ALIGNMENT__);
+
+      off_map.emplace (x, off);
+      return x;
+    }
+  };
+
+  std::coroutine_handle<promise_type> handle;
+};
+
+static int* aa_loc;
+
+task
+foo ()
+{
+  int aa;
+  aa_loc = &aa;
+  co_await std::suspend_always{};
+  assert (&aa == aa_loc);
+  co_return;
+}
+
+int
+main ()
+{
+  auto ro = foo ();
+  ro.handle ();
+
+  auto ro2 = foo ();
+  ro2.handle.destroy ();
+
+  assert (off_map.empty ());
+}
+
+// Ensure that we're actually testing something.
+// { dg-final { scan-lang-dump "\n_Z.*\\.Frame\\s*\\{\[^\}\]*_Coro_padding\[^\}\]*\\}" coro } }