[PING^3,v2,1/1,AARCH64,PR102768] aarch64: Add compiler support for Shadow Call Stack

Message ID 81d54b71-7c9c-47ef-ac8d-72aae46cd4ee@linux.alibaba.com
State New
Headers
Series [PING^3,v2,1/1,AARCH64,PR102768] aarch64: Add compiler support for Shadow Call Stack |

Commit Message

Dan Li Jan. 19, 2022, 2:43 a.m. UTC
  Gentile ping for this again.
Could someone please give me a quick reply first? :), thanks.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html

> Shadow Call Stack can be used to protect the return address of a
> function at runtime, and clang already supports this feature[1].
> 
> To enable SCS in user mode, in addition to compiler, other support
> is also required (as discussed in [2]). This patch only adds basic
> support for SCS from the compiler side, and provides convenience
> for users to enable SCS.
> 
> For linux kernel, only the support of the compiler is required.
> 
> [1] https://clang.llvm.org/docs/ShadowCallStack.html
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
> 
Signed-off-by: Dan Li <ashimida@linux.alibaba.com>

gcc/c-family/ChangeLog:

	* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
	New.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
	New decl.
	* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
	New.
	(aarch64_expand_prologue):	Push x30 onto SCS before it's
	pushed onto stack.
	(aarch64_expand_epilogue):	Pop x30 frome SCS.
	* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
	New macro.
	(TARGET_CHECK_SCS_RESERVED_REGISTER):	Likewise.
	* config/aarch64/aarch64.md (scs_push):	New template.
	(scs_pop):	Likewise.
	* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK):	New macro.
	* doc/extend.texi:	Document -fsanitize=shadow-call-stack.
	* doc/invoke.texi:	Document attribute.
	* flag-types.h (enum sanitize_code):	Add
	SANITIZE_SHADOW_CALL_STACK.
	* opts-global.c (handle_common_deferred_options):	Add SCS
	compile option check.
	* opts.c (finish_options):	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shadow_call_stack_1.c: New test.
	* gcc.target/aarch64/shadow_call_stack_2.c: New test.
	* gcc.target/aarch64/shadow_call_stack_3.c: New test.
	* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
  gcc/c-family/c-attribs.c                      | 21 +++++++++
  gcc/config/aarch64/aarch64-protos.h           |  1 +
  gcc/config/aarch64/aarch64.c                  | 27 +++++++++++
  gcc/config/aarch64/aarch64.h                  | 11 +++++
  gcc/config/aarch64/aarch64.md                 | 18 ++++++++
  gcc/defaults.h                                |  4 ++
  gcc/doc/extend.texi                           |  7 +++
  gcc/doc/invoke.texi                           | 29 ++++++++++++
  gcc/flag-types.h                              |  2 +
  gcc/opts-global.c                             |  6 +++
  gcc/opts.c                                    | 12 +++++
  .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
  .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
  .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++++++++++++++++++
  .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 ++++++++
  15 files changed, 213 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
  

Comments

Richard Sandiford Jan. 20, 2022, 12:02 p.m. UTC | #1
Thanks for the patch and sorry for the (very) slow review.

Dan Li <ashimida@linux.alibaba.com> writes:
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 007b928c54b..9b3a35c06bf 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
>   						  int, bool *);
> +static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
> +						  tree, int, bool *);
>   static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
>   						 int, bool *);
>   static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
> @@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
>   			      handle_no_sanitize_attribute, NULL },
>     { "no_sanitize_address",    0, 0, true, false, false, false,
>   			      handle_no_sanitize_address_attribute, NULL },
> +  { "no_sanitize_shadow_call_stack",
> +			      0, 0, true, false, false, false,
> +			      handle_no_sanitize_shadow_call_stack_attribute,
> +			      NULL },
>     { "no_sanitize_thread",     0, 0, true, false, false, false,
>   			      handle_no_sanitize_thread_attribute, NULL },
>     { "no_sanitize_undefined",  0, 0, true, false, false, false,
> @@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int,
>     return NULL_TREE;
>   }
>   
> +/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +static tree
> +handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
> +				      tree, int, bool *no_add_attrs)
> +{
> +  *no_add_attrs = true;
> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> +    warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  else
> +    add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
> +
> +  return NULL_TREE;
> +}
> +

Do we need this?  I think these days the preference is to use the
general "no_sanitize" attribute with an argument, which is also the
syntax documented on the clang page.

We have to support no_sanitize_foo attributes for some of the early
sanitiser features, to avoid breaking backwards compatibility, but it
doesn't look like clang ever supported no_sanitize_shadow_call_stack.

>   /* Handle a "no_sanitize_thread" attribute; arguments as in
>      struct attribute_spec.handler.  */
>   
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 768e8fae136..150c015df21 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -893,6 +893,7 @@ void aarch64_register_pragmas (void);
>   void aarch64_relayout_simd_types (void);
>   void aarch64_reset_previous_fndecl (void);
>   bool aarch64_return_address_signing_enabled (void);
> +bool aarch64_shadow_call_stack_enabled (void);
>   bool aarch64_bti_enabled (void);
>   void aarch64_save_restore_target_globals (tree);
>   void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 699c105a42a..5a36a459f4e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -79,6 +79,7 @@
>   #include "tree-ssa-loop-niter.h"
>   #include "fractional-cost.h"
>   #include "rtlanal.h"
> +#include "asan.h"
>   
>   /* This file should be included last.  */
>   #include "target-def.h"
> @@ -7799,6 +7800,24 @@ aarch64_return_address_signing_enabled (void)
>   	      && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
>   }
>   
> +/* Return TRUE if shadow call stack should be enabled for the current
> +   function, otherwise return FALSE.  */
> +
> +bool
> +aarch64_shadow_call_stack_enabled (void)
> +{
> +  /* This function should only be called after frame laid out.  */
> +  gcc_assert (cfun->machine->frame.laid_out);
> +
> +  if (crtl->calls_eh_return)
> +    return false;
> +
> +  /* We only deal with a function if its LR is pushed onto stack
> +     and attribute no_sanitize_shadow_call_stack is not specified.  */

(This would need to be updated if we do drop the specific attribute.)

> +  return (sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
> +	  && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
> +}
> +
>   /* Return TRUE if Branch Target Identification Mechanism is enabled.  */
>   bool
>   aarch64_bti_enabled (void)
> @@ -8810,6 +8829,10 @@ aarch64_expand_prologue (void)
>         RTX_FRAME_RELATED_P (insn) = 1;
>       }
>   
> +  /* Push return address to shadow call stack.  */
> +  if (aarch64_shadow_call_stack_enabled ())
> +	emit_insn (gen_scs_push ());

Formatting nit: should only be indented by two spaces more than the “if”.
Same below.

I guess doing it like this means that we also continue to store x30 to the
frame in the traditional way.  And that's probably necessary, given that
the saved x30 forms part of link chain.

> +
>     if (flag_stack_usage_info)
>       current_function_static_stack_size = constant_lower_bound (frame_size);
>   
> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>         RTX_FRAME_RELATED_P (insn) = 1;
>       }
>   
> +  /* Pop return address from shadow call stack.  */
> +  if (aarch64_shadow_call_stack_enabled ())
> +	emit_insn (gen_scs_pop ());
> +

This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.

>     /* We prefer to emit the combined return/authenticate instruction RETAA,
>        however there are three cases in which we must instead emit an explicit
>        authentication instruction.
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 2792bb29adb..bdd50a8d0a1 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -100,6 +100,17 @@
>      generating stack clash probes.  */
>   #define STACK_CLASH_MAX_UNROLL_PAGES 4
>   
> +/* This value represents whether the shadow call stack is implemented on
> +   the target platform.  */
> +#define TARGET_SUPPORT_SHADOW_CALL_STACK true
> +
> +#define TARGET_CHECK_SCS_RESERVED_REGISTER()	\
> +  do {						\
> +    if (!fixed_regs[R18_REGNUM])		\
> +      error ("%<-fsanitize=shadow-call-stack%> only "	\
> +	     "allowed with %<-ffixed-x18%>");	\
> +  } while (0)
> +

Please make these target hooks instead.  The first one can use DEFHOOKPOD.

>   /* The architecture reserves all bits of the address for hardware use,
>      so the vbit must go into the delta field of pointers to member
>      functions.  This is the same config as that in the AArch32
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 1a39470a1fe..8e68a6f793d 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6994,6 +6994,24 @@ (define_insn "xpaclri"
>     "hint\t7 // xpaclri"
>   )
>   
> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
> +(define_insn "scs_push"
> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
> +  ""
> +  "str\\tx30, [x18], #8"
> +  [(set_attr "type" "store_8")]
> +)
> +
> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
> +(define_insn "scs_pop"
> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
> +  ""
> +  "ldr\\tx30, [x18, #-8]!"
> +  [(set_attr "type" "load_8")]
> +)
> +

The two SETs here work in parallel, with the define_insn as a whole
following a "read input operands, act, write output operands" sequence.
The (mem: …) address therefore sees the old value of r18 rather than the
new one.  Also, RTL rules say that subtractions need to be written as
additions of the negative.

I think the pattern would therefore be something like:

  [(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
					     (const_int -8))))]
   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]

However, since these are normal moves, I think we should be able to use
the standard move patterns.  E.g. the push could be:

(define_expand "scs_push"
  [(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
        (reg:DI R30_REGNUM))])

and similarly for the pop.

>   ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>   ;; all of memory.  This blocks insns from being moved across this point.
>   
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index bb68d0d1a79..0f1719a3bb5 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -1172,6 +1172,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>   #define PCC_BITFIELD_TYPE_MATTERS false
>   #endif
>   
> +#ifndef TARGET_SUPPORT_SHADOW_CALL_STACK
> +#define TARGET_SUPPORT_SHADOW_CALL_STACK false
> +#endif
> +
>   #ifndef INSN_SETS_ARE_DELAYED
>   #define INSN_SETS_ARE_DELAYED(INSN) false
>   #endif
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index eee4c6737bb..16c3fc367fa 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3453,6 +3453,13 @@ The @code{no_address_safety_analysis} is a deprecated alias of the
>   @code{no_sanitize_address} attribute, new code should use
>   @code{no_sanitize_address}.
>   
> +@item no_sanitize_shadow_call_stack
> +@cindex @code{no_sanitize_shadow_call_stack} function attribute
> +The @code{no_sanitize_shadow_call_stack} attribute on functions is used
> +to inform the compiler that it should not instrument shadow call stack
> +push/pop instructions in the function when compiling with the
> +@option{-fsanitize=shadow-call-stack} option.
> +
>   @item no_sanitize_thread
>   @cindex @code{no_sanitize_thread} function attribute
>   The @code{no_sanitize_thread} attribute on functions is used
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 71992b8c597..2c134c68e83 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15224,6 +15224,35 @@ add @code{detect_invalid_pointer_pairs=2} to the environment variable
>   @env{ASAN_OPTIONS}. Using @code{detect_invalid_pointer_pairs=1} detects
>   invalid operation only when both pointers are non-null.
>   
> +@item -fsanitize=shadow-call-stack
> +@opindex fsanitize=shadow-call-stack
> +Enable ShadowCallStack, a security enhancement mechanism used to protect
> +programs against return address overwrites (e.g. stack buffer overflows.)
> +It works by saving a function's return address to a separately allocated
> +shadow call stack in the function prologue and restore the return address

typo: s/restore/restoring/

> +from the shadow call stack in the function epilogue.  Instrumentation only
> +occurs in functions that need to save the return address to the stack.
> +
> +Currently it only supports the aarch64 platform and only works fine in
> +the linux kernel that implements CONFIG_SHADOW_CALL_STACK.

Maybe:

  Currently it only supports the aarch64 platform.  It is specifically
  designed for linux kernels that enable the CONFIG_SHADOW_CALL_STACK option.

> +For the user space programs, runtime support is not currently provided
> +in libc and libgcc.  Users who want to use this feature in user space need
> +to provide their own support for the runtime.  It should be noted that
> +this may cause the ABI rules to be broken.
> +
> +On aarch64, the instrumentation makes use of the platform register 'x18'.

texinfo formatting would be @code{x18}.  (Hopefully we'll move rst soon!)

> +This generally means that any code that may run on the same thread as code
> +compiled with ShadowCallStack must be compiled with the flag
> +@option{-ffixed-x18}, otherwise functions compiled without
> +@option{-ffixed-x18} may clobber x18 and break scs.

Maybe:

  might clobber @code{x18} and so corrupt the shadow stack pointer.

> +
> +And also because there is no runtime support, the code compiled with
> +ShadowCallStack cannot use exception handling, @option{-fno-exceptions}
> +also needs to be specified.

Maybe:

  Also, because there is no userspace runtime support, code compiled with
  ShadowCallStack cannot use exception handling.  Use @option{-fno-exceptions}
  to turn off exceptions.

> +
> +See @uref{https://clang.llvm.org/docs/ShadowCallStack.html} for more
> +details.
> +

This is a really nice write-up though, thanks.

>   @item -fsanitize=thread
>   @opindex fsanitize=thread
>   Enable ThreadSanitizer, a fast data race detector.
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index a5a637160d7..c22ef35a289 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -321,6 +321,8 @@ enum sanitize_code {
>     SANITIZE_HWADDRESS = 1UL << 28,
>     SANITIZE_USER_HWADDRESS = 1UL << 29,
>     SANITIZE_KERNEL_HWADDRESS = 1UL << 30,
> +  /* Shadow Call Stack.  */
> +  SANITIZE_SHADOW_CALL_STACK = 1UL << 31,

Ooh, down to the last bit :-)

>     SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
>     SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
>   		       | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index 55273822ec5..98e1d636f7f 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -477,4 +477,10 @@ handle_common_deferred_options (void)
>   	  gcc_unreachable ();
>   	}
>       }
> +
> +#ifdef TARGET_CHECK_SCS_RESERVED_REGISTER
> +  if (flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> +    TARGET_CHECK_SCS_RESERVED_REGISTER ();
> +#endif
> +

Actually (after saying this should move from being a macro to a hook):
maybe we can keep it in target-specific code instead.  We already have
similar errors in aarch64_override_options_internal.

(That comment only applies to this macro/hook.  We still want a hook for
TARGET_SUPPORT_SHADOW_CALL_STACK.)

>   }
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 4472cec1b98..fc84a07a4ab 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1308,6 +1308,17 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
>       sorry ("transactional memory is not supported with "
>   	   "%<-fsanitize=kernel-address%>");
>   
> +  if (opts->x_flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
> +    {
> +      if (!TARGET_SUPPORT_SHADOW_CALL_STACK)
> +	error_at (loc, "%<-fsanitize=shadow-call-stack%> not supported "
> +		  "in current platform");

I think this should be a sorry() instead.  It changes the diagnostic
prefix to “sorry, not implemented”, so that the compiler admits that
this is a missing feature rather than the user doing something wrong.

> +
> +      if (opts->x_flag_exceptions)
> +	error_at (loc, "%<-fsanitize=shadow-call-stack%> only allowed "
> +		  "with %<-fno-exceptions%>");

Maybe better as an “else”, since reporting the second error seems
redundant after reporting the first error.

maybe: s/only allowed with/requires/

Arguably this could be a sorry() too, but it's not as clear.  Let's stick
with error_at for now.

Thanks,
Richard

> +    }
> +
>     /* Currently live patching is not support for LTO.  */
>     if (opts->x_flag_live_patching && opts->x_flag_lto)
>       sorry ("live patching is not supported with LTO");
> @@ -1994,6 +2005,7 @@ const struct sanitizer_opts_s sanitizer_opts[] =
>     SANITIZER_OPT (vptr, SANITIZE_VPTR, true),
>     SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true),
>     SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true),
> +  SANITIZER_OPT (shadow-call-stack, SANITIZE_SHADOW_CALL_STACK, false),
>     SANITIZER_OPT (all, ~0U, true),
>   #undef SANITIZER_OPT
>     { NULL, 0U, 0UL, false }
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
> new file mode 100644
> index 00000000000..b8234ca0677
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=shadow-call-stack" } */
> +
> +int i;
> +
> +/* { dg-error "'-fsanitize=shadow-call-stack' only allowed with '-ffixed-x18'" "" {target "aarch64*-*-*" } 0 } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
> new file mode 100644
> index 00000000000..b0c663f26d3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=shadow-call-stack -ffixed-x18 -fexceptions" } */
> +
> +int i;
> +
> +/* { dg-error "'-fsanitize=shadow-call-stack' only allowed with '-fno-exceptions'" "" {target "aarch64*-*-*" } 0 } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
> new file mode 100644
> index 00000000000..511b5f0d693
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
> @@ -0,0 +1,45 @@
> +/* Testing shadow call stack.  */
> +/* scs_push: str x30, [x18], #8 */
> +/* scs_pop: ldr x30, [x18, #-8]! */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
> +
> +int foo (int);
> +
> +/* function not use x30.  */
> +int func1 (void)
> +{
> +  return 0;
> +}
> +
> +/* function use x30.  */
> +int func2 (void)
> +{
> +  /* scs push */
> +  asm volatile ("":::"x30");
> +
> +  return 0;
> +  /* scs pop */
> +}
> +
> +/* sibcall.  */
> +int func3 (int a, int b)
> +{
> +  /* scs push */
> +  asm volatile ("":::"x30");
> +
> +  return foo (a+b);
> +  /* scs pop */
> +}
> +
> +/* eh_return.  */
> +int func4 (long offset, void *handler)
> +{
> +  /* Do not emit scs push/pop */
> +  asm volatile ("":::"x30");
> +
> +  __builtin_eh_return (offset, handler);
> +}
> +
> +/* { dg-final { scan-assembler-times "str\tx30, \\\[x18\\\], #8" 2 } } */
> +/* { dg-final { scan-assembler-times "ldr\tx30, \\\[x18, #-8\\\]!" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
> new file mode 100644
> index 00000000000..a1b1cf16655
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
> @@ -0,0 +1,18 @@
> +/* Testing the disable of shadow call stack.  */
> +/* scs_push: str x30, [x18], #8 */
> +/* scs_pop: ldr x30, [x18, #-8]! */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
> +
> +int foo (int);
> +
> +/* function disable shadow call stack.  */
> +int __attribute__((no_sanitize_shadow_call_stack)) func1 (void)
> +{
> +  asm volatile ("":::"x30");
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "str\tx30, \\\[x18\\\], #8" } } */
> +/* { dg-final { scan-assembler-not "ldr\tx30, \\\[x18, #-8\\\]!" } } */
  
Dan Li Jan. 24, 2022, 2:16 a.m. UTC | #2
On 1/20/22 04:02, Richard Sandiford wrote:
> Thanks for the patch and sorry for the (very) slow review.
> 
Thanks for the review, Richard :).

>> +/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
>> +   struct attribute_spec.handler.  */
>> +static tree
>> +handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
>> +				      tree, int, bool *no_add_attrs)
>> +{
>> +  *no_add_attrs = true;
>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>> +    warning (OPT_Wattributes, "%qE attribute ignored", name);
>> +  else
>> +    add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
>> +
>> +  return NULL_TREE;
>> +}
>> +
> 
> Do we need this?  I think these days the preference is to use the
> general "no_sanitize" attribute with an argument, which is also the
> syntax documented on the clang page.
> 
> We have to support no_sanitize_foo attributes for some of the early
> sanitiser features, to avoid breaking backwards compatibility, but it
> doesn't look like clang ever supported no_sanitize_shadow_call_stack.
> 
Oh, "no_sanitize" is enough, I will delete this in the next version.

>> +/* Return TRUE if shadow call stack should be enabled for the current
>> +   function, otherwise return FALSE.  */
>> +
>> +bool
>> +aarch64_shadow_call_stack_enabled (void)
>> +{
>> +  /* This function should only be called after frame laid out.  */
>> +  gcc_assert (cfun->machine->frame.laid_out);
>> +
>> +  if (crtl->calls_eh_return)
>> +    return false;
>> +
>> +  /* We only deal with a function if its LR is pushed onto stack
>> +     and attribute no_sanitize_shadow_call_stack is not specified.  */
> 
> (This would need to be updated if we do drop the specific attribute.)
> 
Ok.

>> +  /* Push return address to shadow call stack.  */
>> +  if (aarch64_shadow_call_stack_enabled ())
>> +	emit_insn (gen_scs_push ());
> 
> Formatting nit: should only be indented by two spaces more than the “if”.
> Same below.
> 
Got it.

> I guess doing it like this means that we also continue to store x30 to the
> frame in the traditional way.  And that's probably necessary, given that
> the saved x30 forms part of link chain.
> 
Yes, we just added an extra insn to the prologue like:
+   str     x30, [x18], #8
     stp     x29, x30, [sp, #-16]!
     .......

>> +
>>      if (flag_stack_usage_info)
>>        current_function_static_stack_size = constant_lower_bound (frame_size);
>>    
>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>          RTX_FRAME_RELATED_P (insn) = 1;
>>        }
>>    
>> +  /* Pop return address from shadow call stack.  */
>> +  if (aarch64_shadow_call_stack_enabled ())
>> +	emit_insn (gen_scs_pop ());
>> +
> 
> This looks correct, but following on from the above, I guess we continue
> to restore x30 from the frame in the traditional way first, and then
> overwrite it with the SCS value.  I think the output code would be
> slightly neater if we suppressed the first restore of x30.
> 
Yes, the current epilogue is like:
     .......
     ldp     x29, x30, [sp], #16
+   ldr     x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].
2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;

[1] https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8

>> +/* This value represents whether the shadow call stack is implemented on
>> +   the target platform.  */
>> +#define TARGET_SUPPORT_SHADOW_CALL_STACK true
>> +
>> +#define TARGET_CHECK_SCS_RESERVED_REGISTER()	\
>> +  do {						\
>> +    if (!fixed_regs[R18_REGNUM])		\
>> +      error ("%<-fsanitize=shadow-call-stack%> only "	\
>> +	     "allowed with %<-ffixed-x18%>");	\
>> +  } while (0)
>> +
> 
> Please make these target hooks instead.  The first one can use DEFHOOKPOD.
> 
Ok, I will add a field to gcc_target via DEFHOOKPOD.

>> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
>> +(define_insn "scs_push"
>> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
>> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
>> +  ""
>> +  "str\\tx30, [x18], #8"
>> +  [(set_attr "type" "store_8")]
>> +)
>> +
>> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
>> +(define_insn "scs_pop"
>> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
>> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
>> +  ""
>> +  "ldr\\tx30, [x18, #-8]!"
>> +  [(set_attr "type" "load_8")]
>> +)
>> +
> 
> The two SETs here work in parallel, with the define_insn as a whole
> following a "read input operands, act, write output operands" sequence.
> The (mem: …) address therefore sees the old value of r18 rather than the
> new one.  Also, RTL rules say that subtractions need to be written as
> additions of the negative.
> 
Oh, sorry, I got it wrong here.

> I think the pattern would therefore be something like:
> 
>    [(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
> 					     (const_int -8))))]
>     (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]
> 
> However, since these are normal moves, I think we should be able to use
> the standard move patterns.  E.g. the push could be:
> 
> (define_expand "scs_push"
>    [(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
>          (reg:DI R30_REGNUM))])
> 
> and similarly for the pop.
> 
Thanks, this template looks better.

Since the push/pop of SCS and normal stack in clang are different
(for example, scs push is post_inc, while normal stack is pre_dec),
in order to maintain consistency, I think it can be changed to the
following:

(define_expand "scs_push"
   [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
         (reg:DI R30_REGNUM))])

(define_expand "scs_pop"
   [(set (reg:DI R30_REGNUM)
        (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))])

>> +Enable ShadowCallStack, a security enhancement mechanism used to protect
>> +programs against return address overwrites (e.g. stack buffer overflows.)
>> +It works by saving a function's return address to a separately allocated
>> +shadow call stack in the function prologue and restore the return address
> 
> typo: s/restore/restoring/
> 
Got it.
>> +from the shadow call stack in the function epilogue.  Instrumentation only
>> +occurs in functions that need to save the return address to the stack.
>> +
>> +Currently it only supports the aarch64 platform and only works fine in
>> +the linux kernel that implements CONFIG_SHADOW_CALL_STACK.
> 
> Maybe:
> 
>    Currently it only supports the aarch64 platform.  It is specifically
>    designed for linux kernels that enable the CONFIG_SHADOW_CALL_STACK option.
> Ok.

>> +For the user space programs, runtime support is not currently provided
>> +in libc and libgcc.  Users who want to use this feature in user space need
>> +to provide their own support for the runtime.  It should be noted that
>> +this may cause the ABI rules to be broken.
>> +
>> +On aarch64, the instrumentation makes use of the platform register 'x18'.
> 
> texinfo formatting would be @code{x18}.  (Hopefully we'll move rst soon!)
> 
Got it.

>> +This generally means that any code that may run on the same thread as code
>> +compiled with ShadowCallStack must be compiled with the flag
>> +@option{-ffixed-x18}, otherwise functions compiled without
>> +@option{-ffixed-x18} may clobber x18 and break scs.
> 
> Maybe:
> 
>    might clobber @code{x18} and so corrupt the shadow stack pointer.
> 
Ok.
>> +
>> +And also because there is no runtime support, the code compiled with
>> +ShadowCallStack cannot use exception handling, @option{-fno-exceptions}
>> +also needs to be specified.
> 
> Maybe:
> 
>    Also, because there is no userspace runtime support, code compiled with
>    ShadowCallStack cannot use exception handling.  Use @option{-fno-exceptions}
>    to turn off exceptions.
> 
Ok.

>> index 55273822ec5..98e1d636f7f 100644
>> --- a/gcc/opts-global.c
>> +++ b/gcc/opts-global.c
>> @@ -477,4 +477,10 @@ handle_common_deferred_options (void)
>>    	  gcc_unreachable ();
>>    	}
>>        }
>> +
>> +#ifdef TARGET_CHECK_SCS_RESERVED_REGISTER
>> +  if (flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
>> +    TARGET_CHECK_SCS_RESERVED_REGISTER ();
>> +#endif
>> +
> 
> Actually (after saying this should move from being a macro to a hook):
> maybe we can keep it in target-specific code instead.  We already have
> similar errors in aarch64_override_options_internal.
>> (That comment only applies to this macro/hook.  We still want a hook for
> TARGET_SUPPORT_SHADOW_CALL_STACK.)
> 
Ok, I'll put this check into aarch64_override_options_internal
and add a hook in gcc_target for TARGET_SUPPORT_SHADOW_CALL_STACK.

>> +  if (opts->x_flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
>> +    {
>> +      if (!TARGET_SUPPORT_SHADOW_CALL_STACK)
>> +	error_at (loc, "%<-fsanitize=shadow-call-stack%> not supported "
>> +		  "in current platform");
> 
> I think this should be a sorry() instead.  It changes the diagnostic
> prefix to “sorry, not implemented”, so that the compiler admits that
> this is a missing feature rather than the user doing something wrong.
> 
Thanks for the explanation, it sounds reasonable.

>> +
>> +      if (opts->x_flag_exceptions)
>> +	error_at (loc, "%<-fsanitize=shadow-call-stack%> only allowed "
>> +		  "with %<-fno-exceptions%>");
> 
> Maybe better as an “else”, since reporting the second error seems
> redundant after reporting the first error.
> 
> maybe: s/only allowed with/requires/
> 
> Arguably this could be a sorry() too, but it's not as clear.  Let's stick
> with error_at for now.
> 
Got it!
> Thanks,
> Richard
> 
>> +/* { dg-final { scan-assembler-times "str\tx30, \\\[x18\\\], #8" 2 } } */
>> +/* { dg-final { scan-assembler-times "ldr\tx30, \\\[x18, #-8\\\]!" 2 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
>> new file mode 100644
>> index 00000000000..a1b1cf16655
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
>> @@ -0,0 +1,18 @@
>> +/* Testing the disable of shadow call stack.  */
>> +/* scs_push: str x30, [x18], #8 */
>> +/* scs_pop: ldr x30, [x18, #-8]! */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
>> +
>> +int foo (int);
>> +
>> +/* function disable shadow call stack.  */
>> +int __attribute__((no_sanitize_shadow_call_stack)) func1 (void)
>> +{
>> +  asm volatile ("":::"x30");
>> +
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "str\tx30, \\\[x18\\\], #8" } } */
>> +/* { dg-final { scan-assembler-not "ldr\tx30, \\\[x18, #-8\\\]!" } } */
  
Richard Sandiford Jan. 25, 2022, 10:19 a.m. UTC | #3
Dan Li <ashimida@linux.alibaba.com> writes:
>>> +
>>>      if (flag_stack_usage_info)
>>>        current_function_static_stack_size = constant_lower_bound (frame_size);
>>>    
>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>          RTX_FRAME_RELATED_P (insn) = 1;
>>>        }
>>>    
>>> +  /* Pop return address from shadow call stack.  */
>>> +  if (aarch64_shadow_call_stack_enabled ())
>>> +	emit_insn (gen_scs_pop ());
>>> +
>> 
>> This looks correct, but following on from the above, I guess we continue
>> to restore x30 from the frame in the traditional way first, and then
>> overwrite it with the SCS value.  I think the output code would be
>> slightly neater if we suppressed the first restore of x30.
>> 
> Yes, the current epilogue is like:
>      .......
>      ldp     x29, x30, [sp], #16
> +   ldr     x30, [x18, #-8]!
>
> I think may be we can keep the two x30 pops here, for two reasons:
> 1) The implementation of scs in clang is to pop x30 twice, it might be
> better to be consistent with clang here[1].

This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)

> 2) If we keep the first restore of x30, it may provide some flexibility
> for other scenarios. For example, we can directly patch the scs_push/pop
> insns in the binary to turn it into a binary without scs;

Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.

I'd like a definitive ruling on this from the kernel folks before
the patch goes in.

If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

  (set ...mem...
       (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.

>>> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
>>> +(define_insn "scs_push"
>>> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
>>> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
>>> +  ""
>>> +  "str\\tx30, [x18], #8"
>>> +  [(set_attr "type" "store_8")]
>>> +)
>>> +
>>> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
>>> +(define_insn "scs_pop"
>>> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
>>> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
>>> +  ""
>>> +  "ldr\\tx30, [x18, #-8]!"
>>> +  [(set_attr "type" "load_8")]
>>> +)
>>> +
>> 
>> The two SETs here work in parallel, with the define_insn as a whole
>> following a "read input operands, act, write output operands" sequence.
>> The (mem: …) address therefore sees the old value of r18 rather than the
>> new one.  Also, RTL rules say that subtractions need to be written as
>> additions of the negative.
>> 
> Oh, sorry, I got it wrong here.
>
>> I think the pattern would therefore be something like:
>> 
>>    [(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
>> 					     (const_int -8))))]
>>     (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]
>> 
>> However, since these are normal moves, I think we should be able to use
>> the standard move patterns.  E.g. the push could be:
>> 
>> (define_expand "scs_push"
>>    [(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
>>          (reg:DI R30_REGNUM))])
>> 
>> and similarly for the pop.
>> 
> Thanks, this template looks better.
>
> Since the push/pop of SCS and normal stack in clang are different
> (for example, scs push is post_inc, while normal stack is pre_dec),
> in order to maintain consistency, I think it can be changed to the
> following:
>
> (define_expand "scs_push"
>    [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
>          (reg:DI R30_REGNUM))])
>
> (define_expand "scs_pop"
>    [(set (reg:DI R30_REGNUM)
>         (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))])

Yeah, I copied the wrong name above, sorry.

Thanks,
Richard
  
Dan Li Jan. 26, 2022, 6:51 a.m. UTC | #4
On 1/25/22 02:19, Richard Sandiford wrote:
> Dan Li <ashimida@linux.alibaba.com> writes:
>>>> +
>>>>       if (flag_stack_usage_info)
>>>>         current_function_static_stack_size = constant_lower_bound (frame_size);
>>>>     
>>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>>           RTX_FRAME_RELATED_P (insn) = 1;
>>>>         }
>>>>     
>>>> +  /* Pop return address from shadow call stack.  */
>>>> +  if (aarch64_shadow_call_stack_enabled ())
>>>> +	emit_insn (gen_scs_pop ());
>>>> +
>>>
>>> This looks correct, but following on from the above, I guess we continue
>>> to restore x30 from the frame in the traditional way first, and then
>>> overwrite it with the SCS value.  I think the output code would be
>>> slightly neater if we suppressed the first restore of x30.
>>>
>> Yes, the current epilogue is like:
>>       .......
>>       ldp     x29, x30, [sp], #16
>> +   ldr     x30, [x18, #-8]!
>>
>> I think may be we can keep the two x30 pops here, for two reasons:
>> 1) The implementation of scs in clang is to pop x30 twice, it might be
>> better to be consistent with clang here[1].
> 
> This doesn't seem a very compelling reason on its own, unless it's
> explicitly meant to be observable behaviour.  (But then, observed how?)
> 

Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str     x30, [x18], 8
ldp     x29, x30, [sp], 16
......
ldr     x29, [sp], 16
                         ## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr     x30, [x18, -8]!
                         ## Or may be a non-CFA based directive here
ret

>> 2) If we keep the first restore of x30, it may provide some flexibility
>> for other scenarios. For example, we can directly patch the scs_push/pop
>> insns in the binary to turn it into a binary without scs;
> 
> Is that a supported (and ideally documented) use case?  If it is,
> then it becomes important for correctness that the compiler emits
> exactly the opcodes in the original patch.  If it isn't, then the
> compiler can emit other instructions that have the same effect.
> 

Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.

> I'd like a definitive ruling on this from the kernel folks before
> the patch goes in.
> 

Ok, I'll cc some kernel folks to make sure I didn't miss something.

> If binary patching is supposed to be possible then scs_push and
> scs_pop *do* need to be separate define_insns.  But they also need
> to have some magic unspec that differentiates them from normal
> pushes and pops, e.g.:
> 
>    (set ...mem...
>         (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
> 
> so that there is no chance that the pattern would be treated as
> a normal move and optimised accordingly.
> 

Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan
  
Dan Li Jan. 26, 2022, 7:53 a.m. UTC | #5
Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str     x30, [x18], 8
ldp     x29, x30, [sp], 16
......
ldp     x29, x30, [sp], 16		## x30 pop
ldr     x30, [x18, -8]!   		## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html

Thanks a lot!
Dan

On 1/25/22 22:51, Dan Li wrote:
> 
> 
> On 1/25/22 02:19, Richard Sandiford wrote:
>> Dan Li <ashimida@linux.alibaba.com> writes:
>>>>> +
>>>>>       if (flag_stack_usage_info)
>>>>>         current_function_static_stack_size = constant_lower_bound (frame_size);
>>>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>>>           RTX_FRAME_RELATED_P (insn) = 1;
>>>>>         }
>>>>> +  /* Pop return address from shadow call stack.  */
>>>>> +  if (aarch64_shadow_call_stack_enabled ())
>>>>> +    emit_insn (gen_scs_pop ());
>>>>> +
>>>>
>>>> This looks correct, but following on from the above, I guess we continue
>>>> to restore x30 from the frame in the traditional way first, and then
>>>> overwrite it with the SCS value.  I think the output code would be
>>>> slightly neater if we suppressed the first restore of x30.
>>>>
>>> Yes, the current epilogue is like:
>>>       .......
>>>       ldp     x29, x30, [sp], #16
>>> +   ldr     x30, [x18, #-8]!
>>>
>>> I think may be we can keep the two x30 pops here, for two reasons:
>>> 1) The implementation of scs in clang is to pop x30 twice, it might be
>>> better to be consistent with clang here[1].
>>
>> This doesn't seem a very compelling reason on its own, unless it's
>> explicitly meant to be observable behaviour.  (But then, observed how?)
>>
> 
> Well, probably sticking to pop x30 twice is not a good idea.
> AFAICT, there doesn't seem to be an explicit requirement that
> this behavior must be followed.
> 
> BTW:
> Do we still need to emit the .cfi_restore 30 directive here if we
> want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)
> 
> Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
> the generated assembly code may be as follows:
> 
> str     x30, [x18], 8
> ldp     x29, x30, [sp], 16
> ......
> ldr     x29, [sp], 16
>                          ## Do we still need a .cfi_restore 30 here
> .cfi_restore 29
> .cfi_def_cfa_offset 0
> ldr     x30, [x18, -8]!
>                          ## Or may be a non-CFA based directive here
> ret
> 
>>> 2) If we keep the first restore of x30, it may provide some flexibility
>>> for other scenarios. For example, we can directly patch the scs_push/pop
>>> insns in the binary to turn it into a binary without scs;
>>
>> Is that a supported (and ideally documented) use case?  If it is,
>> then it becomes important for correctness that the compiler emits
>> exactly the opcodes in the original patch.  If it isn't, then the
>> compiler can emit other instructions that have the same effect.
>>
> 
> Oh, no, this is just a little trick that can be used for compatibility.
> (Maybe some scenarios such as our company sometimes need to be
> compatible with some non-open source binaries from different
> manufacturers, two pops could make life easier :). )
> 
> If this is not a consideration for our community, please ignore
> this request.
> 
>> I'd like a definitive ruling on this from the kernel folks before
>> the patch goes in.
>>
> 
> Ok, I'll cc some kernel folks to make sure I didn't miss something.
> 
>> If binary patching is supposed to be possible then scs_push and
>> scs_pop *do* need to be separate define_insns.  But they also need
>> to have some magic unspec that differentiates them from normal
>> pushes and pops, e.g.:
>>
>>    (set ...mem...
>>         (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
>>
>> so that there is no chance that the pattern would be treated as
>> a normal move and optimised accordingly.
>>
> 
> Yeah, this template looks more appropriate if it is to be treated
> as a special directive.
> 
> Thanks for your suggestions,
> Dan
  
Ard Biesheuvel Jan. 26, 2022, 8:10 a.m. UTC | #6
On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
>
> Hi, all,
>
> Sorry for bothering.
>
> I'm trying to commit aarch64 scs code to the gcc and there is an issue
> that I'm not sure about, could someone give me some suggestions?
> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
>
> When clang enables scs, the following instructions are usually generated:
>
> str     x30, [x18], 8
> ldp     x29, x30, [sp], 16
> ......
> ldp     x29, x30, [sp], 16              ## x30 pop
> ldr     x30, [x18, -8]!                 ## x30 pop again
> ret
>
> The x30 register is popped twice here, Richard suggested that we can
> omit the first x30 pop here.
>
> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
> missing something with the kernel, could someone give some suggestions?
>
> The previous discussion can be found here [1].
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html
>

As was pointed out in the discussion, binary patching is in fact a
concern for the Linux kernel. E.g., Android uses generic binary
builds, but we would like to be able to switch between pointer
authentication and shadow call stack at boot time, rather than always
support both, and take the SCS performance hit on systems that
implement PAC as well.

However, it seems more straight-forward to patch PACIASP and AUTIASP
instructions into SCS push/pop instructions rather than the other way
around, as we can force the use of these exact opcodes [in the NOP
space]), as well as rely on existing unwind annotations to locate any
such instruction in the binary.

So omitting the load of X30 from the ordinary stack seems fine to me.

> On 1/25/22 22:51, Dan Li wrote:
> >
> >
> > On 1/25/22 02:19, Richard Sandiford wrote:
> >> Dan Li <ashimida@linux.alibaba.com> writes:
> >>>>> +
> >>>>>       if (flag_stack_usage_info)
> >>>>>         current_function_static_stack_size = constant_lower_bound (frame_size);
> >>>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
> >>>>>           RTX_FRAME_RELATED_P (insn) = 1;
> >>>>>         }
> >>>>> +  /* Pop return address from shadow call stack.  */
> >>>>> +  if (aarch64_shadow_call_stack_enabled ())
> >>>>> +    emit_insn (gen_scs_pop ());
> >>>>> +
> >>>>
> >>>> This looks correct, but following on from the above, I guess we continue
> >>>> to restore x30 from the frame in the traditional way first, and then
> >>>> overwrite it with the SCS value.  I think the output code would be
> >>>> slightly neater if we suppressed the first restore of x30.
> >>>>
> >>> Yes, the current epilogue is like:
> >>>       .......
> >>>       ldp     x29, x30, [sp], #16
> >>> +   ldr     x30, [x18, #-8]!
> >>>
> >>> I think may be we can keep the two x30 pops here, for two reasons:
> >>> 1) The implementation of scs in clang is to pop x30 twice, it might be
> >>> better to be consistent with clang here[1].
> >>
> >> This doesn't seem a very compelling reason on its own, unless it's
> >> explicitly meant to be observable behaviour.  (But then, observed how?)
> >>
> >
> > Well, probably sticking to pop x30 twice is not a good idea.
> > AFAICT, there doesn't seem to be an explicit requirement that
> > this behavior must be followed.
> >
> > BTW:
> > Do we still need to emit the .cfi_restore 30 directive here if we
> > want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)
> >
> > Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
> > the generated assembly code may be as follows:
> >
> > str     x30, [x18], 8
> > ldp     x29, x30, [sp], 16
> > ......
> > ldr     x29, [sp], 16
> >                          ## Do we still need a .cfi_restore 30 here
> > .cfi_restore 29
> > .cfi_def_cfa_offset 0
> > ldr     x30, [x18, -8]!
> >                          ## Or may be a non-CFA based directive here
> > ret
> >
> >>> 2) If we keep the first restore of x30, it may provide some flexibility
> >>> for other scenarios. For example, we can directly patch the scs_push/pop
> >>> insns in the binary to turn it into a binary without scs;
> >>
> >> Is that a supported (and ideally documented) use case?  If it is,
> >> then it becomes important for correctness that the compiler emits
> >> exactly the opcodes in the original patch.  If it isn't, then the
> >> compiler can emit other instructions that have the same effect.
> >>
> >
> > Oh, no, this is just a little trick that can be used for compatibility.
> > (Maybe some scenarios such as our company sometimes need to be
> > compatible with some non-open source binaries from different
> > manufacturers, two pops could make life easier :). )
> >
> > If this is not a consideration for our community, please ignore
> > this request.
> >
> >> I'd like a definitive ruling on this from the kernel folks before
> >> the patch goes in.
> >>
> >
> > Ok, I'll cc some kernel folks to make sure I didn't miss something.
> >
> >> If binary patching is supposed to be possible then scs_push and
> >> scs_pop *do* need to be separate define_insns.  But they also need
> >> to have some magic unspec that differentiates them from normal
> >> pushes and pops, e.g.:
> >>
> >>    (set ...mem...
> >>         (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
> >>
> >> so that there is no chance that the pattern would be treated as
> >> a normal move and optimised accordingly.
> >>
> >
> > Yeah, this template looks more appropriate if it is to be treated
> > as a special directive.
> >
> > Thanks for your suggestions,
> > Dan
  
Dan Li Jan. 26, 2022, 10:35 a.m. UTC | #7
Thanks, Ard,

On 1/26/22 00:10, Ard Biesheuvel wrote:
> On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
>>
>> Hi, all,
>>
>> Sorry for bothering.
>>
>> I'm trying to commit aarch64 scs code to the gcc and there is an issue
>> that I'm not sure about, could someone give me some suggestions?
>> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
>>
>> When clang enables scs, the following instructions are usually generated:
>>
>> str     x30, [x18], 8
>> ldp     x29, x30, [sp], 16
>> ......
>> ldp     x29, x30, [sp], 16              ## x30 pop
>> ldr     x30, [x18, -8]!                 ## x30 pop again
>> ret
>>
>> The x30 register is popped twice here, Richard suggested that we can
>> omit the first x30 pop here.
>>
>> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
>> missing something with the kernel, could someone give some suggestions?
>>
>> The previous discussion can be found here [1].
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html
>>
> 
> As was pointed out in the discussion, binary patching is in fact a
> concern for the Linux kernel. E.g., Android uses generic binary
> builds, but we would like to be able to switch between pointer
> authentication and shadow call stack at boot time, rather than always
> support both, and take the SCS performance hit on systems that
> implement PAC as well.
> 
> However, it seems more straight-forward to patch PACIASP and AUTIASP
> instructions into SCS push/pop instructions rather than the other way
> around, as we can force the use of these exact opcodes [in the NOP
> space]), as well as rely on existing unwind annotations to locate any
> such instruction in the binary.
> 

Well, then I think I don't need to submit a kernel patch to
enable SCS for gcc :)

BTW:
Do we have a plan to submit patches of dynamic patch PAC into
the kernel recently?

> So omitting the load of X30 from the ordinary stack seems fine to me.
> 
>> On 1/25/22 22:51, Dan Li wrote:
>>>
>>>
>>> On 1/25/22 02:19, Richard Sandiford wrote:
>>>
>>> Well, probably sticking to pop x30 twice is not a good idea.
>>> AFAICT, there doesn't seem to be an explicit requirement that
>
>>>>
>>>
>>> Ok, I'll cc some kernel folks to make sure I didn't miss something.
>>>
To Richard:

Sorry for my mistake.

Due to binary compatibility issues, SCS related code may not
be directly merged into libgcc/glibc, do we still need to
add this patch into GCC? (I'd like to finish it if that
makes sense).


Thanks all for your time!
Dan

>>>> If binary patching is supposed to be possible then scs_push and
>>>> scs_pop *do* need to be separate define_insns.  But they also need
>>>> to have some magic unspec that differentiates them from normal
>>>> pushes and pops, e.g.:
>>>>
>>>>     (set ...mem...
>>>>          (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
>>>>
>>>> so that there is no chance that the pattern would be treated as
>>>> a normal move and optimised accordingly.
>>>>
>>>
>>> Yeah, this template looks more appropriate if it is to be treated
>>> as a special directive.
>>>
>>> Thanks for your suggestions,
>>> Dan
  
Ard Biesheuvel Jan. 26, 2022, 11:09 a.m. UTC | #8
On Wed, 26 Jan 2022 at 11:40, Dan Li <ashimida@linux.alibaba.com> wrote:
>
> Thanks, Ard,
>
> On 1/26/22 00:10, Ard Biesheuvel wrote:
> > On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
> >>
> >> Hi, all,
> >>
> >> Sorry for bothering.
> >>
> >> I'm trying to commit aarch64 scs code to the gcc and there is an issue
> >> that I'm not sure about, could someone give me some suggestions?
> >> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
> >>
> >> When clang enables scs, the following instructions are usually generated:
> >>
> >> str     x30, [x18], 8
> >> ldp     x29, x30, [sp], 16
> >> ......
> >> ldp     x29, x30, [sp], 16              ## x30 pop
> >> ldr     x30, [x18, -8]!                 ## x30 pop again
> >> ret
> >>
> >> The x30 register is popped twice here, Richard suggested that we can
> >> omit the first x30 pop here.
> >>
> >> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
> >> missing something with the kernel, could someone give some suggestions?
> >>
> >> The previous discussion can be found here [1].
> >>
> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html
> >>
> >
> > As was pointed out in the discussion, binary patching is in fact a
> > concern for the Linux kernel. E.g., Android uses generic binary
> > builds, but we would like to be able to switch between pointer
> > authentication and shadow call stack at boot time, rather than always
> > support both, and take the SCS performance hit on systems that
> > implement PAC as well.
> >
> > However, it seems more straight-forward to patch PACIASP and AUTIASP
> > instructions into SCS push/pop instructions rather than the other way
> > around, as we can force the use of these exact opcodes [in the NOP
> > space]), as well as rely on existing unwind annotations to locate any
> > such instruction in the binary.
> >
>
> Well, then I think I don't need to submit a kernel patch to
> enable SCS for gcc :)
>

Not entirely.

> BTW:
> Do we have a plan to submit patches of dynamic patch PAC into
> the kernel recently?
>

At the moment, there are just some ideas floating around. I did
implement a proof of concept that uses unwind data, but it hit some
issues with cfi_negate_ra_state being emitted imprecisely (GCC) or not
at all (Clang) in some cases. Using objtool would be another
possibility.

So in summary, getting SCS support proper into GCC is definitely worth
the effort.
  
Dan Li Jan. 26, 2022, 2:08 p.m. UTC | #9
On 1/26/22 03:09, Ard Biesheuvel wrote:
> On Wed, 26 Jan 2022 at 11:40, Dan Li <ashimida@linux.alibaba.com> wrote:
>>
>> Thanks, Ard,
>>
>> On 1/26/22 00:10, Ard Biesheuvel wrote:
>>> On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
>>>>
>>>> Hi, all,
>>>>
>>>> Sorry for bothering.
>>>>
>>>> I'm trying to commit aarch64 scs code to the gcc and there is an issue
>>>> that I'm not sure about, could someone give me some suggestions?
>>>> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
>>>>
>>>> When clang enables scs, the following instructions are usually generated:
>>>>
>>>> str     x30, [x18], 8
>>>> ldp     x29, x30, [sp], 16
>>>> ......
>>>> ldp     x29, x30, [sp], 16              ## x30 pop
>>>> ldr     x30, [x18, -8]!                 ## x30 pop again
>>>> ret
>>>>
>>>> The x30 register is popped twice here, Richard suggested that we can
>>>> omit the first x30 pop here.
>>>>
>>>> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
>>>> missing something with the kernel, could someone give some suggestions?
>>>>
>>>> The previous discussion can be found here [1].
>>>>
>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html
>>>>
>>>
>>> As was pointed out in the discussion, binary patching is in fact a
>>> concern for the Linux kernel. E.g., Android uses generic binary
>>> builds, but we would like to be able to switch between pointer
>>> authentication and shadow call stack at boot time, rather than always
>>> support both, and take the SCS performance hit on systems that
>>> implement PAC as well.
>>>
>>> However, it seems more straight-forward to patch PACIASP and AUTIASP
>>> instructions into SCS push/pop instructions rather than the other way
>>> around, as we can force the use of these exact opcodes [in the NOP
>>> space]), as well as rely on existing unwind annotations to locate any
>>> such instruction in the binary.
>>>
>>
>> Well, then I think I don't need to submit a kernel patch to
>> enable SCS for gcc :)
>>
> 
> Not entirely.
> 
>> BTW:
>> Do we have a plan to submit patches of dynamic patch PAC into
>> the kernel recently?
>>
> 
> At the moment, there are just some ideas floating around. I did
> implement a proof of concept that uses unwind data, but it hit some
> issues with cfi_negate_ra_state being emitted imprecisely (GCC) or not
> at all (Clang) in some cases. Using objtool would be another
> possibility.
> 
> So in summary, getting SCS support proper into GCC is definitely worth
> the effort.
> 
Got it!

And thanks for the suggestion, Ard :)
  
Dan Li Jan. 29, 2022, 3:11 p.m. UTC | #10
Hi, Richard,
I have sent out my v3[1], and (probably) fixed the previous issues,
please let me know if i got something wrong :)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589471.html

Thanks,
Dan.

On 1/25/22 02:19, Richard Sandiford wrote:
> Dan Li <ashimida@linux.alibaba.com> writes:
>>>> +
>>>>       if (flag_stack_usage_info)
>>>>         current_function_static_stack_size = constant_lower_bound (frame_size);
>>>>     
>>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>>           RTX_FRAME_RELATED_P (insn) = 1;
>>>>         }
>>>>     
>>>> +  /* Pop return address from shadow call stack.  */
>>>> +  if (aarch64_shadow_call_stack_enabled ())
>>>> +	emit_insn (gen_scs_pop ());
>>>> +
>>>
>>> This looks correct, but following on from the above, I guess we continue
>>> to restore x30 from the frame in the traditional way first, and then
>>> overwrite it with the SCS value.  I think the output code would be
>>> slightly neater if we suppressed the first restore of x30.
>>>
>> Yes, the current epilogue is like:
>>       .......
>>       ldp     x29, x30, [sp], #16
>> +   ldr     x30, [x18, #-8]!
>>
>> I think may be we can keep the two x30 pops here, for two reasons:
>> 1) The implementation of scs in clang is to pop x30 twice, it might be
>> better to be consistent with clang here[1].
> 
> This doesn't seem a very compelling reason on its own, unless it's
> explicitly meant to be observable behaviour.  (But then, observed how?)
> 
>> 2) If we keep the first restore of x30, it may provide some flexibility
>> for other scenarios. For example, we can directly patch the scs_push/pop
>> insns in the binary to turn it into a binary without scs;
> 
> Is that a supported (and ideally documented) use case?  If it is,
> then it becomes important for correctness that the compiler emits
> exactly the opcodes in the original patch.  If it isn't, then the
> compiler can emit other instructions that have the same effect.
> 
> I'd like a definitive ruling on this from the kernel folks before
> the patch goes in.
> 
> If binary patching is supposed to be possible then scs_push and
> scs_pop *do* need to be separate define_insns.  But they also need
> to have some magic unspec that differentiates them from normal
> pushes and pops, e.g.:
> 
>    (set ...mem...
>         (unspec:DI [...reg...] UNSPEC_SCS_PUSH))
> 
> so that there is no chance that the pattern would be treated as
> a normal move and optimised accordingly.
> 
>>>> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
>>>> +(define_insn "scs_push"
>>>> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
>>>> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
>>>> +  ""
>>>> +  "str\\tx30, [x18], #8"
>>>> +  [(set_attr "type" "store_8")]
>>>> +)
>>>> +
>>>> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
>>>> +(define_insn "scs_pop"
>>>> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
>>>> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
>>>> +  ""
>>>> +  "ldr\\tx30, [x18, #-8]!"
>>>> +  [(set_attr "type" "load_8")]
>>>> +)
>>>> +
>>>
>>> The two SETs here work in parallel, with the define_insn as a whole
>>> following a "read input operands, act, write output operands" sequence.
>>> The (mem: …) address therefore sees the old value of r18 rather than the
>>> new one.  Also, RTL rules say that subtractions need to be written as
>>> additions of the negative.
>>>
>> Oh, sorry, I got it wrong here.
>>
>>> I think the pattern would therefore be something like:
>>>
>>>     [(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
>>> 					     (const_int -8))))]
>>>      (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]
>>>
>>> However, since these are normal moves, I think we should be able to use
>>> the standard move patterns.  E.g. the push could be:
>>>
>>> (define_expand "scs_push"
>>>     [(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
>>>           (reg:DI R30_REGNUM))])
>>>
>>> and similarly for the pop.
>>>
>> Thanks, this template looks better.
>>
>> Since the push/pop of SCS and normal stack in clang are different
>> (for example, scs push is post_inc, while normal stack is pre_dec),
>> in order to maintain consistency, I think it can be changed to the
>> following:
>>
>> (define_expand "scs_push"
>>     [(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
>>           (reg:DI R30_REGNUM))])
>>
>> (define_expand "scs_pop"
>>     [(set (reg:DI R30_REGNUM)
>>          (mem:DI (pre_dec:DI (reg:DI R18_REGNUM))))])
> 
> Yeah, I copied the wrong name above, sorry.
> 
> Thanks,
> Richard
  
Richard Sandiford Jan. 31, 2022, 4:26 p.m. UTC | #11
Thanks for the discussion and sorry for the slow reply, was out most of
last week.

Dan Li <ashimida@linux.alibaba.com> writes:
> Thanks, Ard,
>
> On 1/26/22 00:10, Ard Biesheuvel wrote:
>> On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
>>>
>>> Hi, all,
>>>
>>> Sorry for bothering.
>>>
>>> I'm trying to commit aarch64 scs code to the gcc and there is an issue
>>> that I'm not sure about, could someone give me some suggestions?
>>> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
>>>
>>> When clang enables scs, the following instructions are usually generated:
>>>
>>> str     x30, [x18], 8
>>> ldp     x29, x30, [sp], 16
>>> ......
>>> ldp     x29, x30, [sp], 16              ## x30 pop
>>> ldr     x30, [x18, -8]!                 ## x30 pop again
>>> ret
>>>
>>> The x30 register is popped twice here, Richard suggested that we can
>>> omit the first x30 pop here.
>>>
>>> AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
>>> missing something with the kernel, could someone give some suggestions?
>>>
>>> The previous discussion can be found here [1].
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html
>>>
>> 
>> As was pointed out in the discussion, binary patching is in fact a
>> concern for the Linux kernel. E.g., Android uses generic binary
>> builds, but we would like to be able to switch between pointer
>> authentication and shadow call stack at boot time, rather than always
>> support both, and take the SCS performance hit on systems that
>> implement PAC as well.
>> 
>> However, it seems more straight-forward to patch PACIASP and AUTIASP
>> instructions into SCS push/pop instructions rather than the other way
>> around, as we can force the use of these exact opcodes [in the NOP
>> space]), as well as rely on existing unwind annotations to locate any
>> such instruction in the binary.
>> 
>
> Well, then I think I don't need to submit a kernel patch to
> enable SCS for gcc :)
>
> BTW:
> Do we have a plan to submit patches of dynamic patch PAC into
> the kernel recently?
>
>> So omitting the load of X30 from the ordinary stack seems fine to me.

OK, thanks.  Let's go with that for now then.  There would still be
time to change our minds before GCC 12 is released, if anyone feels
that patching SCS code would be useful.

Reading it back, I think my previous message came across as sounding
like a complaint against binary patching, which wasn't the case at all.
I think it would be fine to support patching, even if it was just for a
single vendor rather than expected to be a common case.  It's just that,
if we did want to support it, we'd need to document it as a requirement
(at least within GCC) and change the implementation accordingly.

Thanks,
Richard
  
Dan Li Feb. 2, 2022, 9:25 a.m. UTC | #12
On 1/31/22 08:26, Richard Sandiford wrote:
> Thanks for the discussion and sorry for the slow reply, was out most of
> last week.
> 
> Dan Li <ashimida@linux.alibaba.com> writes:
>> Thanks, Ard,
>>
>> On 1/26/22 00:10, Ard Biesheuvel wrote:
>>> On Wed, 26 Jan 2022 at 08:53, Dan Li <ashimida@linux.alibaba.com> wrote:
>>>>
>>>> Hi, all,
>>>>
>>>> Sorry for bothering.
>>>>
>>>> I'm trying to commit aarch64 scs code to the gcc and there is an issue
>>>> that I'm not sure about, could someone give me some suggestions?
>>>> (To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )
>>>>
>>> So omitting the load of X30 from the ordinary stack seems fine to me.
> 
> OK, thanks.  Let's go with that for now then.  There would still be
> time to change our minds before GCC 12 is released, if anyone feels
> that patching SCS code would be useful.
>> Reading it back, I think my previous message came across as sounding
> like a complaint against binary patching, which wasn't the case at all.
> I think it would be fine to support patching, even if it was just for a
> single vendor rather than expected to be a common case.  It's just that,
> if we did want to support it, we'd need to document it as a requirement
> (at least within GCC) and change the implementation accordingly.
> 
Got it, then I'll implement this feature as discussed above and see
if we could add additional options for SCS later.

Thanks,
Dan
  

Patch

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@  static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
  static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
  static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  						  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+						  tree, int, bool *);
  static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
  						 int, bool *);
  static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@  const struct attribute_spec c_common_attribute_table[] =
  			      handle_no_sanitize_attribute, NULL },
    { "no_sanitize_address",    0, 0, true, false, false, false,
  			      handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+			      0, 0, true, false, false, false,
+			      handle_no_sanitize_shadow_call_stack_attribute,
+			      NULL },
    { "no_sanitize_thread",     0, 0, true, false, false, false,
  			      handle_no_sanitize_thread_attribute, NULL },
    { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@  handle_no_sanitize_address_attribute (tree *node, tree name, tree, int,
    return NULL_TREE;
  }
  
+/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
+   struct attribute_spec.handler.  */
+static tree
+handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
+				      tree, int, bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+    add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
+
+  return NULL_TREE;
+}
+
  /* Handle a "no_sanitize_thread" attribute; arguments as in
     struct attribute_spec.handler.  */
  
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 768e8fae136..150c015df21 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -893,6 +893,7 @@  void aarch64_register_pragmas (void);
  void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
  bool aarch64_return_address_signing_enabled (void);
+bool aarch64_shadow_call_stack_enabled (void);
  bool aarch64_bti_enabled (void);
  void aarch64_save_restore_target_globals (tree);
  void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a..5a36a459f4e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@ 
  #include "tree-ssa-loop-niter.h"
  #include "fractional-cost.h"
  #include "rtlanal.h"
+#include "asan.h"
  
  /* This file should be included last.  */
  #include "target-def.h"
@@ -7799,6 +7800,24 @@  aarch64_return_address_signing_enabled (void)
  	      && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
  }
  
+/* Return TRUE if shadow call stack should be enabled for the current
+   function, otherwise return FALSE.  */
+
+bool
+aarch64_shadow_call_stack_enabled (void)
+{
+  /* This function should only be called after frame laid out.  */
+  gcc_assert (cfun->machine->frame.laid_out);
+
+  if (crtl->calls_eh_return)
+    return false;
+
+  /* We only deal with a function if its LR is pushed onto stack
+     and attribute no_sanitize_shadow_call_stack is not specified.  */
+  return (sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
+	  && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
+}
+
  /* Return TRUE if Branch Target Identification Mechanism is enabled.  */
  bool
  aarch64_bti_enabled (void)
@@ -8810,6 +8829,10 @@  aarch64_expand_prologue (void)
        RTX_FRAME_RELATED_P (insn) = 1;
      }
  
+  /* Push return address to shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+	emit_insn (gen_scs_push ());
+
    if (flag_stack_usage_info)
      current_function_static_stack_size = constant_lower_bound (frame_size);
  
@@ -9066,6 +9089,10 @@  aarch64_expand_epilogue (bool for_sibcall)
        RTX_FRAME_RELATED_P (insn) = 1;
      }
  
+  /* Pop return address from shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+	emit_insn (gen_scs_pop ());
+
    /* We prefer to emit the combined return/authenticate instruction RETAA,
       however there are three cases in which we must instead emit an explicit
       authentication instruction.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2792bb29adb..bdd50a8d0a1 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -100,6 +100,17 @@ 
     generating stack clash probes.  */
  #define STACK_CLASH_MAX_UNROLL_PAGES 4
  
+/* This value represents whether the shadow call stack is implemented on
+   the target platform.  */
+#define TARGET_SUPPORT_SHADOW_CALL_STACK true
+
+#define TARGET_CHECK_SCS_RESERVED_REGISTER()	\
+  do {						\
+    if (!fixed_regs[R18_REGNUM])		\
+      error ("%<-fsanitize=shadow-call-stack%> only "	\
+	     "allowed with %<-ffixed-x18%>");	\
+  } while (0)
+
  /* The architecture reserves all bits of the address for hardware use,
     so the vbit must go into the delta field of pointers to member
     functions.  This is the same config as that in the AArch32
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1a39470a1fe..8e68a6f793d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6994,6 +6994,24 @@  (define_insn "xpaclri"
    "hint\t7 // xpaclri"
  )
  
+;; Save X30 in the X18-based POST_INC stack (consistent with clang).
+(define_insn "scs_push"
+  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
+   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
+  ""
+  "str\\tx30, [x18], #8"
+  [(set_attr "type" "store_8")]
+)
+
+;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
+(define_insn "scs_pop"
+  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
+   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
+  ""
+  "ldr\\tx30, [x18, #-8]!"
+  [(set_attr "type" "load_8")]
+)
+
  ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
  ;; all of memory.  This blocks insns from being moved across this point.
  
diff --git a/gcc/defaults.h b/gcc/defaults.h
index bb68d0d1a79..0f1719a3bb5 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1172,6 +1172,10 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  #define PCC_BITFIELD_TYPE_MATTERS false
  #endif
  
+#ifndef TARGET_SUPPORT_SHADOW_CALL_STACK
+#define TARGET_SUPPORT_SHADOW_CALL_STACK false
+#endif
+
  #ifndef INSN_SETS_ARE_DELAYED
  #define INSN_SETS_ARE_DELAYED(INSN) false
  #endif
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index eee4c6737bb..16c3fc367fa 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3453,6 +3453,13 @@  The @code{no_address_safety_analysis} is a deprecated alias of the
  @code{no_sanitize_address} attribute, new code should use
  @code{no_sanitize_address}.
  
+@item no_sanitize_shadow_call_stack
+@cindex @code{no_sanitize_shadow_call_stack} function attribute
+The @code{no_sanitize_shadow_call_stack} attribute on functions is used
+to inform the compiler that it should not instrument shadow call stack
+push/pop instructions in the function when compiling with the
+@option{-fsanitize=shadow-call-stack} option.
+
  @item no_sanitize_thread
  @cindex @code{no_sanitize_thread} function attribute
  The @code{no_sanitize_thread} attribute on functions is used
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71992b8c597..2c134c68e83 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15224,6 +15224,35 @@  add @code{detect_invalid_pointer_pairs=2} to the environment variable
  @env{ASAN_OPTIONS}. Using @code{detect_invalid_pointer_pairs=1} detects
  invalid operation only when both pointers are non-null.
  
+@item -fsanitize=shadow-call-stack
+@opindex fsanitize=shadow-call-stack
+Enable ShadowCallStack, a security enhancement mechanism used to protect
+programs against return address overwrites (e.g. stack buffer overflows.)
+It works by saving a function's return address to a separately allocated
+shadow call stack in the function prologue and restore the return address
+from the shadow call stack in the function epilogue.  Instrumentation only
+occurs in functions that need to save the return address to the stack.
+
+Currently it only supports the aarch64 platform and only works fine in
+the linux kernel that implements CONFIG_SHADOW_CALL_STACK.
+For the user space programs, runtime support is not currently provided
+in libc and libgcc.  Users who want to use this feature in user space need
+to provide their own support for the runtime.  It should be noted that
+this may cause the ABI rules to be broken.
+
+On aarch64, the instrumentation makes use of the platform register 'x18'.
+This generally means that any code that may run on the same thread as code
+compiled with ShadowCallStack must be compiled with the flag
+@option{-ffixed-x18}, otherwise functions compiled without
+@option{-ffixed-x18} may clobber x18 and break scs.
+
+And also because there is no runtime support, the code compiled with
+ShadowCallStack cannot use exception handling, @option{-fno-exceptions}
+also needs to be specified.
+
+See @uref{https://clang.llvm.org/docs/ShadowCallStack.html} for more
+details.
+
  @item -fsanitize=thread
  @opindex fsanitize=thread
  Enable ThreadSanitizer, a fast data race detector.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a5a637160d7..c22ef35a289 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -321,6 +321,8 @@  enum sanitize_code {
    SANITIZE_HWADDRESS = 1UL << 28,
    SANITIZE_USER_HWADDRESS = 1UL << 29,
    SANITIZE_KERNEL_HWADDRESS = 1UL << 30,
+  /* Shadow Call Stack.  */
+  SANITIZE_SHADOW_CALL_STACK = 1UL << 31,
    SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
    SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
  		       | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index 55273822ec5..98e1d636f7f 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -477,4 +477,10 @@  handle_common_deferred_options (void)
  	  gcc_unreachable ();
  	}
      }
+
+#ifdef TARGET_CHECK_SCS_RESERVED_REGISTER
+  if (flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
+    TARGET_CHECK_SCS_RESERVED_REGISTER ();
+#endif
+
  }
diff --git a/gcc/opts.c b/gcc/opts.c
index 4472cec1b98..fc84a07a4ab 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1308,6 +1308,17 @@  finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
      sorry ("transactional memory is not supported with "
  	   "%<-fsanitize=kernel-address%>");
  
+  if (opts->x_flag_sanitize & SANITIZE_SHADOW_CALL_STACK)
+    {
+      if (!TARGET_SUPPORT_SHADOW_CALL_STACK)
+	error_at (loc, "%<-fsanitize=shadow-call-stack%> not supported "
+		  "in current platform");
+
+      if (opts->x_flag_exceptions)
+	error_at (loc, "%<-fsanitize=shadow-call-stack%> only allowed "
+		  "with %<-fno-exceptions%>");
+    }
+
    /* Currently live patching is not support for LTO.  */
    if (opts->x_flag_live_patching && opts->x_flag_lto)
      sorry ("live patching is not supported with LTO");
@@ -1994,6 +2005,7 @@  const struct sanitizer_opts_s sanitizer_opts[] =
    SANITIZER_OPT (vptr, SANITIZE_VPTR, true),
    SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true),
    SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true),
+  SANITIZER_OPT (shadow-call-stack, SANITIZE_SHADOW_CALL_STACK, false),
    SANITIZER_OPT (all, ~0U, true),
  #undef SANITIZER_OPT
    { NULL, 0U, 0UL, false }
diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
new file mode 100644
index 00000000000..b8234ca0677
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
@@ -0,0 +1,6 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=shadow-call-stack" } */
+
+int i;
+
+/* { dg-error "'-fsanitize=shadow-call-stack' only allowed with '-ffixed-x18'" "" {target "aarch64*-*-*" } 0 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
new file mode 100644
index 00000000000..b0c663f26d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
@@ -0,0 +1,6 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=shadow-call-stack -ffixed-x18 -fexceptions" } */
+
+int i;
+
+/* { dg-error "'-fsanitize=shadow-call-stack' only allowed with '-fno-exceptions'" "" {target "aarch64*-*-*" } 0 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
new file mode 100644
index 00000000000..511b5f0d693
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
@@ -0,0 +1,45 @@ 
+/* Testing shadow call stack.  */
+/* scs_push: str x30, [x18], #8 */
+/* scs_pop: ldr x30, [x18, #-8]! */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
+
+int foo (int);
+
+/* function not use x30.  */
+int func1 (void)
+{
+  return 0;
+}
+
+/* function use x30.  */
+int func2 (void)
+{
+  /* scs push */
+  asm volatile ("":::"x30");
+
+  return 0;
+  /* scs pop */
+}
+
+/* sibcall.  */
+int func3 (int a, int b)
+{
+  /* scs push */
+  asm volatile ("":::"x30");
+
+  return foo (a+b);
+  /* scs pop */
+}
+
+/* eh_return.  */
+int func4 (long offset, void *handler)
+{
+  /* Do not emit scs push/pop */
+  asm volatile ("":::"x30");
+
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "str\tx30, \\\[x18\\\], #8" 2 } } */
+/* { dg-final { scan-assembler-times "ldr\tx30, \\\[x18, #-8\\\]!" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
new file mode 100644
index 00000000000..a1b1cf16655
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c
@@ -0,0 +1,18 @@ 
+/* Testing the disable of shadow call stack.  */
+/* scs_push: str x30, [x18], #8 */
+/* scs_pop: ldr x30, [x18, #-8]! */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=shadow-call-stack -ffixed-x18 -fno-exceptions" } */
+
+int foo (int);
+
+/* function disable shadow call stack.  */
+int __attribute__((no_sanitize_shadow_call_stack)) func1 (void)
+{
+  asm volatile ("":::"x30");
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "str\tx30, \\\[x18\\\], #8" } } */
+/* { dg-final { scan-assembler-not "ldr\tx30, \\\[x18, #-8\\\]!" } } */