[V2] AArch64 pauth: Indicate unmasked addresses in backtrace

Message ID 20190730144123.11135-1-alan.hayward@arm.com
State New, archived
Headers

Commit Message

Alan Hayward July 30, 2019, 2:41 p.m. UTC
  [Updated after reviews:
 *Marker changed to [PAC].
 *Did not change address column width, as didn't want to have to pre-parse all
 the frames that can fit on the screen in order to determine if extra spaces
 are needed, as that seemed too heavyweight.  Considered setting a bool so
 that once a single address requires PAC, then all backtraces would have the
 extra spaces, but felt this would cause too many spaces when not required.
 *Added to python backtrace, but did expose it in python.
 *Added documentation
]

Armv8.3-a Pointer Authentication causes the function return address to be
obfuscated on entry to some functions. GDB must unmask the link register in
order to produce a backtrace.

The following patch adds markers of [PAC] to the bracktrace, to indicate
which addresses needed unmasking.  This includes the backtrace when using MI.

For example, consider the following backtrace:

(gdb) bt
0  0x0000000000400490 in puts@plt ()
1  0x00000000004005dc in foo ("hello") at cbreak-lib.c:6
2  0x0000000000400604 [PAC] in bar () at cbreak-lib.c:12
3  0x0000000000400620 [PAC] in main2 () at cbreak.c:17
4  0x00000000004005b4 in main () at cbreak-3.c:10

The functions in cbreak-lib use pointer auth, which masks the return address
to the previous function, causing the addresses of bar (in the library) and main2
(in the main binary) to require unmasking in order to unwind the backtrace.

An extra bool is added alongside the prev_pc in the frame structure.  At the
point at which the link register is unmasked, the AArch64 port calls into frame
to sets the bool.  This is the most efficient way of doing it.

The marker is also added to the python frame printer, which is always printed if
set.  The marker is not explicitly exposed to the python code.

I expect this will potentially cause issues with some tests in the testsuite
when Armv8.3 pointer authentication is used.  This should be fixed up in the
the future once real hardware is available for full testsuite testing.

gdb/ChangeLog:

2019-07-30  Alan Hayward  <alan.hayward@arm.com>

        * NEWS: Expand the Pointer Authentication entry.
        * aarch64-tdep.c (aarch64_frame_unmask_address): Rename from this.
        (aarch64_frame_unmask_lr): ... to this.
        (aarch64_prologue_prev_register, aarch64_dwarf2_prev_register):
        Call aarch64_frame_unmask_lr.
        * frame.c (struct frame_info): Add "masked" variable.
        (frame_set_previous_pc_masked) (frame_get_pc_masked): New functions.
        (fprint_frame): Check for masked pc.
        * frame.h (frame_set_previous_pc_masked) (frame_get_pc_masked): New
        declarations.
	* python/py-framefilter.c (py_print_frame): Check for masked pc.
        * stack.c (print_frame): Check for masked pc.

gdb/doc/ChangeLog:

2019-07-30  Alan Hayward  <alan.hayward@arm.com>

        * gdb.texinfo (AArch64 Pointer Authentication): New subsection.
---
 gdb/NEWS                    |  4 +++-
 gdb/aarch64-tdep.c          | 17 ++++++++++-------
 gdb/doc/gdb.texinfo         |  8 ++++++++
 gdb/frame.c                 | 28 ++++++++++++++++++++++++++--
 gdb/frame.h                 |  9 +++++++++
 gdb/python/py-framefilter.c |  2 ++
 gdb/stack.c                 |  6 +++++-
 7 files changed, 63 insertions(+), 11 deletions(-)
  

Comments

Tom Tromey Aug. 6, 2019, 4:15 p.m. UTC | #1
>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

Alan> Armv8.3-a Pointer Authentication causes the function return address to be
Alan> obfuscated on entry to some functions. GDB must unmask the link register in
Alan> order to produce a backtrace.

Alan> The following patch adds markers of [PAC] to the bracktrace, to indicate
Alan> which addresses needed unmasking.  This includes the backtrace when using MI.

Thanks for the patch.  I don't think this was re-reviewed?

Alan> --- a/gdb/frame.c
Alan> +++ b/gdb/frame.c
Alan> @@ -124,6 +124,8 @@ struct frame_info
Alan>    struct {
Alan>      enum cached_copy_status status;
Alan>      CORE_ADDR value;
Alan> +    /* Did VALUE require unmasking when being read.  */
Alan> +    bool masked;
Alan>    } prev_pc;

I think putting the new field between "status" and "value" will pack the
structure a bit better.

Alan> +bool
Alan> +get_frame_pc_masked (struct frame_info *frame)

Maybe make the parameter "const"?  I see there isn't much use of const
in frame.h, but on the other hand, that's probably just historical and
not required.

Otherwise the patch looks fine to me.  It is ok.  I'd prefer those
changes but if you'd prefer not, a note to that effect is fine.

thanks,
Tom
  
Pedro Alves Aug. 7, 2019, 7:24 p.m. UTC | #2
On 7/30/19 3:41 PM, Alan Hayward wrote:

> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 0fcd131f71..b7dba2f918 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -24380,6 +24380,14 @@ but the lengths of the @code{z} and @code{p} registers will not change.  This
>  is a known limitation of @value{GDBN} and does not affect the execution of the
>  target process.
>  
> +@subsubsection AArch64 Pointer Authentication.
> +@cindex AArch64 Pointer Authentication.
> +
> +When @value{GDBN} is debugging the AArch64 architecture, and the program is
> +using the v8.3-A feature Pointer Authentication (PAC), then whenever the link
> +register @code{$lr} is pointing to an PAC function it's value will be masked.

s/it's value/its value/

> +When GDB prints a backtrace, any addresses that required unmasking will be
> +postfixed with the marker [PAC].
>  

> diff --git a/gdb/python/py-framefilter.c b/gdb/python/py-framefilter.c
> index a2a96ac0d3..d805ec68f2 100644
> --- a/gdb/python/py-framefilter.c
> +++ b/gdb/python/py-framefilter.c
> @@ -901,6 +901,8 @@ py_print_frame (PyObject *filter, frame_filter_flags flags,
>  	    {
>  	      annotate_frame_address ();
>  	      out->field_core_addr ("addr", gdbarch, address);
> +	      if (get_frame_pc_masked (frame))
> +		out->field_string ("pac", " [PAC]");
>  	      annotate_frame_address_end ();
>  	      out->text (" in ");
>  	    }
> diff --git a/gdb/stack.c b/gdb/stack.c
> index 7833ca4aeb..9d49809895 100644
> --- a/gdb/stack.c
> +++ b/gdb/stack.c
> @@ -1298,7 +1298,11 @@ print_frame (const frame_print_options &fp_opts,
>  	{
>  	  annotate_frame_address ();
>  	  if (pc_p)
> -	    uiout->field_core_addr ("addr", gdbarch, pc);
> +	    {
> +	      uiout->field_core_addr ("addr", gdbarch, pc);
> +	      if (get_frame_pc_masked (frame))
> +		uiout->field_string ("pac", " [PAC]");

Hmm, I had suggested considering MI in the previous iteration, but
I was just thinking of including the "[PAC]" text in the
"addr" field.  If we're adding a new field, then a few extra
things need to be considered:

 #1 - documentation, both manual and NEWS should mention this new MI field.

 #2 - calling the attribute "pac" makes it architecture specific. 
      I.e., to make use of it, a frontend will have to have Aarch64 awareness?
      Not sure that is a good thing.

 #3 - The MI attribute is called "pac", and its content is
      literally " [PAC]".  I'd find that odd if I were a frontend author:
      the content is right aligned with a space, making doing anything with
      it other than appending it to the address text probably look odd,
      unless you bake in awareness of the attribute's text...  If I saw
      an attribute named "pac", I'd expect it to be a boolean?  At the
      least, the left space should not be part of the field, I think?
      Maybe we should rename the field to something else, like "addr_attr"
     for "address attributes" or something.

Thanks,
Pedro Alves
  
Tom Tromey Aug. 8, 2019, 4:58 p.m. UTC | #3
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

Pedro> Hmm, I had suggested considering MI in the previous iteration, but
Pedro> I was just thinking of including the "[PAC]" text in the
Pedro> "addr" field.  If we're adding a new field, then a few extra
Pedro> things need to be considered:

Pedro>  #1 - documentation, both manual and NEWS should mention this new MI field.

Oops, I forgot about this.  Sorry about that.

I don't think putting this information into the "addr" field is a good
idea.  It's better, IMO, to let MI field names provide the structure,
rather than requiring clients to also parse the values of fields.

I realize MI isn't 100% clean on this topic, but we can still not make
it worse.

Pedro>  #2 - calling the attribute "pac" makes it architecture specific. 

I don't think this is such a big deal but at the same time any
reasonable name is fine by me.

Pedro>  #3 - The MI attribute is called "pac", and its content is
Pedro>       literally " [PAC]".  I'd find that odd if I were a frontend author:
Pedro>       the content is right aligned with a space, making doing anything with
Pedro>       it other than appending it to the address text probably look odd,
Pedro>       unless you bake in awareness of the attribute's text...  If I saw
Pedro>       an attribute named "pac", I'd expect it to be a boolean?  At the
Pedro>       least, the left space should not be part of the field, I think?

I think part of the pain here is an internal constraint, namely that the
CLI ui-out wouldn't know to rewrite the boolean value to something else
here.  But perhaps that's something that could just be addressed
directly.

Tom
  
Pedro Alves Aug. 9, 2019, 2:11 p.m. UTC | #4
On 8/8/19 5:58 PM, Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
> 
> Pedro> Hmm, I had suggested considering MI in the previous iteration, but
> Pedro> I was just thinking of including the "[PAC]" text in the
> Pedro> "addr" field.  If we're adding a new field, then a few extra
> Pedro> things need to be considered:
> 
> Pedro>  #1 - documentation, both manual and NEWS should mention this new MI field.
> 
> Oops, I forgot about this.  Sorry about that.
> 
> I don't think putting this information into the "addr" field is a good
> idea.  It's better, IMO, to let MI field names provide the structure,
> rather than requiring clients to also parse the values of fields.

True, though I observe that the addr field isn't strictly numeric,
since it can contain "<unavailable>".

> 
> I realize MI isn't 100% clean on this topic, but we can still not make
> it worse.

Can't disagree with that.

> 
> Pedro>  #2 - calling the attribute "pac" makes it architecture specific. 
> 
> I don't think this is such a big deal but at the same time any
> reasonable name is fine by me.

It depends on what is the "pac" attribute intended for.  If it is strictly
about Aarch64 PAC, then when we need to expose some other attribute, either
for Aarch64 or some other architecture, it's better to not reuse "pac",
since it may conflict with whatever use IDEs may give to "pac", assuming
it really is about PAC, only.  And if that's the intended use, it seems
better to make it a boolean, since "[PAC]" is CLI's presentation, not the
datum.

If OTOH we assume that later on, other address attributes would be displayed
here, like "[PAC,FOO,BAR]", then it seems better to me to name it and
document it like that from the get go, to avoid causing confusion to
MI clients down the road if/when such attributes start showing, and
also avoid having to have a future discussion of what MI attribute
should be used to output my "[FOO]" attribute, given that "pac" is
seemingly about PAC only.

I.e. I think it's better to nail down the semantics upfront, and if
the consensus is to make it a "generic address attributes" field, then
I'd rather name it and document it as such from the get go.

> 
> Pedro>  #3 - The MI attribute is called "pac", and its content is
> Pedro>       literally " [PAC]".  I'd find that odd if I were a frontend author:
> Pedro>       the content is right aligned with a space, making doing anything with
> Pedro>       it other than appending it to the address text probably look odd,
> Pedro>       unless you bake in awareness of the attribute's text...  If I saw
> Pedro>       an attribute named "pac", I'd expect it to be a boolean?  At the
> Pedro>       least, the left space should not be part of the field, I think?
> 
> I think part of the pain here is an internal constraint, namely that the
> CLI ui-out wouldn't know to rewrite the boolean value to something else
> here.  But perhaps that's something that could just be addressed
> directly.

There's always 'uiout->is_mi_like_p()', but I'm not advocating
making this a boolean (and not advocating not making it a boolean
either).

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/NEWS b/gdb/NEWS
index 4e821eab4c..fc644817e7 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -16,7 +16,9 @@ 
   architectures require kernel changes.  TLS is not yet supported for
   amd64 and i386 process core dumps.
 
-* Support for Pointer Authentication on AArch64 Linux.
+* Support for Pointer Authentication (PAC) on AArch64 Linux.  Return
+  addresses that required unmasking are shown in the backtrace with the
+  postfix [PAC].
 
 * Two new convenience functions $_cimag and $_creal that extract the
   imaginary and real parts respectively from complex numbers.
diff --git a/gdb/aarch64-tdep.c b/gdb/aarch64-tdep.c
index e23cc3f956..9b6324f0fc 100644
--- a/gdb/aarch64-tdep.c
+++ b/gdb/aarch64-tdep.c
@@ -250,13 +250,13 @@  class instruction_reader : public abstract_instruction_reader
 
 } // namespace
 
-/* If address signing is enabled, mask off the signature bits from ADDR, using
-   the register values in THIS_FRAME.  */
+/* If address signing is enabled, mask off the signature bits from the link
+   register, which is passed by value in ADDR, using the register values in
+   THIS_FRAME.  */
 
 static CORE_ADDR
-aarch64_frame_unmask_address (struct gdbarch_tdep *tdep,
-			      struct frame_info *this_frame,
-			      CORE_ADDR addr)
+aarch64_frame_unmask_lr (struct gdbarch_tdep *tdep,
+			 struct frame_info *this_frame, CORE_ADDR addr)
 {
   if (tdep->has_pauth ()
       && frame_unwind_register_unsigned (this_frame,
@@ -265,6 +265,9 @@  aarch64_frame_unmask_address (struct gdbarch_tdep *tdep,
       int cmask_num = AARCH64_PAUTH_CMASK_REGNUM (tdep->pauth_reg_base);
       CORE_ADDR cmask = frame_unwind_register_unsigned (this_frame, cmask_num);
       addr = addr & ~cmask;
+
+      /* Record in the frame that the link register required unmasking.  */
+      set_frame_previous_pc_masked (this_frame);
     }
 
   return addr;
@@ -952,7 +955,7 @@  aarch64_prologue_prev_register (struct frame_info *this_frame,
       if (tdep->has_pauth ()
 	  && trad_frame_value_p (cache->saved_regs,
 				 tdep->pauth_ra_state_regnum))
-	lr = aarch64_frame_unmask_address (tdep, this_frame, lr);
+	lr = aarch64_frame_unmask_lr (tdep, this_frame, lr);
 
       return frame_unwind_got_constant (this_frame, prev_regnum, lr);
     }
@@ -1119,7 +1122,7 @@  aarch64_dwarf2_prev_register (struct frame_info *this_frame,
     {
     case AARCH64_PC_REGNUM:
       lr = frame_unwind_register_unsigned (this_frame, AARCH64_LR_REGNUM);
-      lr = aarch64_frame_unmask_address (tdep, this_frame, lr);
+      lr = aarch64_frame_unmask_lr (tdep, this_frame, lr);
       return frame_unwind_got_constant (this_frame, regnum, lr);
 
     default:
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 0fcd131f71..b7dba2f918 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -24380,6 +24380,14 @@  but the lengths of the @code{z} and @code{p} registers will not change.  This
 is a known limitation of @value{GDBN} and does not affect the execution of the
 target process.
 
+@subsubsection AArch64 Pointer Authentication.
+@cindex AArch64 Pointer Authentication.
+
+When @value{GDBN} is debugging the AArch64 architecture, and the program is
+using the v8.3-A feature Pointer Authentication (PAC), then whenever the link
+register @code{$lr} is pointing to an PAC function it's value will be masked.
+When GDB prints a backtrace, any addresses that required unmasking will be
+postfixed with the marker [PAC].
 
 @node i386
 @subsection x86 Architecture-specific Issues
diff --git a/gdb/frame.c b/gdb/frame.c
index 84e0397db9..6046b3ba8d 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -124,6 +124,8 @@  struct frame_info
   struct {
     enum cached_copy_status status;
     CORE_ADDR value;
+    /* Did VALUE require unmasking when being read.  */
+    bool masked;
   } prev_pc;
   
   /* Cached copy of the previous frame's function address.  */
@@ -161,6 +163,25 @@  struct frame_info
   const char *stop_string;
 };
 
+/* See frame.h.  */
+
+void
+set_frame_previous_pc_masked (struct frame_info *frame)
+{
+  frame->prev_pc.masked = true;
+}
+
+/* See frame.h.  */
+
+bool
+get_frame_pc_masked (struct frame_info *frame)
+{
+  gdb_assert (frame->next != nullptr);
+  gdb_assert (frame->next->prev_pc.status == CC_VALUE);
+
+  return frame->next->prev_pc.masked;
+}
+
 /* A frame stash used to speed up frame lookups.  Create a hash table
    to stash frames previously accessed from the frame cache for
    quicker subsequent retrieval.  The hash table is emptied whenever
@@ -429,8 +450,11 @@  fprint_frame (struct ui_file *file, struct frame_info *fi)
   if (fi->next == NULL || fi->next->prev_pc.status == CC_UNKNOWN)
     fprintf_unfiltered (file, "<unknown>");
   else if (fi->next->prev_pc.status == CC_VALUE)
-    fprintf_unfiltered (file, "%s",
-			hex_string (fi->next->prev_pc.value));
+    {
+      fprintf_unfiltered (file, "%s", hex_string (fi->next->prev_pc.value));
+      if (fi->next->prev_pc.masked)
+	fprintf_unfiltered (file, "[PAC]");
+    }
   else if (fi->next->prev_pc.status == CC_NOT_SAVED)
     val_print_not_saved (file);
   else if (fi->next->prev_pc.status == CC_UNAVAILABLE)
diff --git a/gdb/frame.h b/gdb/frame.h
index ccc285005a..501708ef4c 100644
--- a/gdb/frame.h
+++ b/gdb/frame.h
@@ -949,4 +949,13 @@  extern const gdb::option::option_def set_backtrace_option_defs[2];
 /* The values behind the global "set backtrace ..." settings.  */
 extern set_backtrace_options user_set_backtrace_options;
 
+/* Mark that the PC value is masked for the previous frame.  */
+
+extern void set_frame_previous_pc_masked (struct frame_info *frame);
+
+/* Get whether the PC value is masked for the given frame.  */
+
+extern bool get_frame_pc_masked (struct frame_info *frame);
+
+
 #endif /* !defined (FRAME_H)  */
diff --git a/gdb/python/py-framefilter.c b/gdb/python/py-framefilter.c
index a2a96ac0d3..d805ec68f2 100644
--- a/gdb/python/py-framefilter.c
+++ b/gdb/python/py-framefilter.c
@@ -901,6 +901,8 @@  py_print_frame (PyObject *filter, frame_filter_flags flags,
 	    {
 	      annotate_frame_address ();
 	      out->field_core_addr ("addr", gdbarch, address);
+	      if (get_frame_pc_masked (frame))
+		out->field_string ("pac", " [PAC]");
 	      annotate_frame_address_end ();
 	      out->text (" in ");
 	    }
diff --git a/gdb/stack.c b/gdb/stack.c
index 7833ca4aeb..9d49809895 100644
--- a/gdb/stack.c
+++ b/gdb/stack.c
@@ -1298,7 +1298,11 @@  print_frame (const frame_print_options &fp_opts,
 	{
 	  annotate_frame_address ();
 	  if (pc_p)
-	    uiout->field_core_addr ("addr", gdbarch, pc);
+	    {
+	      uiout->field_core_addr ("addr", gdbarch, pc);
+	      if (get_frame_pc_masked (frame))
+		uiout->field_string ("pac", " [PAC]");
+	    }
 	  else
 	    uiout->field_string ("addr", "<unavailable>",
 				 ui_out_style_kind::ADDRESS);