[0/2] Add SCFI support for aarch64

Message ID 20240411074407.1429624-1-indu.bhagat@oracle.com
Headers
Series Add SCFI support for aarch64 |

Message

Indu Bhagat April 11, 2024, 7:44 a.m. UTC
  Hello,

This patch series extends GAS support for SCFI to aarch64.

Since Binutils 2.42, GAS has experimental support for synthesizing CFI (SCFI)
for hand-written asm for the x86 backend.  This is invoked via
--scfi=experimental on the hand-written asm.  SCFI aims to relieve users from
the overhead of writing and maintaining CFI directives in hand-written asm.

One of the ways of hardening the SCFI feature in GAS is to extend support to
another major architecture.  This would also allow exercising SCFI on more
workloads.

Background
-----------
Some background notes on SCFI are present on the wiki
https://sourceware.org/binutils/wiki/gas/SCFI.  I will refrain from repeating
some of that content here for sake of brevity.

Additionally, the commit log for the first commit which added the support on
x86 may also be helpful in reviewing this series.
 - gas: x86: synthesize CFI for hand-written asm
   c7defc5386cc53a4abbb7c53a924cdac3f16aa33

For synthesizing (DWARF) CFI, the SCFI machinery requires the programmer
to adhere to some pre-requisites for their asm:
   - Hand-written asm block must begin with a .type   foo, %function
It is highly recommended to, additionally, also ensure that:
   - Hand-written asm block ends with a .size foo, .-foo

ginsns, SCFI constraints, etc.
------------------------------
ginsn is an acronym for generic GAS instruction.  This is intended to be
architecture-neutral abstraction that can be used to convey and keep semantic
information about machine instructions in an arch-neutral way in GAS.  ginsn
specification and associated interfaces can be seen in gas/ginsn.c and
gas/ginsn.h.

The SCFI algorithm itself is implemented as a couple of passes.  The following
is a gross over-simplification of the overall process; simplified to hopefully
aid the review process:

 - Create the GCFG (control flow graph) of the ginsns.
 - Process each basic block and make a note of how each instruction changes the
   SCFI state (CFA, callee-saved registers, RA).  This is done via two passes:
   forward_flow_scfi_state () and backward_flow_scfi_state ().
 - Translate SCFI ops to equivalent DWARF CFI ops or directives.

The above is implemented in gas/scfi.h and gas/scfi.c.  Also see the
gas/scfidw2gen.h and gas/scfidw2gen.c where SCFI ops are processed to finally
create the DWARF CFI directives.

Lastly, I think stating some specifics of SCFI core algorithm itself may be
helpful for the review process: Basically the SCFI machinery encodes some rules
specified in the standard ABI calling convention (e.g., set of callee-saved
registers,  how the return address is managed etc).  Apart from the rules, the
SCFI machinery employs some heuristics.  Few examples of heuristics:

   - The base register for CFA tracking may be either REG_SP or REG_FP.
   - If the base register for CFA tracking is REG_SP, the precise amount of
     stack usage (and hence, the value of REG_SP) must be known at all times.
   - If using dynamic stack allocation, the function must switch to
     FP-based CFA.  This means using instructions like the following (in
     AMD64) in prologue:
        pushq   %rbp
        movq    %rsp, %rbp
     and analogous instructions in epilogue.  In case of aarch64, this simply
     means creation of the frame record.
   - Save and Restore of callee-saved registers must be symmetrical.
     However, the SCFI machinery at this time only warns if any such
     asymmetry is seen.

These heuristics/rules are architecture-independent and are meant to
employed for all architectures/ABIs using SCFI in the future.

The SCFI paper published sometime ago
(https://sourceware.org/pipermail/binutils/2023-September/129558.html) may be a
useful resource to get additional understanding of the above.  

Known limitations 
-----------------
These are planned to be worked on in the near future:

 - The current SCFI machinery does not currently synthesize the PAC-related
   aarch64-specific CFI directives: .cfi_b_key_frame.  Other opcodes used when
   pointer authentication is enabled also need to be handled (braa, brab,
   retaa, etc.).

 - Supporting the following pattern:
   mov x16,4266
   add sp, x16, sp
   ...

 - Not a limitation per se, but a note that ATM, that predicated insns are
   skipped from ginsn translation.  IIUC, these instructions are not such that
   can be used alongside stack management ops. To be double-checked.

Thanks,

Indu Bhagat (2):
  gas: aarch64: add experimental support for SCFI
  gas: aarch64: testsuite: add new tests for SCFI

 gas/config/tc-aarch64.c                       | 744 ++++++++++++++++++
 gas/config/tc-aarch64.h                       |  20 +
 gas/testsuite/gas/scfi/README                 |   2 +-
 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.l |  30 +
 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.s |  16 +
 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.l |  40 +
 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.s |  21 +
 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.l |  32 +
 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.s |  15 +
 .../gas/scfi/aarch64/scfi-aarch64.exp         |  60 ++
 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.d    |  20 +
 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.l    |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.s    |  14 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.d   |  31 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.l   |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.s   |  46 ++
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.d   |  40 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.l   |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.s   |  42 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.d   |  32 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.l   |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.s   |  34 +
 .../gas/scfi/aarch64/scfi-cond-br-1.d         |  20 +
 .../gas/scfi/aarch64/scfi-cond-br-1.l         |   2 +
 .../gas/scfi/aarch64/scfi-cond-br-1.s         |  13 +
 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.l  |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.s  |   6 +
 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.l  |   3 +
 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.s  |  25 +
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.d  |  59 ++
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.l  |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.s  |  52 ++
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.d  |  33 +
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.l  |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.s  |  26 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.d  |  39 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.l  |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.s  |  37 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.d  |  35 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.l  |   2 +
 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.s  |  30 +
 .../gas/scfi/aarch64/scfi-unsupported-1.l     |   4 +
 .../gas/scfi/aarch64/scfi-unsupported-1.s     |  31 +
 43 files changed, 1671 insertions(+), 1 deletion(-)
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-aarch64.exp
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.d
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.s
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-unsupported-1.l
 create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-unsupported-1.s
  

Comments

Indu Bhagat May 1, 2024, 6:20 p.m. UTC | #1
Ping.

On 4/11/24 12:44 AM, Indu Bhagat wrote:
> Hello,
> 
> This patch series extends GAS support for SCFI to aarch64.
> 
> Since Binutils 2.42, GAS has experimental support for synthesizing CFI (SCFI)
> for hand-written asm for the x86 backend.  This is invoked via
> --scfi=experimental on the hand-written asm.  SCFI aims to relieve users from
> the overhead of writing and maintaining CFI directives in hand-written asm.
> 
> One of the ways of hardening the SCFI feature in GAS is to extend support to
> another major architecture.  This would also allow exercising SCFI on more
> workloads.
> 
> Background
> -----------
> Some background notes on SCFI are present on the wiki
> https://sourceware.org/binutils/wiki/gas/SCFI.  I will refrain from repeating
> some of that content here for sake of brevity.
> 
> Additionally, the commit log for the first commit which added the support on
> x86 may also be helpful in reviewing this series.
>   - gas: x86: synthesize CFI for hand-written asm
>     c7defc5386cc53a4abbb7c53a924cdac3f16aa33
> 
> For synthesizing (DWARF) CFI, the SCFI machinery requires the programmer
> to adhere to some pre-requisites for their asm:
>     - Hand-written asm block must begin with a .type   foo, %function
> It is highly recommended to, additionally, also ensure that:
>     - Hand-written asm block ends with a .size foo, .-foo
> 
> ginsns, SCFI constraints, etc.
> ------------------------------
> ginsn is an acronym for generic GAS instruction.  This is intended to be
> architecture-neutral abstraction that can be used to convey and keep semantic
> information about machine instructions in an arch-neutral way in GAS.  ginsn
> specification and associated interfaces can be seen in gas/ginsn.c and
> gas/ginsn.h.
> 
> The SCFI algorithm itself is implemented as a couple of passes.  The following
> is a gross over-simplification of the overall process; simplified to hopefully
> aid the review process:
> 
>   - Create the GCFG (control flow graph) of the ginsns.
>   - Process each basic block and make a note of how each instruction changes the
>     SCFI state (CFA, callee-saved registers, RA).  This is done via two passes:
>     forward_flow_scfi_state () and backward_flow_scfi_state ().
>   - Translate SCFI ops to equivalent DWARF CFI ops or directives.
> 
> The above is implemented in gas/scfi.h and gas/scfi.c.  Also see the
> gas/scfidw2gen.h and gas/scfidw2gen.c where SCFI ops are processed to finally
> create the DWARF CFI directives.
> 
> Lastly, I think stating some specifics of SCFI core algorithm itself may be
> helpful for the review process: Basically the SCFI machinery encodes some rules
> specified in the standard ABI calling convention (e.g., set of callee-saved
> registers,  how the return address is managed etc).  Apart from the rules, the
> SCFI machinery employs some heuristics.  Few examples of heuristics:
> 
>     - The base register for CFA tracking may be either REG_SP or REG_FP.
>     - If the base register for CFA tracking is REG_SP, the precise amount of
>       stack usage (and hence, the value of REG_SP) must be known at all times.
>     - If using dynamic stack allocation, the function must switch to
>       FP-based CFA.  This means using instructions like the following (in
>       AMD64) in prologue:
>          pushq   %rbp
>          movq    %rsp, %rbp
>       and analogous instructions in epilogue.  In case of aarch64, this simply
>       means creation of the frame record.
>     - Save and Restore of callee-saved registers must be symmetrical.
>       However, the SCFI machinery at this time only warns if any such
>       asymmetry is seen.
> 
> These heuristics/rules are architecture-independent and are meant to
> employed for all architectures/ABIs using SCFI in the future.
> 
> The SCFI paper published sometime ago
> (https://sourceware.org/pipermail/binutils/2023-September/129558.html) may be a
> useful resource to get additional understanding of the above.
> 
> Known limitations
> -----------------
> These are planned to be worked on in the near future:
> 
>   - The current SCFI machinery does not currently synthesize the PAC-related
>     aarch64-specific CFI directives: .cfi_b_key_frame.  Other opcodes used when
>     pointer authentication is enabled also need to be handled (braa, brab,
>     retaa, etc.).
> 
>   - Supporting the following pattern:
>     mov x16,4266
>     add sp, x16, sp
>     ...
> 
>   - Not a limitation per se, but a note that ATM, that predicated insns are
>     skipped from ginsn translation.  IIUC, these instructions are not such that
>     can be used alongside stack management ops. To be double-checked.
> 
> Thanks,
> 
> Indu Bhagat (2):
>    gas: aarch64: add experimental support for SCFI
>    gas: aarch64: testsuite: add new tests for SCFI
> 
>   gas/config/tc-aarch64.c                       | 744 ++++++++++++++++++
>   gas/config/tc-aarch64.h                       |  20 +
>   gas/testsuite/gas/scfi/README                 |   2 +-
>   gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.l |  30 +
>   gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.s |  16 +
>   gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.l |  40 +
>   gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.s |  21 +
>   gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.l |  32 +
>   gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.s |  15 +
>   .../gas/scfi/aarch64/scfi-aarch64.exp         |  60 ++
>   gas/testsuite/gas/scfi/aarch64/scfi-cb-1.d    |  20 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cb-1.l    |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cb-1.s    |  14 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.d   |  31 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.l   |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.s   |  46 ++
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.d   |  40 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.l   |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.s   |  42 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.d   |  32 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.l   |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.s   |  34 +
>   .../gas/scfi/aarch64/scfi-cond-br-1.d         |  20 +
>   .../gas/scfi/aarch64/scfi-cond-br-1.l         |   2 +
>   .../gas/scfi/aarch64/scfi-cond-br-1.s         |  13 +
>   gas/testsuite/gas/scfi/aarch64/scfi-diag-1.l  |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-diag-1.s  |   6 +
>   gas/testsuite/gas/scfi/aarch64/scfi-diag-2.l  |   3 +
>   gas/testsuite/gas/scfi/aarch64/scfi-diag-2.s  |  25 +
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.d  |  59 ++
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.l  |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.s  |  52 ++
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.d  |  33 +
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.l  |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.s  |  26 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-1.d  |  39 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-1.l  |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-1.s  |  37 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-2.d  |  35 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-2.l  |   2 +
>   gas/testsuite/gas/scfi/aarch64/scfi-strp-2.s  |  30 +
>   .../gas/scfi/aarch64/scfi-unsupported-1.l     |   4 +
>   .../gas/scfi/aarch64/scfi-unsupported-1.s     |  31 +
>   43 files changed, 1671 insertions(+), 1 deletion(-)
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-cofi-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-ldst-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/ginsn-misc-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-aarch64.exp
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cb-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-2.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cfg-3.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-cond-br-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-diag-2.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-ldrp-2.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-1.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.d
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-strp-2.s
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-unsupported-1.l
>   create mode 100644 gas/testsuite/gas/scfi/aarch64/scfi-unsupported-1.s
>
  
Richard Sandiford June 26, 2024, 11:01 a.m. UTC | #2
Hi,

I was having a look at the v3 series, but had a question about the
design & known limitations that I thought was better to ask here:

Indu Bhagat <indu.bhagat@oracle.com> writes:
> Known limitations 
> -----------------
> These are planned to be worked on in the near future:
>
>  - The current SCFI machinery does not currently synthesize the PAC-related
>    aarch64-specific CFI directives: .cfi_b_key_frame.  Other opcodes used when
>    pointer authentication is enabled also need to be handled (braa, brab,
>    retaa, etc.).
>
>  - Supporting the following pattern:
>    mov x16,4266
>    add sp, x16, sp
>    ...
>
>  - Not a limitation per se, but a note that ATM, that predicated insns are
>    skipped from ginsn translation.  IIUC, these instructions are not such that
>    can be used alongside stack management ops. To be double-checked.

AFAICT, the current code only handles GPRs.  It doesn't handle D8-D15,
which are also call-preserved under the base AAPCS64.  Is that right?
I think we should try to handle those as well.

D8-D15 are "interesting" because they are the low 64 bits of Q8-Q15,
and of Z8-Z15 if SVE is used.  However, a CFI save slot always represents
the low 64 bits, regardless of whether a save occurs on D, Q or Z registers.
This matters for big-endian code, because there are two additional
PCS variants:

* the "vector PCS", which preserves Q8-Q23
* the "SVE PCS", which preserves Z8-Z23 and P3-P15

So vector PCS functions might need to save and restore Q8 when returning
normally, but the CFI only describes the save of the D8 portion (since
that's the only portion that is preserved by exceptions).  This means
that, on big-endian:

	str	q8, [sp, #16]

should record D8 as being saved at sp+24 rather than sp+16.

A further complication is that STR Qn and STR Zn do not store in
the same byte order for big-endian: STR Qn stores as a 128-bit
integer (MSB first), whereas STR Zn stores as a stream of bytes
(LSB first).  This means that GCC-generated big-endian SVE PCS
functions use things like:

	st1d	z8.d, p2, [sp, #1, mul vl]

with the D8 save slot then being at sp + 2*VL - 64.

I think it's OK to punt on the big-endian SVE PCS case for now (provided
that there's a warning that the code isn't understood, which it looks
like there is).  But I think it's worth handling the Q register saves.

Other comments:

- I like the new approach of using a combination of the iclass and a
  "subclass" field of the flags.  How about making aarch64-gen.c enforce
  that:

  - if aarch64-ginsn.c looks at the subclass of a particular iclass,
    every instruction of that iclass has a nonzero subclass field

  - every other instruction has a zero subclass field

  This would help to ensure that the data stays up to date.
  The subclass enum could include a nonzero "other" value where
  necessary.

- I think we should only add things like F_LDST_LOAD and F_LDST_STORE
  to instructions that are semantically simple loads and stores
  (unless the iclass gives us the information needed to handle
  more complicated cases).  E.g. it looks like patch 2/7 adds
  F_LDST_LOAD to things like ld4, which are AoS->SoA loads.
  It would not be correct to interpret an LD4 on byte elements
  (say) as a register restore for CFI purposes.

  I realise the information could be useful for other things
  besides ginsns.  But while ginsns are the only things using the
  information, I think we should be careful to make sure that the
  information can't be misunderstood.

Thanks,
Richard
  
Indu Bhagat June 27, 2024, 8 a.m. UTC | #3
On 6/26/24 04:01, Richard Sandiford wrote:
> Hi,
> 
> I was having a look at the v3 series, but had a question about the
> design & known limitations that I thought was better to ask here:
> 
> Indu Bhagat <indu.bhagat@oracle.com> writes:
>> Known limitations
>> -----------------
>> These are planned to be worked on in the near future:
>>
>>   - The current SCFI machinery does not currently synthesize the PAC-related
>>     aarch64-specific CFI directives: .cfi_b_key_frame.  Other opcodes used when
>>     pointer authentication is enabled also need to be handled (braa, brab,
>>     retaa, etc.).
>>
>>   - Supporting the following pattern:
>>     mov x16,4266
>>     add sp, x16, sp
>>     ...
>>
>>   - Not a limitation per se, but a note that ATM, that predicated insns are
>>     skipped from ginsn translation.  IIUC, these instructions are not such that
>>     can be used alongside stack management ops. To be double-checked.
> 
> AFAICT, the current code only handles GPRs.  It doesn't handle D8-D15,
> which are also call-preserved under the base AAPCS64.  Is that right?
> I think we should try to handle those as well.

Ah yes, the current code only handles GPRS. I will need to add the 
D8-D15 registers. The code was also explicitly skipping ldstp with FP 
registers. Thanks.

> 
> D8-D15 are "interesting" because they are the low 64 bits of Q8-Q15,
> and of Z8-Z15 if SVE is used.  However, a CFI save slot always represents
> the low 64 bits, regardless of whether a save occurs on D, Q or Z registers.
> This matters for big-endian code, because there are two additional
> PCS variants:
> 
> * the "vector PCS", which preserves Q8-Q23
> * the "SVE PCS", which preserves Z8-Z23 and P3-P15
> 

Is there a way to annotate that a (hand-written asm) function adheres to 
vectors PCS or SVE PCS ?  I see that there is a .variant_pcs but that 
does not help differentiate between the above two?

I _think_ gas will need to know which of SVE vs vector PCS is in effect 
for a specific function so that the P3-P15 can be added to the set of 
callee-saved registers being tracked for SCFI for SVE PCS but not for 
vector PCS.

> So vector PCS functions might need to save and restore Q8 when returning
> normally, but the CFI only describes the save of the D8 portion (since
> that's the only portion that is preserved by exceptions).  This means
> that, on big-endian:
> 
> 	str	q8, [sp, #16]
> 
> should record D8 as being saved at sp+24 rather than sp+16.
> 
> A further complication is that STR Qn and STR Zn do not store in
> the same byte order for big-endian: STR Qn stores as a 128-bit
> integer (MSB first), whereas STR Zn stores as a stream of bytes
> (LSB first).  This means that GCC-generated big-endian SVE PCS
> functions use things like:
> 
> 	st1d	z8.d, p2, [sp, #1, mul vl]
> 
> with the D8 save slot then being at sp + 2*VL - 64.
> 
> I think it's OK to punt on the big-endian SVE PCS case for now (provided
> that there's a warning that the code isn't understood, which it looks
> like there is).  But I think it's worth handling the Q register saves.

It looks to me that using reg name / size is an unambiguous proxy to 
deciding  whether SVE PCS is in effect. Is this correct ?

> 
> Other comments:
> 
> - I like the new approach of using a combination of the iclass and a
>    "subclass" field of the flags.  How about making aarch64-gen.c enforce
>    that:
> 
>    - if aarch64-ginsn.c looks at the subclass of a particular iclass,
>      every instruction of that iclass has a nonzero subclass field
> 

(Let me refer to the above as #1). I can see that there can be ways to 
achieve this...

>    - every other instruction has a zero subclass field
> 

..but I am not sure I follow this statement. (Let me refer to the above 
as #2).

>    This would help to ensure that the data stays up to date.
>    The subclass enum could include a nonzero "other" value where
>    necessary.
> 

Currently, we are using the opcode->flags bits to encode:

In include/opcode/aarch64.h:

/* 4-bit flag field to indicate subclass of operations.
    Note that there is an (intended) overlap between the three flag sets
    (F_LDST*, F_ARITH* and F_BRANCH*).  This allows space savings.  */
#define F_LDST_LOAD (1ULL << 36)
#define F_LDST_STORE (2ULL << 36)
/* A load followed by a store (using the same address). */
#define F_LDST_SWAP (F_LDST_LOAD | F_LDST_STORE)
/* Subclasses to denote add, sub and mov insns.  */
#define F_ARITH_ADD (1ULL << 36)
#define F_ARITH_SUB (2ULL << 36)
#define F_ARITH_MOV (4ULL << 36)
/* Subclasses to denote call and ret insns.  */
#define F_BRANCH_CALL (1ULL << 36)
#define F_BRANCH_RET (2ULL << 36)

We can dedicate F_SUBCLASS_NONE (8ULL << 36) and enforce this subclass 
on all insns which use none of the above subclasses in a specific 
iclass.  This can help address (#1), but not sure about (#2).

> - I think we should only add things like F_LDST_LOAD and F_LDST_STORE
>    to instructions that are semantically simple loads and stores
>    (unless the iclass gives us the information needed to handle
>    more complicated cases).  E.g. it looks like patch 2/7 adds
>    F_LDST_LOAD to things like ld4, which are AoS->SoA loads.
>    It would not be correct to interpret an LD4 on byte elements
>    (say) as a register restore for CFI purposes.
> 
>    I realise the information could be useful for other things
>    besides ginsns.  But while ginsns are the only things using the
>    information, I think we should be careful to make sure that the
>    information can't be misunderstood.
> 

If some safeguards like #1 are placed for the specific iclasses, and 
further we only allow subclass information retrieval for selected 
iclasses in aarch64-gen.c, I think we can afford to go this route as you 
suggest: Only add subclasses for those iclasses relevant for SCFI 
purposes ATM.

Thanks for review and feedback
Indu
  
Richard Sandiford June 27, 2024, 9:40 a.m. UTC | #4
Indu Bhagat <indu.bhagat@oracle.com> writes:
> On 6/26/24 04:01, Richard Sandiford wrote:
>> D8-D15 are "interesting" because they are the low 64 bits of Q8-Q15,
>> and of Z8-Z15 if SVE is used.  However, a CFI save slot always represents
>> the low 64 bits, regardless of whether a save occurs on D, Q or Z registers.
>> This matters for big-endian code, because there are two additional
>> PCS variants:
>> 
>> * the "vector PCS", which preserves Q8-Q23
>> * the "SVE PCS", which preserves Z8-Z23 and P3-P15
>> 
>
> Is there a way to annotate that a (hand-written asm) function adheres to 
> vectors PCS or SVE PCS ?  I see that there is a .variant_pcs but that 
> does not help differentiate between the above two?
>
> I _think_ gas will need to know which of SVE vs vector PCS is in effect 
> for a specific function so that the P3-P15 can be added to the set of 
> callee-saved registers being tracked for SCFI for SVE PCS but not for 
> vector PCS.

Only the normal base AAPCS64 register set is preserved across abnormal
control flow (setjmp/longjmp, exceptions, etc.)  The extra call-preserved
guarantees for vector and SVE PCS functions only apply to normal returns.

[This means, for example, that:

  void foo();
  svbool_t f() {
    try {
      foo();
    } catch (...) {};
    return svptrue_b8();
  }

must manually restore the additional register state when catching
and returning normally.]

The CFI requirements therefore don't change: only D8-D15 matter,
like for normal functions.  But that's also where the big-endian
complications that I mentioned come from.

So I don't think the code needs to know which kind of function is
being assembled.  The code just needs to be able to recognise Q-based
and Z-based loads and stores of D8-D15 and work out the correct offset
of the low 64 bits.  (Although, like I say, I think we can punt on
big-endian SVE PCS functions.)

>> So vector PCS functions might need to save and restore Q8 when returning
>> normally, but the CFI only describes the save of the D8 portion (since
>> that's the only portion that is preserved by exceptions).  This means
>> that, on big-endian:
>> 
>> 	str	q8, [sp, #16]
>> 
>> should record D8 as being saved at sp+24 rather than sp+16.
>> 
>> A further complication is that STR Qn and STR Zn do not store in
>> the same byte order for big-endian: STR Qn stores as a 128-bit
>> integer (MSB first), whereas STR Zn stores as a stream of bytes
>> (LSB first).  This means that GCC-generated big-endian SVE PCS
>> functions use things like:
>> 
>> 	st1d	z8.d, p2, [sp, #1, mul vl]
>> 
>> with the D8 save slot then being at sp + 2*VL - 64.
>> 
>> I think it's OK to punt on the big-endian SVE PCS case for now (provided
>> that there's a warning that the code isn't understood, which it looks
>> like there is).  But I think it's worth handling the Q register saves.
>
> It looks to me that using reg name / size is an unambiguous proxy to 
> deciding  whether SVE PCS is in effect. Is this correct ?

Not necessarily.  There's nothing stopping code from using Q-based
loads and stores for normal functions (although it would be an
odd choice).

There's also the possiblity of ad-hoc PCSes, but the assumption there
too would be that only the base AAPCS64 set needs to be preserved
through unwinding.

>> Other comments:
>> 
>> - I like the new approach of using a combination of the iclass and a
>>    "subclass" field of the flags.  How about making aarch64-gen.c enforce
>>    that:
>> 
>>    - if aarch64-ginsn.c looks at the subclass of a particular iclass,
>>      every instruction of that iclass has a nonzero subclass field
>> 
>
> (Let me refer to the above as #1). I can see that there can be ways to 
> achieve this...
>
>>    - every other instruction has a zero subclass field
>> 
>
> ..but I am not sure I follow this statement. (Let me refer to the above 
> as #2).
>
>>    This would help to ensure that the data stays up to date.
>>    The subclass enum could include a nonzero "other" value where
>>    necessary.
>> 
>
> Currently, we are using the opcode->flags bits to encode:
>
> In include/opcode/aarch64.h:
>
> /* 4-bit flag field to indicate subclass of operations.
>     Note that there is an (intended) overlap between the three flag sets
>     (F_LDST*, F_ARITH* and F_BRANCH*).  This allows space savings.  */
> #define F_LDST_LOAD (1ULL << 36)
> #define F_LDST_STORE (2ULL << 36)
> /* A load followed by a store (using the same address). */
> #define F_LDST_SWAP (F_LDST_LOAD | F_LDST_STORE)
> /* Subclasses to denote add, sub and mov insns.  */
> #define F_ARITH_ADD (1ULL << 36)
> #define F_ARITH_SUB (2ULL << 36)
> #define F_ARITH_MOV (4ULL << 36)
> /* Subclasses to denote call and ret insns.  */
> #define F_BRANCH_CALL (1ULL << 36)
> #define F_BRANCH_RET (2ULL << 36)
>
> We can dedicate F_SUBCLASS_NONE (8ULL << 36) and enforce this subclass 
> on all insns which use none of the above subclasses in a specific 
> iclass.  This can help address (#1), but not sure about (#2).

I think the 4 bits are really an enum rather than true independent flags.
So it might be better to use 15ULL, so that the other 14 nonzero values are
consecutive.

But yeah, I think it addresses both #1 and #2.  #2 makes sure that a
subclass is only present when we expect one.  If we define:

#define F_SUBCLASS (15ULL << 36)

then #2 makes sure that (flags & F_SUBCLASS) == 0 for classes that
are not interpreted by ginsns.

Thanks,
Richard
  
Indu Bhagat July 1, 2024, 1:03 a.m. UTC | #5
On 6/27/24 02:40, Richard Sandiford wrote:
> Indu Bhagat <indu.bhagat@oracle.com> writes:
>> On 6/26/24 04:01, Richard Sandiford wrote:
>>> D8-D15 are "interesting" because they are the low 64 bits of Q8-Q15,
>>> and of Z8-Z15 if SVE is used.  However, a CFI save slot always represents
>>> the low 64 bits, regardless of whether a save occurs on D, Q or Z registers.
>>> This matters for big-endian code, because there are two additional
>>> PCS variants:
>>>
>>> * the "vector PCS", which preserves Q8-Q23
>>> * the "SVE PCS", which preserves Z8-Z23 and P3-P15
>>>
>>
>> Is there a way to annotate that a (hand-written asm) function adheres to
>> vectors PCS or SVE PCS ?  I see that there is a .variant_pcs but that
>> does not help differentiate between the above two?
>>
>> I _think_ gas will need to know which of SVE vs vector PCS is in effect
>> for a specific function so that the P3-P15 can be added to the set of
>> callee-saved registers being tracked for SCFI for SVE PCS but not for
>> vector PCS.
> 
> Only the normal base AAPCS64 register set is preserved across abnormal
> control flow (setjmp/longjmp, exceptions, etc.)  The extra call-preserved
> guarantees for vector and SVE PCS functions only apply to normal returns.
> 
> [This means, for example, that:
> 
>    void foo();
>    svbool_t f() {
>      try {
>        foo();
>      } catch (...) {};
>      return svptrue_b8();
>    }
> 
> must manually restore the additional register state when catching
> and returning normally.]
> 
> The CFI requirements therefore don't change: only D8-D15 matter,
> like for normal functions.  But that's also where the big-endian
> complications that I mentioned come from.
> 
> So I don't think the code needs to know which kind of function is
> being assembled.  The code just needs to be able to recognise Q-based
> and Z-based loads and stores of D8-D15 and work out the correct offset
> of the low 64 bits.  (Although, like I say, I think we can punt on
> big-endian SVE PCS functions.)
> 
>>> So vector PCS functions might need to save and restore Q8 when returning
>>> normally, but the CFI only describes the save of the D8 portion (since
>>> that's the only portion that is preserved by exceptions).  This means
>>> that, on big-endian:
>>>
>>> 	str	q8, [sp, #16]
>>>
>>> should record D8 as being saved at sp+24 rather than sp+16.
>>>
>>> A further complication is that STR Qn and STR Zn do not store in
>>> the same byte order for big-endian: STR Qn stores as a 128-bit
>>> integer (MSB first), whereas STR Zn stores as a stream of bytes
>>> (LSB first).  This means that GCC-generated big-endian SVE PCS
>>> functions use things like:
>>>
>>> 	st1d	z8.d, p2, [sp, #1, mul vl]
>>>
>>> with the D8 save slot then being at sp + 2*VL - 64.
>>>
>>> I think it's OK to punt on the big-endian SVE PCS case for now (provided
>>> that there's a warning that the code isn't understood, which it looks
>>> like there is).  But I think it's worth handling the Q register saves.
>>
>> It looks to me that using reg name / size is an unambiguous proxy to
>> deciding  whether SVE PCS is in effect. Is this correct ?
> 
> Not necessarily.  There's nothing stopping code from using Q-based
> loads and stores for normal functions (although it would be an
> odd choice).

Of course, I dont know what I was thinking when I wrote that.

As for Z registers, I realized that I need more time to take a look at 
the SVE insns and see what patterns need to be handled etc for memory 
offset calculation.

For V4 (to be posted soon), I have added handling for D and Q registers 
(little-endian and big-endian), but skipped Z altogether for now (SCFI 
errors out when correctness is affected).  Also added this to the set of 
known limitations to be addressed in a future patch.

>  
> There's also the possiblity of ad-hoc PCSes, but the assumption there
> too would be that only the base AAPCS64 set needs to be preserved
> through unwinding.
> 
>>> Other comments:
>>>
>>> - I like the new approach of using a combination of the iclass and a
>>>     "subclass" field of the flags.  How about making aarch64-gen.c enforce
>>>     that:
>>>
>>>     - if aarch64-ginsn.c looks at the subclass of a particular iclass,
>>>       every instruction of that iclass has a nonzero subclass field
>>>
>>
>> (Let me refer to the above as #1). I can see that there can be ways to
>> achieve this...
>>
>>>     - every other instruction has a zero subclass field
>>>
>>
>> ..but I am not sure I follow this statement. (Let me refer to the above
>> as #2).
>>
>>>     This would help to ensure that the data stays up to date.
>>>     The subclass enum could include a nonzero "other" value where
>>>     necessary.
>>>
>>
>> Currently, we are using the opcode->flags bits to encode:
>>
>> In include/opcode/aarch64.h:
>>
>> /* 4-bit flag field to indicate subclass of operations.
>>      Note that there is an (intended) overlap between the three flag sets
>>      (F_LDST*, F_ARITH* and F_BRANCH*).  This allows space savings.  */
>> #define F_LDST_LOAD (1ULL << 36)
>> #define F_LDST_STORE (2ULL << 36)
>> /* A load followed by a store (using the same address). */
>> #define F_LDST_SWAP (F_LDST_LOAD | F_LDST_STORE)
>> /* Subclasses to denote add, sub and mov insns.  */
>> #define F_ARITH_ADD (1ULL << 36)
>> #define F_ARITH_SUB (2ULL << 36)
>> #define F_ARITH_MOV (4ULL << 36)
>> /* Subclasses to denote call and ret insns.  */
>> #define F_BRANCH_CALL (1ULL << 36)
>> #define F_BRANCH_RET (2ULL << 36)
>>
>> We can dedicate F_SUBCLASS_NONE (8ULL << 36) and enforce this subclass
>> on all insns which use none of the above subclasses in a specific
>> iclass.  This can help address (#1), but not sure about (#2).
> 
> I think the 4 bits are really an enum rather than true independent flags.
> So it might be better to use 15ULL, so that the other 14 nonzero values are
> consecutive.
> 
> But yeah, I think it addresses both #1 and #2.  #2 makes sure that a
> subclass is only present when we expect one.  If we define:
> 
> #define F_SUBCLASS (15ULL << 36)
> 
> then #2 makes sure that (flags & F_SUBCLASS) == 0 for classes that
> are not interpreted by ginsns.
> 

Thanks.  This should be addressed in the V4 series which I will be 
posting soon.

Thanks
Indu