[0/7] RFC Memory tagging support

Message ID 20200615144029.19771-1-rearnsha@arm.com
Headers
Series RFC Memory tagging support |

Message

Richard Earnshaw June 15, 2020, 2:40 p.m. UTC
  Last year I posted a preliminary set of patches for discussion
purposes on adding support for tagged memory to glibc.  This version
polishes that to the point where I believe it is now deployable.

The first four patches are generic changes, the final three add the
aarch64 specific code.

The first patch simply adds a new configuration option to the build
system which can be turned on with the option --enable-memory-tagging.
The default at present is 'no'.

The second patch adds a glibc tunable that can be used at run-time to
enable the feature (the default again, is disabled).  This tunable
would be always present, but have no effect on systems lacking support
for memory tagging.  I've structured the tunable as a bit-mask of
features that can be used with memory tagging, though at present only
two bits have defined uses.

The third patch is the meat of the changes; it adds the changes to the
malloc APIs.  I've tried as far as possible to ensure that when memory
tagging is disabled, there is no change in behaviour, even when the
memory tagging is configured into the library, but there are
inevitably a small number of changes needed in the optimizations that
calloc performs since tagging would require that all the tags were
correctly set, even if the memory does not strictly have to be zeroed.
I've made use of function pointers in the code, much the same way as
the morecore hook is used, so that when tagging is disabled, the
functions called are the same as the traditional operations; this also
ensures that glibc does not require any internal ifunc resolution in
order to work.

The fourth patch adds support for the new prctl operations that are
being proposed to the linux kernel.  The kernel changes are to a
generic header and this patch mirrors that design decision in glibc.

The fifth patch is a place-holder, so that this series of changes is
stand-alone.  Work is already underway to change the string operations
to be MTE safe without losing too much in the way of performance.  I
expect this patch to be removed entirely before the series is
committed.

The final two patches add the remaining aarch64 support.  The first
adds the support code to examine the tunable and HW caps; and enable
memory tagging in the kernel as needed.  The second adds the final
pieces needed to support memory tagging in glibc.

Testing
=======

I've run all the malloc tests.  While not all of them pass at present,
I have examined all the failing tests and I'm confident that none of the
failures represent real bugs in the code; but I have not yet decided how
best to address these failures.  The failures fall into three categories

1) Tests that expect a particular failure mode on double-free operations.
I've added a quick tag-checking access in the free entry path that
essentially asserts that the tag colour of a free'd block of memory
matches the colour of the pointer - this leads to the kernel delivering
a different signal that the test code does not expect.

2) Tests that assume that malloc_usable_size will return a specific
amount of free space.  The assumptions are not correct, because the
tag colouring boundaries needed for MTE means that the 8 bytes in the
block containing the back pointer can no-longer be used by users when
we have MTE (they have a different colour that belongs to the malloc
data structures).

3) Tests that construct a fake internal malloc data structure and then
try to perform operations on them.  I haven't looked at these in too
much detail, but the first issue is that the fake header is only
8-byte aligned and for MTE to work it requires a 16-byte aligned
structure (the tag read/write operations require the address be
granule aligned, and the real glibc data structure has this property).

In addition to the above I've run the code on a traditional aarch64
machine that lacks MTE (to confirm that the code behaves as expected
on existing machines) and on a model that has support for MTE.

Richard.

Richard Earnshaw (7):
  config: Allow memory tagging to be enabled when configuring glibc
  elf: Add a tunable to control use of tagged memory
  malloc: Basic support for memory tagging in the malloc() family
  linux: Add compatibility definitions to sys/prctl.h for MTE
  aarch64: Mitigations for string functions when MTE is enabled.
  aarch64: Add sysv specific enabling code for memory tagging
  aarch64: Add aarch64-specific files for memory tagging support

 config.h.in                                   |   3 +
 config.make.in                                |   2 +
 configure                                     |  17 ++
 configure.ac                                  |  10 +
 elf/dl-tunables.list                          |   9 +
 malloc/arena.c                                |  42 ++++-
 malloc/malloc.c                               | 171 ++++++++++++++----
 malloc/malloc.h                               |   7 +
 manual/install.texi                           |  13 ++
 manual/tunables.texi                          |  31 ++++
 sysdeps/aarch64/Makefile                      |   5 +
 sysdeps/aarch64/__mtag_address_get_tag.S      |  31 ++++
 sysdeps/aarch64/__mtag_memset_tag.S           |  46 +++++
 sysdeps/aarch64/__mtag_new_tag.S              |  38 ++++
 sysdeps/aarch64/__mtag_tag_region.S           |  44 +++++
 sysdeps/aarch64/libc-mtag.h                   |  57 ++++++
 sysdeps/aarch64/memchr.S                      |  21 ++-
 sysdeps/aarch64/multiarch/strlen_asimd.S      |   2 +-
 sysdeps/aarch64/strchr.S                      |  15 ++
 sysdeps/aarch64/strchrnul.S                   |  14 +-
 sysdeps/aarch64/strcmp.S                      |  12 +-
 sysdeps/aarch64/strcpy.S                      |   2 +-
 sysdeps/aarch64/strlen.S                      |   2 +-
 sysdeps/aarch64/strncmp.S                     |  10 +-
 sysdeps/aarch64/strrchr.S                     |  15 +-
 sysdeps/generic/libc-mtag.h                   |  52 ++++++
 sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h  |   2 +
 sysdeps/unix/sysv/linux/aarch64/bits/mman.h   |  32 ++++
 .../unix/sysv/linux/aarch64/cpu-features.c    |  22 +++
 .../unix/sysv/linux/aarch64/cpu-features.h    |   1 +
 sysdeps/unix/sysv/linux/sys/prctl.h           |  18 ++
 31 files changed, 696 insertions(+), 50 deletions(-)
 create mode 100644 sysdeps/aarch64/__mtag_address_get_tag.S
 create mode 100644 sysdeps/aarch64/__mtag_memset_tag.S
 create mode 100644 sysdeps/aarch64/__mtag_new_tag.S
 create mode 100644 sysdeps/aarch64/__mtag_tag_region.S
 create mode 100644 sysdeps/aarch64/libc-mtag.h
 create mode 100644 sysdeps/generic/libc-mtag.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/mman.h
  

Comments

H.J. Lu June 15, 2020, 3:03 p.m. UTC | #1
On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
>
> Last year I posted a preliminary set of patches for discussion
> purposes on adding support for tagged memory to glibc.  This version
> polishes that to the point where I believe it is now deployable.
>
> The first four patches are generic changes, the final three add the
> aarch64 specific code.
>
> The first patch simply adds a new configuration option to the build
> system which can be turned on with the option --enable-memory-tagging.
> The default at present is 'no'.
>
> The second patch adds a glibc tunable that can be used at run-time to
> enable the feature (the default again, is disabled).  This tunable
> would be always present, but have no effect on systems lacking support
> for memory tagging.  I've structured the tunable as a bit-mask of
> features that can be used with memory tagging, though at present only
> two bits have defined uses.
>
> The third patch is the meat of the changes; it adds the changes to the
> malloc APIs.  I've tried as far as possible to ensure that when memory
> tagging is disabled, there is no change in behaviour, even when the
> memory tagging is configured into the library, but there are
> inevitably a small number of changes needed in the optimizations that
> calloc performs since tagging would require that all the tags were
> correctly set, even if the memory does not strictly have to be zeroed.
> I've made use of function pointers in the code, much the same way as
> the morecore hook is used, so that when tagging is disabled, the
> functions called are the same as the traditional operations; this also
> ensures that glibc does not require any internal ifunc resolution in
> order to work.
>
> The fourth patch adds support for the new prctl operations that are
> being proposed to the linux kernel.  The kernel changes are to a
> generic header and this patch mirrors that design decision in glibc.
>
> The fifth patch is a place-holder, so that this series of changes is
> stand-alone.  Work is already underway to change the string operations
> to be MTE safe without losing too much in the way of performance.  I
> expect this patch to be removed entirely before the series is
> committed.
>
> The final two patches add the remaining aarch64 support.  The first
> adds the support code to examine the tunable and HW caps; and enable
> memory tagging in the kernel as needed.  The second adds the final
> pieces needed to support memory tagging in glibc.
>

Obviously, pointer comparison and algorithm will be impacted by MTE.
From what you are proposing, only parts of glibc will be MTE compatible.
Is this correct?
  
Paul Eggert June 15, 2020, 3:08 p.m. UTC | #2
On 6/15/20 7:40 AM, Richard Earnshaw wrote:
> 3) Tests that construct a fake internal malloc data structure and then
> try to perform operations on them.  I haven't looked at these in too
> much detail, but the first issue is that the fake header is only
> 8-byte aligned and for MTE to work it requires a 16-byte aligned
> structure

Is the problem here that the tests fail quickly due to easy-to-check alignment
issues, and so no longer test the more-interesting defenses?

Would it make sense for these tests to align the fake internal data structure to
16 bytes using _Alignas? I.e., should the tests be trying to fail due to easy
alignment issues, or should they be trying to skip the easy alignment issues and
go on to the harder parts of the tests?
  
Richard Earnshaw (lists) June 15, 2020, 3:11 p.m. UTC | #3
On 15/06/2020 16:03, H.J. Lu wrote:
> On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>> Last year I posted a preliminary set of patches for discussion
>> purposes on adding support for tagged memory to glibc.  This version
>> polishes that to the point where I believe it is now deployable.
>>
>> The first four patches are generic changes, the final three add the
>> aarch64 specific code.
>>
>> The first patch simply adds a new configuration option to the build
>> system which can be turned on with the option --enable-memory-tagging.
>> The default at present is 'no'.
>>
>> The second patch adds a glibc tunable that can be used at run-time to
>> enable the feature (the default again, is disabled).  This tunable
>> would be always present, but have no effect on systems lacking support
>> for memory tagging.  I've structured the tunable as a bit-mask of
>> features that can be used with memory tagging, though at present only
>> two bits have defined uses.
>>
>> The third patch is the meat of the changes; it adds the changes to the
>> malloc APIs.  I've tried as far as possible to ensure that when memory
>> tagging is disabled, there is no change in behaviour, even when the
>> memory tagging is configured into the library, but there are
>> inevitably a small number of changes needed in the optimizations that
>> calloc performs since tagging would require that all the tags were
>> correctly set, even if the memory does not strictly have to be zeroed.
>> I've made use of function pointers in the code, much the same way as
>> the morecore hook is used, so that when tagging is disabled, the
>> functions called are the same as the traditional operations; this also
>> ensures that glibc does not require any internal ifunc resolution in
>> order to work.
>>
>> The fourth patch adds support for the new prctl operations that are
>> being proposed to the linux kernel.  The kernel changes are to a
>> generic header and this patch mirrors that design decision in glibc.
>>
>> The fifth patch is a place-holder, so that this series of changes is
>> stand-alone.  Work is already underway to change the string operations
>> to be MTE safe without losing too much in the way of performance.  I
>> expect this patch to be removed entirely before the series is
>> committed.
>>
>> The final two patches add the remaining aarch64 support.  The first
>> adds the support code to examine the tunable and HW caps; and enable
>> memory tagging in the kernel as needed.  The second adds the final
>> pieces needed to support memory tagging in glibc.
>>
> 
> Obviously, pointer comparison and algorithm will be impacted by MTE.
> From what you are proposing, only parts of glibc will be MTE compatible.
> Is this correct?
> 

Only *undefined* pointer comparisons will be impacted, such as comparing
objects from different allocations.

Within an allocation comparisons are fine.  And also, pointer
equivalence is fine as well.

R.

> -- 
> H.J.
  
H.J. Lu June 15, 2020, 3:37 p.m. UTC | #4
On Mon, Jun 15, 2020 at 8:11 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 15/06/2020 16:03, H.J. Lu wrote:
> > On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
> >>
> >> Last year I posted a preliminary set of patches for discussion
> >> purposes on adding support for tagged memory to glibc.  This version
> >> polishes that to the point where I believe it is now deployable.
> >>
> >> The first four patches are generic changes, the final three add the
> >> aarch64 specific code.
> >>
> >> The first patch simply adds a new configuration option to the build
> >> system which can be turned on with the option --enable-memory-tagging.
> >> The default at present is 'no'.
> >>
> >> The second patch adds a glibc tunable that can be used at run-time to
> >> enable the feature (the default again, is disabled).  This tunable
> >> would be always present, but have no effect on systems lacking support
> >> for memory tagging.  I've structured the tunable as a bit-mask of
> >> features that can be used with memory tagging, though at present only
> >> two bits have defined uses.
> >>
> >> The third patch is the meat of the changes; it adds the changes to the
> >> malloc APIs.  I've tried as far as possible to ensure that when memory
> >> tagging is disabled, there is no change in behaviour, even when the
> >> memory tagging is configured into the library, but there are
> >> inevitably a small number of changes needed in the optimizations that
> >> calloc performs since tagging would require that all the tags were
> >> correctly set, even if the memory does not strictly have to be zeroed.
> >> I've made use of function pointers in the code, much the same way as
> >> the morecore hook is used, so that when tagging is disabled, the
> >> functions called are the same as the traditional operations; this also
> >> ensures that glibc does not require any internal ifunc resolution in
> >> order to work.
> >>
> >> The fourth patch adds support for the new prctl operations that are
> >> being proposed to the linux kernel.  The kernel changes are to a
> >> generic header and this patch mirrors that design decision in glibc.
> >>
> >> The fifth patch is a place-holder, so that this series of changes is
> >> stand-alone.  Work is already underway to change the string operations
> >> to be MTE safe without losing too much in the way of performance.  I
> >> expect this patch to be removed entirely before the series is
> >> committed.
> >>
> >> The final two patches add the remaining aarch64 support.  The first
> >> adds the support code to examine the tunable and HW caps; and enable
> >> memory tagging in the kernel as needed.  The second adds the final
> >> pieces needed to support memory tagging in glibc.
> >>
> >
> > Obviously, pointer comparison and algorithm will be impacted by MTE.
> > From what you are proposing, only parts of glibc will be MTE compatible.
> > Is this correct?
> >
>
> Only *undefined* pointer comparisons will be impacted, such as comparing
> objects from different allocations.
>
> Within an allocation comparisons are fine.  And also, pointer
> equivalence is fine as well.
>

Is "ptr1 - ptr2" valid to compute the distance between 2 pointers?
  
Szabolcs Nagy June 15, 2020, 4:30 p.m. UTC | #5
The 06/15/2020 08:37, H.J. Lu via Libc-alpha wrote:
> On Mon, Jun 15, 2020 at 8:11 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
> >
> > On 15/06/2020 16:03, H.J. Lu wrote:
> > > On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
> > >>
> > >> Last year I posted a preliminary set of patches for discussion
> > >> purposes on adding support for tagged memory to glibc.  This version
> > >> polishes that to the point where I believe it is now deployable.
> > >>
> > >> The first four patches are generic changes, the final three add the
> > >> aarch64 specific code.
> > >>
> > >> The first patch simply adds a new configuration option to the build
> > >> system which can be turned on with the option --enable-memory-tagging.
> > >> The default at present is 'no'.
> > >>
> > >> The second patch adds a glibc tunable that can be used at run-time to
> > >> enable the feature (the default again, is disabled).  This tunable
> > >> would be always present, but have no effect on systems lacking support
> > >> for memory tagging.  I've structured the tunable as a bit-mask of
> > >> features that can be used with memory tagging, though at present only
> > >> two bits have defined uses.
> > >>
> > >> The third patch is the meat of the changes; it adds the changes to the
> > >> malloc APIs.  I've tried as far as possible to ensure that when memory
> > >> tagging is disabled, there is no change in behaviour, even when the
> > >> memory tagging is configured into the library, but there are
> > >> inevitably a small number of changes needed in the optimizations that
> > >> calloc performs since tagging would require that all the tags were
> > >> correctly set, even if the memory does not strictly have to be zeroed.
> > >> I've made use of function pointers in the code, much the same way as
> > >> the morecore hook is used, so that when tagging is disabled, the
> > >> functions called are the same as the traditional operations; this also
> > >> ensures that glibc does not require any internal ifunc resolution in
> > >> order to work.
> > >>
> > >> The fourth patch adds support for the new prctl operations that are
> > >> being proposed to the linux kernel.  The kernel changes are to a
> > >> generic header and this patch mirrors that design decision in glibc.
> > >>
> > >> The fifth patch is a place-holder, so that this series of changes is
> > >> stand-alone.  Work is already underway to change the string operations
> > >> to be MTE safe without losing too much in the way of performance.  I
> > >> expect this patch to be removed entirely before the series is
> > >> committed.
> > >>
> > >> The final two patches add the remaining aarch64 support.  The first
> > >> adds the support code to examine the tunable and HW caps; and enable
> > >> memory tagging in the kernel as needed.  The second adds the final
> > >> pieces needed to support memory tagging in glibc.
> > >>
> > >
> > > Obviously, pointer comparison and algorithm will be impacted by MTE.
> > > From what you are proposing, only parts of glibc will be MTE compatible.
> > > Is this correct?
> > >
> >
> > Only *undefined* pointer comparisons will be impacted, such as comparing
> > objects from different allocations.
> >
> > Within an allocation comparisons are fine.  And also, pointer
> > equivalence is fine as well.
> >
> 
> Is "ptr1 - ptr2" valid to compute the distance between 2 pointers?

in iso c that's only valid within the same object.

(but e.g. gcc tries to detect which way the stack grows
by comparing stack pointers across stack frames: that's
not legal in c, and does not work if stack objects are
tagged with mte, this patch set is for heap tagging though)
  
Richard Earnshaw June 15, 2020, 4:37 p.m. UTC | #6
On 15/06/2020 16:08, Paul Eggert wrote:
> On 6/15/20 7:40 AM, Richard Earnshaw wrote:
>> 3) Tests that construct a fake internal malloc data structure and then
>> try to perform operations on them.  I haven't looked at these in too
>> much detail, but the first issue is that the fake header is only
>> 8-byte aligned and for MTE to work it requires a 16-byte aligned
>> structure
> 
> Is the problem here that the tests fail quickly due to easy-to-check alignment
> issues, and so no longer test the more-interesting defenses?
> 
> Would it make sense for these tests to align the fake internal data structure to
> 16 bytes using _Alignas? I.e., should the tests be trying to fail due to easy
> alignment issues, or should they be trying to skip the easy alignment issues and
> go on to the harder parts of the tests?
> 

It might.  Of course, there's no (easy) way for the test to fake up the
tag colouring, but that's not a major issue (at least on aarch64)
because tags are ignored on non-tagged memory and static data is just that.

This is something I need to look into some more, but I wanted to get the
review started.

R.
  
Joseph Myers June 15, 2020, 4:37 p.m. UTC | #7
On Mon, 15 Jun 2020, Richard Earnshaw wrote:

> The fourth patch adds support for the new prctl operations that are
> being proposed to the linux kernel.  The kernel changes are to a
> generic header and this patch mirrors that design decision in glibc.

Are these values in Linus's git tree?  (I think we should generally avoid 
adding kernel interfaces until they are at least in upstream git.)

>  manual/install.texi                           |  13 ++

INSTALL should be regenerated when updating install.texi.  Also, new 
features such as this should get an entry in NEWS.
  
H.J. Lu June 15, 2020, 4:40 p.m. UTC | #8
On Mon, Jun 15, 2020 at 9:30 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 06/15/2020 08:37, H.J. Lu via Libc-alpha wrote:
> > On Mon, Jun 15, 2020 at 8:11 AM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> > >
> > > On 15/06/2020 16:03, H.J. Lu wrote:
> > > > On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
> > > >>
> > > >> Last year I posted a preliminary set of patches for discussion
> > > >> purposes on adding support for tagged memory to glibc.  This version
> > > >> polishes that to the point where I believe it is now deployable.
> > > >>
> > > >> The first four patches are generic changes, the final three add the
> > > >> aarch64 specific code.
> > > >>
> > > >> The first patch simply adds a new configuration option to the build
> > > >> system which can be turned on with the option --enable-memory-tagging.
> > > >> The default at present is 'no'.
> > > >>
> > > >> The second patch adds a glibc tunable that can be used at run-time to
> > > >> enable the feature (the default again, is disabled).  This tunable
> > > >> would be always present, but have no effect on systems lacking support
> > > >> for memory tagging.  I've structured the tunable as a bit-mask of
> > > >> features that can be used with memory tagging, though at present only
> > > >> two bits have defined uses.
> > > >>
> > > >> The third patch is the meat of the changes; it adds the changes to the
> > > >> malloc APIs.  I've tried as far as possible to ensure that when memory
> > > >> tagging is disabled, there is no change in behaviour, even when the
> > > >> memory tagging is configured into the library, but there are
> > > >> inevitably a small number of changes needed in the optimizations that
> > > >> calloc performs since tagging would require that all the tags were
> > > >> correctly set, even if the memory does not strictly have to be zeroed.
> > > >> I've made use of function pointers in the code, much the same way as
> > > >> the morecore hook is used, so that when tagging is disabled, the
> > > >> functions called are the same as the traditional operations; this also
> > > >> ensures that glibc does not require any internal ifunc resolution in
> > > >> order to work.
> > > >>
> > > >> The fourth patch adds support for the new prctl operations that are
> > > >> being proposed to the linux kernel.  The kernel changes are to a
> > > >> generic header and this patch mirrors that design decision in glibc.
> > > >>
> > > >> The fifth patch is a place-holder, so that this series of changes is
> > > >> stand-alone.  Work is already underway to change the string operations
> > > >> to be MTE safe without losing too much in the way of performance.  I
> > > >> expect this patch to be removed entirely before the series is
> > > >> committed.
> > > >>
> > > >> The final two patches add the remaining aarch64 support.  The first
> > > >> adds the support code to examine the tunable and HW caps; and enable
> > > >> memory tagging in the kernel as needed.  The second adds the final
> > > >> pieces needed to support memory tagging in glibc.
> > > >>
> > > >
> > > > Obviously, pointer comparison and algorithm will be impacted by MTE.
> > > > From what you are proposing, only parts of glibc will be MTE compatible.
> > > > Is this correct?
> > > >
> > >
> > > Only *undefined* pointer comparisons will be impacted, such as comparing
> > > objects from different allocations.
> > >
> > > Within an allocation comparisons are fine.  And also, pointer
> > > equivalence is fine as well.
> > >
> >
> > Is "ptr1 - ptr2" valid to compute the distance between 2 pointers?
>
> in iso c that's only valid within the same object.
>
> (but e.g. gcc tries to detect which way the stack grows
> by comparing stack pointers across stack frames: that's
> not legal in c, and does not work if stack objects are
> tagged with mte, this patch set is for heap tagging though)

memmove in C has

rettype
inhibit_loop_to_libcall
MEMMOVE (a1const void *a1, a2const void *a2, size_t len)
{
  unsigned long int dstp = (long int) dest;
  unsigned long int srcp = (long int) src;

  /* This test makes the forward copying code be used whenever possible.
     Reduces the working set.  */
  if (dstp - srcp >= len) /* *Unsigned* compare!  */

How does it work?
  
Richard Earnshaw (lists) June 15, 2020, 4:51 p.m. UTC | #9
On 15/06/2020 17:40, H.J. Lu via Libc-alpha wrote:
> On Mon, Jun 15, 2020 at 9:30 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>> The 06/15/2020 08:37, H.J. Lu via Libc-alpha wrote:
>>> On Mon, Jun 15, 2020 at 8:11 AM Richard Earnshaw (lists)
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> On 15/06/2020 16:03, H.J. Lu wrote:
>>>>> On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
>>>>>>
>>>>>> Last year I posted a preliminary set of patches for discussion
>>>>>> purposes on adding support for tagged memory to glibc.  This version
>>>>>> polishes that to the point where I believe it is now deployable.
>>>>>>
>>>>>> The first four patches are generic changes, the final three add the
>>>>>> aarch64 specific code.
>>>>>>
>>>>>> The first patch simply adds a new configuration option to the build
>>>>>> system which can be turned on with the option --enable-memory-tagging.
>>>>>> The default at present is 'no'.
>>>>>>
>>>>>> The second patch adds a glibc tunable that can be used at run-time to
>>>>>> enable the feature (the default again, is disabled).  This tunable
>>>>>> would be always present, but have no effect on systems lacking support
>>>>>> for memory tagging.  I've structured the tunable as a bit-mask of
>>>>>> features that can be used with memory tagging, though at present only
>>>>>> two bits have defined uses.
>>>>>>
>>>>>> The third patch is the meat of the changes; it adds the changes to the
>>>>>> malloc APIs.  I've tried as far as possible to ensure that when memory
>>>>>> tagging is disabled, there is no change in behaviour, even when the
>>>>>> memory tagging is configured into the library, but there are
>>>>>> inevitably a small number of changes needed in the optimizations that
>>>>>> calloc performs since tagging would require that all the tags were
>>>>>> correctly set, even if the memory does not strictly have to be zeroed.
>>>>>> I've made use of function pointers in the code, much the same way as
>>>>>> the morecore hook is used, so that when tagging is disabled, the
>>>>>> functions called are the same as the traditional operations; this also
>>>>>> ensures that glibc does not require any internal ifunc resolution in
>>>>>> order to work.
>>>>>>
>>>>>> The fourth patch adds support for the new prctl operations that are
>>>>>> being proposed to the linux kernel.  The kernel changes are to a
>>>>>> generic header and this patch mirrors that design decision in glibc.
>>>>>>
>>>>>> The fifth patch is a place-holder, so that this series of changes is
>>>>>> stand-alone.  Work is already underway to change the string operations
>>>>>> to be MTE safe without losing too much in the way of performance.  I
>>>>>> expect this patch to be removed entirely before the series is
>>>>>> committed.
>>>>>>
>>>>>> The final two patches add the remaining aarch64 support.  The first
>>>>>> adds the support code to examine the tunable and HW caps; and enable
>>>>>> memory tagging in the kernel as needed.  The second adds the final
>>>>>> pieces needed to support memory tagging in glibc.
>>>>>>
>>>>>
>>>>> Obviously, pointer comparison and algorithm will be impacted by MTE.
>>>>> From what you are proposing, only parts of glibc will be MTE compatible.
>>>>> Is this correct?
>>>>>
>>>>
>>>> Only *undefined* pointer comparisons will be impacted, such as comparing
>>>> objects from different allocations.
>>>>
>>>> Within an allocation comparisons are fine.  And also, pointer
>>>> equivalence is fine as well.
>>>>
>>>
>>> Is "ptr1 - ptr2" valid to compute the distance between 2 pointers?
>>
>> in iso c that's only valid within the same object.
>>
>> (but e.g. gcc tries to detect which way the stack grows
>> by comparing stack pointers across stack frames: that's
>> not legal in c, and does not work if stack objects are
>> tagged with mte, this patch set is for heap tagging though)
> 
> memmove in C has
> 
> rettype
> inhibit_loop_to_libcall
> MEMMOVE (a1const void *a1, a2const void *a2, size_t len)
> {
>   unsigned long int dstp = (long int) dest;
>   unsigned long int srcp = (long int) src;
> 
>   /* This test makes the forward copying code be used whenever possible.
>      Reduces the working set.  */
>   if (dstp - srcp >= len) /* *Unsigned* compare!  */
> 
> How does it work?
> 

Well the code is technically undefined!

In practice it will work because objects passed to memmove will have to
have a single colour, so the test will work correctly, though not for
the reason the programmer thought.

:-)

R.
  
Richard Earnshaw June 15, 2020, 4:53 p.m. UTC | #10
On 15/06/2020 17:37, Joseph Myers wrote:
> On Mon, 15 Jun 2020, Richard Earnshaw wrote:
> 
>> The fourth patch adds support for the new prctl operations that are
>> being proposed to the linux kernel.  The kernel changes are to a
>> generic header and this patch mirrors that design decision in glibc.
> 
> Are these values in Linus's git tree?  (I think we should generally avoid 
> adding kernel interfaces until they are at least in upstream git.)
> 
>>  manual/install.texi                           |  13 ++
> 
> INSTALL should be regenerated when updating install.texi.  Also, new 
> features such as this should get an entry in NEWS.
> 

This is only an RFC at this point.  Kernel patches are in review.  But
we need to agree on the API before they can be committed, and that can
only happen if we agree on the libc side of it as well.

We can't require the kernel patches be pushed before we even consider
this or we would have deadlock.

R.
  
H.J. Lu June 15, 2020, 5:46 p.m. UTC | #11
On Mon, Jun 15, 2020 at 9:51 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 15/06/2020 17:40, H.J. Lu via Libc-alpha wrote:
> > On Mon, Jun 15, 2020 at 9:30 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>
> >> The 06/15/2020 08:37, H.J. Lu via Libc-alpha wrote:
> >>> On Mon, Jun 15, 2020 at 8:11 AM Richard Earnshaw (lists)
> >>> <Richard.Earnshaw@arm.com> wrote:
> >>>>
> >>>> On 15/06/2020 16:03, H.J. Lu wrote:
> >>>>> On Mon, Jun 15, 2020 at 7:43 AM Richard Earnshaw <rearnsha@arm.com> wrote:
> >>>>>>
> >>>>>> Last year I posted a preliminary set of patches for discussion
> >>>>>> purposes on adding support for tagged memory to glibc.  This version
> >>>>>> polishes that to the point where I believe it is now deployable.
> >>>>>>
> >>>>>> The first four patches are generic changes, the final three add the
> >>>>>> aarch64 specific code.
> >>>>>>
> >>>>>> The first patch simply adds a new configuration option to the build
> >>>>>> system which can be turned on with the option --enable-memory-tagging.
> >>>>>> The default at present is 'no'.
> >>>>>>
> >>>>>> The second patch adds a glibc tunable that can be used at run-time to
> >>>>>> enable the feature (the default again, is disabled).  This tunable
> >>>>>> would be always present, but have no effect on systems lacking support
> >>>>>> for memory tagging.  I've structured the tunable as a bit-mask of
> >>>>>> features that can be used with memory tagging, though at present only
> >>>>>> two bits have defined uses.
> >>>>>>
> >>>>>> The third patch is the meat of the changes; it adds the changes to the
> >>>>>> malloc APIs.  I've tried as far as possible to ensure that when memory
> >>>>>> tagging is disabled, there is no change in behaviour, even when the
> >>>>>> memory tagging is configured into the library, but there are
> >>>>>> inevitably a small number of changes needed in the optimizations that
> >>>>>> calloc performs since tagging would require that all the tags were
> >>>>>> correctly set, even if the memory does not strictly have to be zeroed.
> >>>>>> I've made use of function pointers in the code, much the same way as
> >>>>>> the morecore hook is used, so that when tagging is disabled, the
> >>>>>> functions called are the same as the traditional operations; this also
> >>>>>> ensures that glibc does not require any internal ifunc resolution in
> >>>>>> order to work.
> >>>>>>
> >>>>>> The fourth patch adds support for the new prctl operations that are
> >>>>>> being proposed to the linux kernel.  The kernel changes are to a
> >>>>>> generic header and this patch mirrors that design decision in glibc.
> >>>>>>
> >>>>>> The fifth patch is a place-holder, so that this series of changes is
> >>>>>> stand-alone.  Work is already underway to change the string operations
> >>>>>> to be MTE safe without losing too much in the way of performance.  I
> >>>>>> expect this patch to be removed entirely before the series is
> >>>>>> committed.
> >>>>>>
> >>>>>> The final two patches add the remaining aarch64 support.  The first
> >>>>>> adds the support code to examine the tunable and HW caps; and enable
> >>>>>> memory tagging in the kernel as needed.  The second adds the final
> >>>>>> pieces needed to support memory tagging in glibc.
> >>>>>>
> >>>>>
> >>>>> Obviously, pointer comparison and algorithm will be impacted by MTE.
> >>>>> From what you are proposing, only parts of glibc will be MTE compatible.
> >>>>> Is this correct?
> >>>>>
> >>>>
> >>>> Only *undefined* pointer comparisons will be impacted, such as comparing
> >>>> objects from different allocations.
> >>>>
> >>>> Within an allocation comparisons are fine.  And also, pointer
> >>>> equivalence is fine as well.
> >>>>
> >>>
> >>> Is "ptr1 - ptr2" valid to compute the distance between 2 pointers?
> >>
> >> in iso c that's only valid within the same object.
> >>
> >> (but e.g. gcc tries to detect which way the stack grows
> >> by comparing stack pointers across stack frames: that's
> >> not legal in c, and does not work if stack objects are
> >> tagged with mte, this patch set is for heap tagging though)
> >
> > memmove in C has
> >
> > rettype
> > inhibit_loop_to_libcall
> > MEMMOVE (a1const void *a1, a2const void *a2, size_t len)
> > {
> >   unsigned long int dstp = (long int) dest;
> >   unsigned long int srcp = (long int) src;
> >
> >   /* This test makes the forward copying code be used whenever possible.
> >      Reduces the working set.  */
> >   if (dstp - srcp >= len) /* *Unsigned* compare!  */
> >
> > How does it work?
> >
>
> Well the code is technically undefined!

What does it mean to glibc?  Have you done an audit on glibc for this
issue?

> In practice it will work because objects passed to memmove will have to
> have a single colour, so the test will work correctly, though not for
> the reason the programmer thought.

Do you have a list of functions in glibc which allow more than one color?
  
Paul Eggert June 15, 2020, 6:05 p.m. UTC | #12
On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
> In practice it will work because objects passed to memmove will have to
> have a single colour,

Does this mean all stack and heap objects visible to the C programmer must have
the same tag? This surprises me, as I thought part of the idea was to assign
tags randomly.
  
Andreas Schwab June 15, 2020, 6:10 p.m. UTC | #13
On Jun 15 2020, Richard Earnshaw (lists) wrote:

>> rettype
>> inhibit_loop_to_libcall
>> MEMMOVE (a1const void *a1, a2const void *a2, size_t len)
>> {
>>   unsigned long int dstp = (long int) dest;
>>   unsigned long int srcp = (long int) src;
>> 
>>   /* This test makes the forward copying code be used whenever possible.
>>      Reduces the working set.  */
>>   if (dstp - srcp >= len) /* *Unsigned* compare!  */
>> 
>> How does it work?
>> 
>
> Well the code is technically undefined!

Not undefined, but implementation-defined.

Andreas.
  
Richard Earnshaw (lists) June 15, 2020, 6:14 p.m. UTC | #14
On 15/06/2020 19:05, Paul Eggert wrote:
> On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
>> In practice it will work because objects passed to memmove will have to
>> have a single colour,
> 
> Does this mean all stack and heap objects visible to the C programmer must have
> the same tag? This surprises me, as I thought part of the idea was to assign
> tags randomly.
> 

No, I mean that a single block of memory must have a single tag, so if
the regions overlap, they must have the same tag.  If they don't
overlap, then the tags can be different, but then the length check would
indicate that as well.

memcpy/memmove aren't expected to copy the tag, since you might be
copying some data into the middle of another block.  All hell would
likely break out if they tried to do that.

R.
  
Szabolcs Nagy June 15, 2020, 6:41 p.m. UTC | #15
The 06/15/2020 11:05, Paul Eggert wrote:
> On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
> > In practice it will work because objects passed to memmove will have to
> > have a single colour,
> 
> Does this mean all stack and heap objects visible to the C programmer must have
> the same tag? This surprises me, as I thought part of the idea was to assign
> tags randomly.

the check works for non-overlapping objects with
or without tagging the same way, so different
heap allocations can have different color.

for overlapping objects the pointers must have
the same tag for the check to work, so single
heap allocations must have a single color.
this is guaranteed by the proposed design.

(in case of heap allocation it would be
difficult to do otherwise, but e.g. stack
tagging could try to color different fields
in a struct differently and then memmove
would fail to detect an overlapping copy).
  
H.J. Lu June 15, 2020, 7:18 p.m. UTC | #16
On Mon, Jun 15, 2020 at 11:41 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 06/15/2020 11:05, Paul Eggert wrote:
> > On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
> > > In practice it will work because objects passed to memmove will have to
> > > have a single colour,
> >
> > Does this mean all stack and heap objects visible to the C programmer must have
> > the same tag? This surprises me, as I thought part of the idea was to assign
> > tags randomly.
>
> the check works for non-overlapping objects with
> or without tagging the same way, so different
> heap allocations can have different color.
>
> for overlapping objects the pointers must have
> the same tag for the check to work, so single
> heap allocations must have a single color.
> this is guaranteed by the proposed design.
>
> (in case of heap allocation it would be
> difficult to do otherwise, but e.g. stack
> tagging could try to color different fields
> in a struct differently and then memmove
> would fail to detect an overlapping copy).

I think we need a marker to indicate an object is MTE compatible.
  
Szabolcs Nagy June 16, 2020, 8:14 a.m. UTC | #17
The 06/15/2020 12:18, H.J. Lu wrote:
> I think we need a marker to indicate an object is MTE compatible.

i expect users to only able to discover tag-unsafety
issues in applications at runtime and even if there
is static information in binaries that's usually too
late to do anything useful about it (other than fail).

so i think an initial implementation that is off by
default but can be turned on with an env var makes
sense, but i agree that to turn tagging on by default
some markings will be needed such that tag-unsafe
applications continue to work (if possible).

i think a marking design for heap tagging has to at
least consider:

1 malloc can be interposed and then tagging is not a
  libc internal decision. so i think we would have to
  expose the markings to user code (e.g. like hwcaps)
  or have an opt-in libc api for malloc implementors
  to request tagging which can fail in the presence
  of unmarked modules.

2 software components other than malloc may choose
  to use tagging on memory that they manage: tagging
  can be controlled per page on aarch64, the only
  global settings are: enable syscall abi to take
  tagged pointers, select the tag checking behaviour
  (e.g. SIGSEGV on mismatch) and select the behaviour
  of the random tag generator instruction. The global
  settings may need coordination, but the ability to
  use tagging (locally) should not be disabled based
  on markings (of other modules).

3 users will want to force the tagging on at runtime
  even if their code is not tagging safe: this can
  be used for debugging unmodified binaries, and
  hitting a tagging issue is a dynamic property not
  statically determined, i.e. using tagging with
  code that's not generally tagging safe may work.
  the nice thing about heap tagging is that it's a
  runtime decision, no recompilation is needed, we
  should not ruin this.

4 there is tag-unsafe code that makes assumptions
  about pointer representation (e.g. stores things in
  the top bits of pointers) passing tagged pointers
  to such a module does not work with whatever tag
  checking setting, but there is tag-unsafe code that
  makes assumptions about the granularity of memory
  protection (e.g. assumes out of bound read is ok if
  it does not cross a page boundary) which is fine
  with tagged pointers, it just does not want to fault
  (e.g. tag checking can have a monitoring mode where
  tag mismatch is counted by the kernel per process
  and not cause runtime behaviour changes), so if we
  have markings then we need different marking for
  "compatible with tagged pointers" and "compatible
  with faulting tag checking mode".

do you think such static marking design can be
introduced later? what should the compiler do?
(in principle a compiler can generte tag-unsafe code
for conforming c code currently, but that's unlikely,
it's much more likely that the source code is doing
something non-portable, but that's hard to find out
in the compiler)
  
Richard Earnshaw (lists) June 16, 2020, 9:36 a.m. UTC | #18
On 15/06/2020 20:18, H.J. Lu via Libc-alpha wrote:
> On Mon, Jun 15, 2020 at 11:41 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>> The 06/15/2020 11:05, Paul Eggert wrote:
>>> On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
>>>> In practice it will work because objects passed to memmove will have to
>>>> have a single colour,
>>>
>>> Does this mean all stack and heap objects visible to the C programmer must have
>>> the same tag? This surprises me, as I thought part of the idea was to assign
>>> tags randomly.
>>
>> the check works for non-overlapping objects with
>> or without tagging the same way, so different
>> heap allocations can have different color.
>>
>> for overlapping objects the pointers must have
>> the same tag for the check to work, so single
>> heap allocations must have a single color.
>> this is guaranteed by the proposed design.
>>
>> (in case of heap allocation it would be
>> difficult to do otherwise, but e.g. stack
>> tagging could try to color different fields
>> in a struct differently and then memmove
>> would fail to detect an overlapping copy).
> 
> I think we need a marker to indicate an object is MTE compatible.
> 

I think that's inverted.  Firstly, we would then need to recompile
everything in order to use MTE; that's highly undesirable, especially
when it's largely unnecessary.  Secondly, how would a compiler know?  It
generally can't see when a programmer isn't conforming fully to the
standard - so you'd have to trust the user and just put in the tag.
Quite frankly, that's just silly.

There are a small number of programs that do violate the principles that
underlying tagging - sadly Python's memory management code is one of
them - it has objects that are a mix of malloc'd objects and objects
it's doled out from its own heap, but it assumes it can just grub around
in a memory page to find out which - yuck.  Ideally, these would be
marked in some way so that we never enabled MTE in those cases.  But at
present, I don't see that as urgent.

Remember, MTE is primarily a *debugging* aid, intended to help identify
issues such as buffer overruns and other allocation issues such as
use-after free or double free.  It doesn't have to be on all the time,
so if some programs won't work with it that's not a fatal issue - just
don't turn it on in that case.

R.
  
Szabolcs Nagy June 16, 2020, 10:16 a.m. UTC | #19
The 06/15/2020 15:40, Richard Earnshaw wrote:
> 2) Tests that assume that malloc_usable_size will return a specific
> amount of free space.  The assumptions are not correct, because the
> tag colouring boundaries needed for MTE means that the 8 bytes in the
> block containing the back pointer can no-longer be used by users when
> we have MTE (they have a different colour that belongs to the malloc
> data structures).

with --enable-memory-tagging i see

FAIL: malloc/tst-malloc-usable
FAIL: malloc/tst-malloc-usable-static
FAIL: malloc/tst-malloc-usable-static-tunables
FAIL: malloc/tst-malloc-usable-tunables

malloc_usable_size(malloc(7)) is 16 with
MALLOC_CHECK_=0 and it's 0 with MALLOC_CHECK_=3.

i think this breaks existing usage, so either
malloc check should be disabled if memory tagging
is enabled or fixed to be compatible.
(or at least the issue should be documented)
  
H.J. Lu June 16, 2020, 1:31 p.m. UTC | #20
On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 06/15/2020 12:18, H.J. Lu wrote:
> > I think we need a marker to indicate an object is MTE compatible.
>
> i expect users to only able to discover tag-unsafety
> issues in applications at runtime and even if there
> is static information in binaries that's usually too
> late to do anything useful about it (other than fail).
>
> so i think an initial implementation that is off by
> default but can be turned on with an env var makes
> sense, but i agree that to turn tagging on by default
> some markings will be needed such that tag-unsafe
> applications continue to work (if possible).

MTE isn't just another debugging tool.  Otherwise, we wouldn't
want it in glibc.   Since we are enabling MTE in some object files,
at least we should mark them as MTE enabled.

> i think a marking design for heap tagging has to at
> least consider:
>
> 1 malloc can be interposed and then tagging is not a
>   libc internal decision. so i think we would have to
>   expose the markings to user code (e.g. like hwcaps)
>   or have an opt-in libc api for malloc implementors
>   to request tagging which can fail in the presence
>   of unmarked modules.

MTE marker can be in the GNU_PROPERTY segment.

> 2 software components other than malloc may choose
>   to use tagging on memory that they manage: tagging
>   can be controlled per page on aarch64, the only
>   global settings are: enable syscall abi to take
>   tagged pointers, select the tag checking behaviour
>   (e.g. SIGSEGV on mismatch) and select the behaviour
>   of the random tag generator instruction. The global
>   settings may need coordination, but the ability to
>   use tagging (locally) should not be disabled based
>   on markings (of other modules).

How to use the MTE marker info is up to the runtime.

> 3 users will want to force the tagging on at runtime
>   even if their code is not tagging safe: this can
>   be used for debugging unmodified binaries, and
>   hitting a tagging issue is a dynamic property not
>   statically determined, i.e. using tagging with
>   code that's not generally tagging safe may work.
>   the nice thing about heap tagging is that it's a
>   runtime decision, no recompilation is needed, we
>   should not ruin this.

This sounds like

GLIBC_TUNABLES=glibc.cpu.x86_shstk=on

> 4 there is tag-unsafe code that makes assumptions
>   about pointer representation (e.g. stores things in
>   the top bits of pointers) passing tagged pointers
>   to such a module does not work with whatever tag
>   checking setting, but there is tag-unsafe code that
>   makes assumptions about the granularity of memory
>   protection (e.g. assumes out of bound read is ok if
>   it does not cross a page boundary) which is fine
>   with tagged pointers, it just does not want to fault
>   (e.g. tag checking can have a monitoring mode where
>   tag mismatch is counted by the kernel per process
>   and not cause runtime behaviour changes), so if we
>   have markings then we need different marking for
>   "compatible with tagged pointers" and "compatible
>   with faulting tag checking mode".

It sounds like IBT and SHSTK in CET.

> do you think such static marking design can be

I think we should do it now.

> introduced later? what should the compiler do?
> (in principle a compiler can generte tag-unsafe code
> for conforming c code currently, but that's unlikely,
> it's much more likely that the source code is doing
> something non-portable, but that's hard to find out
> in the compiler)

Compiler should try to detect it, like -Wstringop-overflow.
Initially compilers can add a marker only via a command-line
option.
  
H.J. Lu June 16, 2020, 1:37 p.m. UTC | #21
On Tue, Jun 16, 2020 at 2:36 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 15/06/2020 20:18, H.J. Lu via Libc-alpha wrote:
> > On Mon, Jun 15, 2020 at 11:41 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>
> >> The 06/15/2020 11:05, Paul Eggert wrote:
> >>> On 6/15/20 9:51 AM, Richard Earnshaw (lists) wrote:
> >>>> In practice it will work because objects passed to memmove will have to
> >>>> have a single colour,
> >>>
> >>> Does this mean all stack and heap objects visible to the C programmer must have
> >>> the same tag? This surprises me, as I thought part of the idea was to assign
> >>> tags randomly.
> >>
> >> the check works for non-overlapping objects with
> >> or without tagging the same way, so different
> >> heap allocations can have different color.
> >>
> >> for overlapping objects the pointers must have
> >> the same tag for the check to work, so single
> >> heap allocations must have a single color.
> >> this is guaranteed by the proposed design.
> >>
> >> (in case of heap allocation it would be
> >> difficult to do otherwise, but e.g. stack
> >> tagging could try to color different fields
> >> in a struct differently and then memmove
> >> would fail to detect an overlapping copy).
> >
> > I think we need a marker to indicate an object is MTE compatible.
> >
>
> I think that's inverted.  Firstly, we would then need to recompile
> everything in order to use MTE; that's highly undesirable, especially

Without the marker, MTE is just another debug tool which I don't think
glibc needs.  With the marker, MTE can be used to protect glibc.

> when it's largely unnecessary.  Secondly, how would a compiler know?  It
> generally can't see when a programmer isn't conforming fully to the
> standard - so you'd have to trust the user and just put in the tag.
> Quite frankly, that's just silly.

Initially the marker needs to be added explicitly by programmers, just
like we are enabling MTE in some glibc files.

> There are a small number of programs that do violate the principles that
> underlying tagging - sadly Python's memory management code is one of
> them - it has objects that are a mix of malloc'd objects and objects
> it's doled out from its own heap, but it assumes it can just grub around
> in a memory page to find out which - yuck.  Ideally, these would be
> marked in some way so that we never enabled MTE in those cases.  But at
> present, I don't see that as urgent.
>
> Remember, MTE is primarily a *debugging* aid, intended to help identify
> issues such as buffer overruns and other allocation issues such as
> use-after free or double free.  It doesn't have to be on all the time,
> so if some programs won't work with it that's not a fatal issue - just
> don't turn it on in that case.
>

But, MTE can be used to protect glibc. We may not be able to do it
today.  But an MTE marker can help us get there.
  
Florian Weimer June 16, 2020, 1:44 p.m. UTC | #22
* Szabolcs Nagy:

> The 06/15/2020 15:40, Richard Earnshaw wrote:
>> 2) Tests that assume that malloc_usable_size will return a specific
>> amount of free space.  The assumptions are not correct, because the
>> tag colouring boundaries needed for MTE means that the 8 bytes in the
>> block containing the back pointer can no-longer be used by users when
>> we have MTE (they have a different colour that belongs to the malloc
>> data structures).
>
> with --enable-memory-tagging i see
>
> FAIL: malloc/tst-malloc-usable
> FAIL: malloc/tst-malloc-usable-static
> FAIL: malloc/tst-malloc-usable-static-tunables
> FAIL: malloc/tst-malloc-usable-tunables
>
> malloc_usable_size(malloc(7)) is 16 with
> MALLOC_CHECK_=0 and it's 0 with MALLOC_CHECK_=3.
>
> i think this breaks existing usage, so either
> malloc check should be disabled if memory tagging
> is enabled or fixed to be compatible.
> (or at least the issue should be documented)

I'm with Richard here—this is an incorrect test expectation, not a bug
in the implementation.

Thanks,
Florian
  
Richard Earnshaw June 16, 2020, 2:06 p.m. UTC | #23
On 16/06/2020 14:44, Florian Weimer via Libc-alpha wrote:
> * Szabolcs Nagy:
> 
>> The 06/15/2020 15:40, Richard Earnshaw wrote:
>>> 2) Tests that assume that malloc_usable_size will return a specific
>>> amount of free space.  The assumptions are not correct, because the
>>> tag colouring boundaries needed for MTE means that the 8 bytes in the
>>> block containing the back pointer can no-longer be used by users when
>>> we have MTE (they have a different colour that belongs to the malloc
>>> data structures).
>>
>> with --enable-memory-tagging i see
>>
>> FAIL: malloc/tst-malloc-usable
>> FAIL: malloc/tst-malloc-usable-static
>> FAIL: malloc/tst-malloc-usable-static-tunables
>> FAIL: malloc/tst-malloc-usable-tunables
>>
>> malloc_usable_size(malloc(7)) is 16 with
>> MALLOC_CHECK_=0 and it's 0 with MALLOC_CHECK_=3.
>>
>> i think this breaks existing usage, so either
>> malloc check should be disabled if memory tagging
>> is enabled or fixed to be compatible.
>> (or at least the issue should be documented)
> 
> I'm with Richard here—this is an incorrect test expectation, not a bug
> in the implementation.
> 
> Thanks,
> Florian
> 

Actually, I think there is a real issue that I have to solve: the usable
size should never be less than the allocation request.

The problem is that I round down the allocation agressively in
malloc_usable_size and that does not account for the MALLOC_CHECK case
where the overall size is reduced by one to allow for the magic cookie
marker.

When MTE is enabled, I'm not sure it makes too much sense to also enable
MALLOC_CHECK: it does essentially the same thing, but less well.  But
when it is off (and the library is configured to support MTE), it does
need to work as expected.

I'm still thinking about this case.  One option is to put the cookie
inside the word that we no-longer hand out to the user.  It's harmless
to put it there and it avoids wasting even more space, even if the user
normally can't overrun it when MTE is on.

R.
  
Richard Earnshaw (lists) June 16, 2020, 2:14 p.m. UTC | #24
On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>> The 06/15/2020 12:18, H.J. Lu wrote:
>>> I think we need a marker to indicate an object is MTE compatible.
>>
>> i expect users to only able to discover tag-unsafety
>> issues in applications at runtime and even if there
>> is static information in binaries that's usually too
>> late to do anything useful about it (other than fail).
>>
>> so i think an initial implementation that is off by
>> default but can be turned on with an env var makes
>> sense, but i agree that to turn tagging on by default
>> some markings will be needed such that tag-unsafe
>> applications continue to work (if possible).
> 
> MTE isn't just another debugging tool.  Otherwise, we wouldn't
> want it in glibc.   

That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
debugging tool and we have it in glibc.

> Since we are enabling MTE in some object files,
> at least we should mark them as MTE enabled.

When we start using MTE to colour stack objects, yes, we'll need to mark
object files that use it (they require longjmp to clean up the stack
colouring, for example).  Until then, I disagree.  You do not need to
mark every object file as compatible with *heap* objects that have been
coloured.  If you disagree please provide a concrete example.

> 
>> i think a marking design for heap tagging has to at
>> least consider:
>>
>> 1 malloc can be interposed and then tagging is not a
>>   libc internal decision. so i think we would have to
>>   expose the markings to user code (e.g. like hwcaps)
>>   or have an opt-in libc api for malloc implementors
>>   to request tagging which can fail in the presence
>>   of unmarked modules.
> 
> MTE marker can be in the GNU_PROPERTY segment.
> 
>> 2 software components other than malloc may choose
>>   to use tagging on memory that they manage: tagging
>>   can be controlled per page on aarch64, the only
>>   global settings are: enable syscall abi to take
>>   tagged pointers, select the tag checking behaviour
>>   (e.g. SIGSEGV on mismatch) and select the behaviour
>>   of the random tag generator instruction. The global
>>   settings may need coordination, but the ability to
>>   use tagging (locally) should not be disabled based
>>   on markings (of other modules).
> 
> How to use the MTE marker info is up to the runtime.
> 
>> 3 users will want to force the tagging on at runtime
>>   even if their code is not tagging safe: this can
>>   be used for debugging unmodified binaries, and
>>   hitting a tagging issue is a dynamic property not
>>   statically determined, i.e. using tagging with
>>   code that's not generally tagging safe may work.
>>   the nice thing about heap tagging is that it's a
>>   runtime decision, no recompilation is needed, we
>>   should not ruin this.
> 
> This sounds like
> 
> GLIBC_TUNABLES=glibc.cpu.x86_shstk=on
> 
>> 4 there is tag-unsafe code that makes assumptions
>>   about pointer representation (e.g. stores things in
>>   the top bits of pointers) passing tagged pointers
>>   to such a module does not work with whatever tag
>>   checking setting, but there is tag-unsafe code that
>>   makes assumptions about the granularity of memory
>>   protection (e.g. assumes out of bound read is ok if
>>   it does not cross a page boundary) which is fine
>>   with tagged pointers, it just does not want to fault
>>   (e.g. tag checking can have a monitoring mode where
>>   tag mismatch is counted by the kernel per process
>>   and not cause runtime behaviour changes), so if we
>>   have markings then we need different marking for
>>   "compatible with tagged pointers" and "compatible
>>   with faulting tag checking mode".
> 
> It sounds like IBT and SHSTK in CET.
> 
>> do you think such static marking design can be
> 
> I think we should do it now.
> 
>> introduced later? what should the compiler do?
>> (in principle a compiler can generte tag-unsafe code
>> for conforming c code currently, but that's unlikely,
>> it's much more likely that the source code is doing
>> something non-portable, but that's hard to find out
>> in the compiler)
> 
> Compiler should try to detect it, like -Wstringop-overflow.
> Initially compilers can add a marker only via a command-line
> option.
>
  
H.J. Lu June 16, 2020, 2:27 p.m. UTC | #25
On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> > On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>
> >> The 06/15/2020 12:18, H.J. Lu wrote:
> >>> I think we need a marker to indicate an object is MTE compatible.
> >>
> >> i expect users to only able to discover tag-unsafety
> >> issues in applications at runtime and even if there
> >> is static information in binaries that's usually too
> >> late to do anything useful about it (other than fail).
> >>
> >> so i think an initial implementation that is off by
> >> default but can be turned on with an env var makes
> >> sense, but i agree that to turn tagging on by default
> >> some markings will be needed such that tag-unsafe
> >> applications continue to work (if possible).
> >
> > MTE isn't just another debugging tool.  Otherwise, we wouldn't
> > want it in glibc.
>
> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
> debugging tool and we have it in glibc.
>
> > Since we are enabling MTE in some object files,
> > at least we should mark them as MTE enabled.
>
> When we start using MTE to colour stack objects, yes, we'll need to mark
> object files that use it (they require longjmp to clean up the stack
> colouring, for example).  Until then, I disagree.  You do not need to
> mark every object file as compatible with *heap* objects that have been
> coloured.  If you disagree please provide a concrete example.
>

I can image MTE is always on by default, starting with glibc.
  
Richard Earnshaw (lists) June 16, 2020, 2:34 p.m. UTC | #26
On 16/06/2020 15:27, H.J. Lu wrote:
> On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>>
>> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
>> > On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> >>
>> >> The 06/15/2020 12:18, H.J. Lu wrote:
>> >>> I think we need a marker to indicate an object is MTE compatible.
>> >>
>> >> i expect users to only able to discover tag-unsafety
>> >> issues in applications at runtime and even if there
>> >> is static information in binaries that's usually too
>> >> late to do anything useful about it (other than fail).
>> >>
>> >> so i think an initial implementation that is off by
>> >> default but can be turned on with an env var makes
>> >> sense, but i agree that to turn tagging on by default
>> >> some markings will be needed such that tag-unsafe
>> >> applications continue to work (if possible).
>> >
>> > MTE isn't just another debugging tool.  Otherwise, we wouldn't
>> > want it in glibc.
>>
>> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
>> debugging tool and we have it in glibc.
>>
>> > Since we are enabling MTE in some object files,
>> > at least we should mark them as MTE enabled.
>>
>> When we start using MTE to colour stack objects, yes, we'll need to mark
>> object files that use it (they require longjmp to clean up the stack
>> colouring, for example).  Until then, I disagree.  You do not need to
>> mark every object file as compatible with *heap* objects that have been
>> coloured.  If you disagree please provide a concrete example.
>>
> 
> I can image MTE is always on by default, starting with glibc.

I think that's unlikely, even with lazy faulting because the HW overhead
is not zero.  At some point, it might be that we'd want to enable it
semi-randomly when running some programs (I've heard of some OSes
planning to do something like that for telemetry type monitoring), but
we're a long way from that; and what's more, we're a long way from
really having a list of what's safe and what's not.  So until then,
arbitrarily marking objects to say they're safe when we don't know that
will just create a marker that has zero use, because we won't be able to
rely on it.

If you're going to have markers, you'd better be 100% sure what it's
telling you is right, or it's not worth the bits you've spent on it.

R.

> 
> 
> -- 
> H.J.
  
H.J. Lu June 16, 2020, 2:52 p.m. UTC | #27
On Tue, Jun 16, 2020 at 7:34 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 16/06/2020 15:27, H.J. Lu wrote:
> > On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> >>
> >> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> >> > On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >> >>
> >> >> The 06/15/2020 12:18, H.J. Lu wrote:
> >> >>> I think we need a marker to indicate an object is MTE compatible.
> >> >>
> >> >> i expect users to only able to discover tag-unsafety
> >> >> issues in applications at runtime and even if there
> >> >> is static information in binaries that's usually too
> >> >> late to do anything useful about it (other than fail).
> >> >>
> >> >> so i think an initial implementation that is off by
> >> >> default but can be turned on with an env var makes
> >> >> sense, but i agree that to turn tagging on by default
> >> >> some markings will be needed such that tag-unsafe
> >> >> applications continue to work (if possible).
> >> >
> >> > MTE isn't just another debugging tool.  Otherwise, we wouldn't
> >> > want it in glibc.
> >>
> >> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
> >> debugging tool and we have it in glibc.
> >>
> >> > Since we are enabling MTE in some object files,
> >> > at least we should mark them as MTE enabled.
> >>
> >> When we start using MTE to colour stack objects, yes, we'll need to mark
> >> object files that use it (they require longjmp to clean up the stack
> >> colouring, for example).  Until then, I disagree.  You do not need to
> >> mark every object file as compatible with *heap* objects that have been
> >> coloured.  If you disagree please provide a concrete example.
> >>
> >
> > I can image MTE is always on by default, starting with glibc.
>
> I think that's unlikely, even with lazy faulting because the HW overhead
> is not zero.  At some point, it might be that we'd want to enable it
> semi-randomly when running some programs (I've heard of some OSes
> planning to do something like that for telemetry type monitoring), but
> we're a long way from that; and what's more, we're a long way from
> really having a list of what's safe and what's not.  So until then,
> arbitrarily marking objects to say they're safe when we don't know that
> will just create a marker that has zero use, because we won't be able to
> rely on it.

Yes, there will be overhead.  But some users will be OK with it.

> If you're going to have markers, you'd better be 100% sure what it's
> telling you is right, or it's not worth the bits you've spent on it.
>

There will be mistakes.  It doesn't mean that we shouldn't do it.
  
Richard Earnshaw (lists) June 16, 2020, 3:10 p.m. UTC | #28
On 16/06/2020 15:52, H.J. Lu via Libc-alpha wrote:
> On Tue, Jun 16, 2020 at 7:34 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>>
>> On 16/06/2020 15:27, H.J. Lu wrote:
>>> On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
>>>>> On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>>>
>>>>>> The 06/15/2020 12:18, H.J. Lu wrote:
>>>>>>> I think we need a marker to indicate an object is MTE compatible.
>>>>>>
>>>>>> i expect users to only able to discover tag-unsafety
>>>>>> issues in applications at runtime and even if there
>>>>>> is static information in binaries that's usually too
>>>>>> late to do anything useful about it (other than fail).
>>>>>>
>>>>>> so i think an initial implementation that is off by
>>>>>> default but can be turned on with an env var makes
>>>>>> sense, but i agree that to turn tagging on by default
>>>>>> some markings will be needed such that tag-unsafe
>>>>>> applications continue to work (if possible).
>>>>>
>>>>> MTE isn't just another debugging tool.  Otherwise, we wouldn't
>>>>> want it in glibc.
>>>>
>>>> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
>>>> debugging tool and we have it in glibc.
>>>>
>>>>> Since we are enabling MTE in some object files,
>>>>> at least we should mark them as MTE enabled.
>>>>
>>>> When we start using MTE to colour stack objects, yes, we'll need to mark
>>>> object files that use it (they require longjmp to clean up the stack
>>>> colouring, for example).  Until then, I disagree.  You do not need to
>>>> mark every object file as compatible with *heap* objects that have been
>>>> coloured.  If you disagree please provide a concrete example.
>>>>
>>>
>>> I can image MTE is always on by default, starting with glibc.
>>
>> I think that's unlikely, even with lazy faulting because the HW overhead
>> is not zero.  At some point, it might be that we'd want to enable it
>> semi-randomly when running some programs (I've heard of some OSes
>> planning to do something like that for telemetry type monitoring), but
>> we're a long way from that; and what's more, we're a long way from
>> really having a list of what's safe and what's not.  So until then,
>> arbitrarily marking objects to say they're safe when we don't know that
>> will just create a marker that has zero use, because we won't be able to
>> rely on it.
> 
> Yes, there will be overhead.  But some users will be OK with it.
> 
>> If you're going to have markers, you'd better be 100% sure what it's
>> telling you is right, or it's not worth the bits you've spent on it.
>>
> 
> There will be mistakes.  It doesn't mean that we shouldn't do it.
> 
> 

But we shouldn't be doing it NOW, at least, not until we have a MUCH
better idea of what constitutes a safe vs an unsafe program and can give
clear advice to developers (or automate the tools) to do this sensibly.
 Anything else would be a random guess and create more problems than it
solves.

R.
  
H.J. Lu June 16, 2020, 4:33 p.m. UTC | #29
On Tue, Jun 16, 2020 at 8:10 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 16/06/2020 15:52, H.J. Lu via Libc-alpha wrote:
> > On Tue, Jun 16, 2020 at 7:34 AM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> >>
> >> On 16/06/2020 15:27, H.J. Lu wrote:
> >>> On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
> >>> <Richard.Earnshaw@arm.com> wrote:
> >>>>
> >>>> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> >>>>> On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>>>>>
> >>>>>> The 06/15/2020 12:18, H.J. Lu wrote:
> >>>>>>> I think we need a marker to indicate an object is MTE compatible.
> >>>>>>
> >>>>>> i expect users to only able to discover tag-unsafety
> >>>>>> issues in applications at runtime and even if there
> >>>>>> is static information in binaries that's usually too
> >>>>>> late to do anything useful about it (other than fail).
> >>>>>>
> >>>>>> so i think an initial implementation that is off by
> >>>>>> default but can be turned on with an env var makes
> >>>>>> sense, but i agree that to turn tagging on by default
> >>>>>> some markings will be needed such that tag-unsafe
> >>>>>> applications continue to work (if possible).
> >>>>>
> >>>>> MTE isn't just another debugging tool.  Otherwise, we wouldn't
> >>>>> want it in glibc.
> >>>>
> >>>> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
> >>>> debugging tool and we have it in glibc.
> >>>>
> >>>>> Since we are enabling MTE in some object files,
> >>>>> at least we should mark them as MTE enabled.
> >>>>
> >>>> When we start using MTE to colour stack objects, yes, we'll need to mark
> >>>> object files that use it (they require longjmp to clean up the stack
> >>>> colouring, for example).  Until then, I disagree.  You do not need to
> >>>> mark every object file as compatible with *heap* objects that have been
> >>>> coloured.  If you disagree please provide a concrete example.
> >>>>
> >>>
> >>> I can image MTE is always on by default, starting with glibc.
> >>
> >> I think that's unlikely, even with lazy faulting because the HW overhead
> >> is not zero.  At some point, it might be that we'd want to enable it
> >> semi-randomly when running some programs (I've heard of some OSes
> >> planning to do something like that for telemetry type monitoring), but
> >> we're a long way from that; and what's more, we're a long way from
> >> really having a list of what's safe and what's not.  So until then,
> >> arbitrarily marking objects to say they're safe when we don't know that
> >> will just create a marker that has zero use, because we won't be able to
> >> rely on it.
> >
> > Yes, there will be overhead.  But some users will be OK with it.
> >
> >> If you're going to have markers, you'd better be 100% sure what it's
> >> telling you is right, or it's not worth the bits you've spent on it.
> >>
> >
> > There will be mistakes.  It doesn't mean that we shouldn't do it.
> >
> >
>
> But we shouldn't be doing it NOW, at least, not until we have a MUCH
> better idea of what constitutes a safe vs an unsafe program and can give
> clear advice to developers (or automate the tools) to do this sensibly.
>  Anything else would be a random guess and create more problems than it
> solves.

If we don't plan it from the very beginning, it may never happen.
  
Szabolcs Nagy June 17, 2020, 10:53 a.m. UTC | #30
The 06/16/2020 09:33, H.J. Lu via Libc-alpha wrote:
> On Tue, Jun 16, 2020 at 8:10 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
> >
> > On 16/06/2020 15:52, H.J. Lu via Libc-alpha wrote:
> > > On Tue, Jun 16, 2020 at 7:34 AM Richard Earnshaw (lists)
> > > <Richard.Earnshaw@arm.com> wrote:
> > >>
> > >> On 16/06/2020 15:27, H.J. Lu wrote:
> > >>> On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
> > >>> <Richard.Earnshaw@arm.com> wrote:
> > >>>>
> > >>>> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> > >>>>> On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > >>>>>>
> > >>>>>> The 06/15/2020 12:18, H.J. Lu wrote:
> > >>>>>>> I think we need a marker to indicate an object is MTE compatible.
> > >>>>>>
> > >>>>>> i expect users to only able to discover tag-unsafety
> > >>>>>> issues in applications at runtime and even if there
> > >>>>>> is static information in binaries that's usually too
> > >>>>>> late to do anything useful about it (other than fail).
> > >>>>>>
> > >>>>>> so i think an initial implementation that is off by
> > >>>>>> default but can be turned on with an env var makes
> > >>>>>> sense, but i agree that to turn tagging on by default
> > >>>>>> some markings will be needed such that tag-unsafe
> > >>>>>> applications continue to work (if possible).
> > >>>>>
> > >>>>> MTE isn't just another debugging tool.  Otherwise, we wouldn't
> > >>>>> want it in glibc.
> > >>>>
> > >>>> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
> > >>>> debugging tool and we have it in glibc.
> > >>>>
> > >>>>> Since we are enabling MTE in some object files,
> > >>>>> at least we should mark them as MTE enabled.
> > >>>>
> > >>>> When we start using MTE to colour stack objects, yes, we'll need to mark
> > >>>> object files that use it (they require longjmp to clean up the stack
> > >>>> colouring, for example).  Until then, I disagree.  You do not need to
> > >>>> mark every object file as compatible with *heap* objects that have been
> > >>>> coloured.  If you disagree please provide a concrete example.
> > >>>>
> > >>>
> > >>> I can image MTE is always on by default, starting with glibc.
> > >>
> > >> I think that's unlikely, even with lazy faulting because the HW overhead
> > >> is not zero.  At some point, it might be that we'd want to enable it
> > >> semi-randomly when running some programs (I've heard of some OSes
> > >> planning to do something like that for telemetry type monitoring), but
> > >> we're a long way from that; and what's more, we're a long way from
> > >> really having a list of what's safe and what's not.  So until then,
> > >> arbitrarily marking objects to say they're safe when we don't know that
> > >> will just create a marker that has zero use, because we won't be able to
> > >> rely on it.
> > >
> > > Yes, there will be overhead.  But some users will be OK with it.
> > >
> > >> If you're going to have markers, you'd better be 100% sure what it's
> > >> telling you is right, or it's not worth the bits you've spent on it.
> > >>
> > >
> > > There will be mistakes.  It doesn't mean that we shouldn't do it.
> > >
> > >
> >
> > But we shouldn't be doing it NOW, at least, not until we have a MUCH
> > better idea of what constitutes a safe vs an unsafe program and can give
> > clear advice to developers (or automate the tools) to do this sensibly.
> >  Anything else would be a random guess and create more problems than it
> > solves.
> 
> If we don't plan it from the very beginning, it may never happen.

the two main usage of heap tagging is debugging
and security hardening.

for debugging the env var is fine.

for security if we tag things as mte unsafe
we get a system where all the relevant
applications are marked unsafe and mte is not
enabled (the main culprits for mte unsafety are
the complex runtimes that we want to protect:
python, browsers, ..., they are also the cases
where marking does not really work: they dlopen
their dependencies, so they fail at runtime
with or without a marking)

i think heap tagging is different from shadow
stack in that 99% of heap tagging failures are
not some obscure hack that is valid but happens
to be incompatible with the feature, but very
likely a bug in the code that is undefined by
the language standard and thus the compiler
can already break the code if it was smarter.
(and often it may be exploitable and then we
definitely don't want to use marking, but
fix the actual bug)

i don't know how the shadow stack marking
works: none of the asm files in glibc are
marked, do you force the marking on in the
linker? or the assembler? either way i don't
think it is a very reliable or easy to
maintain mechanism. but i agree that we will
need some mechanism to keep existing binaries
working (so users don't need to make a choice
between secure but broken or insecure but
working setup)
  
H.J. Lu June 18, 2020, 8:30 p.m. UTC | #31
On Wed, Jun 17, 2020 at 3:53 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 06/16/2020 09:33, H.J. Lu via Libc-alpha wrote:
> > On Tue, Jun 16, 2020 at 8:10 AM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> > >
> > > On 16/06/2020 15:52, H.J. Lu via Libc-alpha wrote:
> > > > On Tue, Jun 16, 2020 at 7:34 AM Richard Earnshaw (lists)
> > > > <Richard.Earnshaw@arm.com> wrote:
> > > >>
> > > >> On 16/06/2020 15:27, H.J. Lu wrote:
> > > >>> On Tue, Jun 16, 2020 at 7:14 AM Richard Earnshaw (lists)
> > > >>> <Richard.Earnshaw@arm.com> wrote:
> > > >>>>
> > > >>>> On 16/06/2020 14:31, H.J. Lu via Libc-alpha wrote:
> > > >>>>> On Tue, Jun 16, 2020 at 1:14 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > > >>>>>>
> > > >>>>>> The 06/15/2020 12:18, H.J. Lu wrote:
> > > >>>>>>> I think we need a marker to indicate an object is MTE compatible.
> > > >>>>>>
> > > >>>>>> i expect users to only able to discover tag-unsafety
> > > >>>>>> issues in applications at runtime and even if there
> > > >>>>>> is static information in binaries that's usually too
> > > >>>>>> late to do anything useful about it (other than fail).
> > > >>>>>>
> > > >>>>>> so i think an initial implementation that is off by
> > > >>>>>> default but can be turned on with an env var makes
> > > >>>>>> sense, but i agree that to turn tagging on by default
> > > >>>>>> some markings will be needed such that tag-unsafe
> > > >>>>>> applications continue to work (if possible).
> > > >>>>>
> > > >>>>> MTE isn't just another debugging tool.  Otherwise, we wouldn't
> > > >>>>> want it in glibc.
> > > >>>>
> > > >>>> That doesn't follow from your claim.  eg we have MALLOC_CHECK - that's a
> > > >>>> debugging tool and we have it in glibc.
> > > >>>>
> > > >>>>> Since we are enabling MTE in some object files,
> > > >>>>> at least we should mark them as MTE enabled.
> > > >>>>
> > > >>>> When we start using MTE to colour stack objects, yes, we'll need to mark
> > > >>>> object files that use it (they require longjmp to clean up the stack
> > > >>>> colouring, for example).  Until then, I disagree.  You do not need to
> > > >>>> mark every object file as compatible with *heap* objects that have been
> > > >>>> coloured.  If you disagree please provide a concrete example.
> > > >>>>
> > > >>>
> > > >>> I can image MTE is always on by default, starting with glibc.
> > > >>
> > > >> I think that's unlikely, even with lazy faulting because the HW overhead
> > > >> is not zero.  At some point, it might be that we'd want to enable it
> > > >> semi-randomly when running some programs (I've heard of some OSes
> > > >> planning to do something like that for telemetry type monitoring), but
> > > >> we're a long way from that; and what's more, we're a long way from
> > > >> really having a list of what's safe and what's not.  So until then,
> > > >> arbitrarily marking objects to say they're safe when we don't know that
> > > >> will just create a marker that has zero use, because we won't be able to
> > > >> rely on it.
> > > >
> > > > Yes, there will be overhead.  But some users will be OK with it.
> > > >
> > > >> If you're going to have markers, you'd better be 100% sure what it's
> > > >> telling you is right, or it's not worth the bits you've spent on it.
> > > >>
> > > >
> > > > There will be mistakes.  It doesn't mean that we shouldn't do it.
> > > >
> > > >
> > >
> > > But we shouldn't be doing it NOW, at least, not until we have a MUCH
> > > better idea of what constitutes a safe vs an unsafe program and can give
> > > clear advice to developers (or automate the tools) to do this sensibly.
> > >  Anything else would be a random guess and create more problems than it
> > > solves.
> >
> > If we don't plan it from the very beginning, it may never happen.
>
> the two main usage of heap tagging is debugging
> and security hardening.

Yes, marking is intended for security hardening.  We need to plan
ahead.

> for debugging the env var is fine.

Agree.

> for security if we tag things as mte unsafe
> we get a system where all the relevant
> applications are marked unsafe and mte is not
> enabled (the main culprits for mte unsafety are
> the complex runtimes that we want to protect:
> python, browsers, ..., they are also the cases
> where marking does not really work: they dlopen
> their dependencies, so they fail at runtime
> with or without a marking)
>
> i think heap tagging is different from shadow
> stack in that 99% of heap tagging failures are
> not some obscure hack that is valid but happens
> to be incompatible with the feature, but very
> likely a bug in the code that is undefined by
> the language standard and thus the compiler
> can already break the code if it was smarter.
> (and often it may be exploitable and then we
> definitely don't want to use marking, but
> fix the actual bug)

We need to classify different taggings so that
some or all taggings can be enabled.

> i don't know how the shadow stack marking
> works: none of the asm files in glibc are
> marked, do you force the marking on in the
> linker? or the assembler? either way i don't

We do it in compiler and assembler with help from
compiler.   Since shadow stack is compatible with
most of codes, we simply add the shadow stack
marker with -fcf-protections.  The main problem is
JIT, which needs to unwind shadow stack when
stack frames are skipped.

> think it is a very reliable or easy to
> maintain mechanism. but i agree that we will
> need some mechanism to keep existing binaries
> working (so users don't need to make a choice
> between secure but broken or insecure but
> working setup)
>

Marker should be accurate.  Otherwise it won't work.
  
Szabolcs Nagy June 23, 2020, 1:22 p.m. UTC | #32
The 06/15/2020 15:40, Richard Earnshaw wrote:
> Testing
> =======
> 
> I've run all the malloc tests.  While not all of them pass at present,
> I have examined all the failing tests and I'm confident that none of the
> failures represent real bugs in the code; but I have not yet decided how
> best to address these failures.  The failures fall into three categories
> 
> 1) Tests that expect a particular failure mode on double-free operations.
> I've added a quick tag-checking access in the free entry path that
> essentially asserts that the tag colour of a free'd block of memory
> matches the colour of the pointer - this leads to the kernel delivering
> a different signal that the test code does not expect.
> 
> 2) Tests that assume that malloc_usable_size will return a specific
> amount of free space.  The assumptions are not correct, because the
> tag colouring boundaries needed for MTE means that the 8 bytes in the
> block containing the back pointer can no-longer be used by users when
> we have MTE (they have a different colour that belongs to the malloc
> data structures).
> 
> 3) Tests that construct a fake internal malloc data structure and then
> try to perform operations on them.  I haven't looked at these in too
> much detail, but the first issue is that the fake header is only
> 8-byte aligned and for MTE to work it requires a 16-byte aligned
> structure (the tag read/write operations require the address be
> granule aligned, and the real glibc data structure has this property).

now i run the glibc tests with the latest linux patches,
the new failures are

FAIL: malloc/tst-malloc-backtrace
FAIL: malloc/tst-malloc-usable
FAIL: malloc/tst-malloc-usable-static
FAIL: malloc/tst-malloc-usable-static-tunables
FAIL: malloc/tst-malloc-usable-tunables
FAIL: malloc/tst-mallocstate
FAIL: malloc/tst-safe-linking
FAIL: malloc/tst-tcfree1
FAIL: malloc/tst-tcfree2
FAIL: malloc/tst-tcfree3
FAIL: nptl/tst-stack3
FAIL: nptl/tst-stack3-mem
FAIL: posix/tst-mmap

all malloc failures are use after free or poking
at malloc internals, except malloc/tst-malloc-usable,
which now can return smaller amount than the
original allocation size (with MALLOC_CHECK_=3).

posix/tst-mmap uses mmap on malloced memory and mmap
fails with ENOMEM because the tagged-address syscall
abi rejects tagged address in mmap, the test expects
EINVAL for not page aligned address, in strace i see
[pid  3505] mmap(0xf00fffff7d544a1, 1000, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x1000) = -1 ENOMEM (Cannot allocate memory)
[pid  3505] mmap(0xf00fffff7d544a1, 1000, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = -1 ENOMEM (Cannot allocate memory)
...
the output is
wrong error value for mapping at address with mod pagesize != 0: %m (should be EINVAL)
wrong error value for mapping at address with mod pagesize != 0: %m (should be EINVAL)
wrong error value for mapping at address with mod pagesize != 0: %m (should be EINVAL)
wrong error value for mapping at address with mod pagesize != 0: %m (should be EINVAL)

nptl/tst-stack3 and nptl/tst-stack3-mem are real
issues: glibc calls madvise MADV_DONTNEED on
stack on thread exit, even if it was user allocated
with malloc, the effect is that tags on the memory
may get zeroed, so when the user actually tries
to free the memory the integrity tag check crashes.
(i think this should be fixed by not doing madvise
on user allocated thread stacks, and user code is
not allowed doing madvise on malloced memory either,
but i assume that's rare)