[RFC] <sys/tagged-address.h>: An API for tagged address

Message ID 20210211173711.71736-1-hjl.tools@gmail.com
State Superseded
Series [RFC] <sys/tagged-address.h>: An API for tagged address

Commit Message

H.J. Lu Feb. 11, 2021, 5:37 p.m. UTC
  An API for tagged address:

/* Get the current address bits used in address translation.  */
extern unsigned int get_tagged_address_bits (void);

/* Get the current mask for address bits used in address translation.  */
extern uintptr_t get_tagged_address_mask (void);

/* Set the mask for address bits used in address translation.  Return 0
   on success.  Return -1 on error.  */
extern int set_tagged_address_mask (uintptr_t __mask);

/* Return the tagged address of __ADDR with the tag value __TAG.  */
extern void *tag_address (void *__addr, unsigned int __tag);

/* Return the untagged address of __ADDR.  */
extern void *untag_address (void *__addr);

 /* TRUE if constant address BITS is a valid tagged address bits.  */
 #define TAGGED_ADDRESS_VALID_BITS(BITS)

 /* A mask for constant address BITS used in address translation.  */
 #define TAGGED_ADDRESS_MASK(BITS)
---
 bits/tagged-address.h                         | 25 +++++++++
 include/sys/tagged-address.h                  |  9 +++
 misc/Makefile                                 |  3 +-
 misc/Versions                                 |  7 +++
 misc/set-tagged-address-mask.c                | 28 ++++++++++
 misc/sys/tagged-address.h                     | 56 +++++++++++++++++++
 misc/tagged-address.c                         | 55 ++++++++++++++++++
 sysdeps/generic/inline-tagged-address.h       | 43 ++++++++++++++
 sysdeps/unix/sysv/linux/i386/libc.abilist     |  5 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |  5 ++
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |  5 ++
 11 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 bits/tagged-address.h
 create mode 100644 include/sys/tagged-address.h
 create mode 100644 misc/set-tagged-address-mask.c
 create mode 100644 misc/sys/tagged-address.h
 create mode 100644 misc/tagged-address.c
 create mode 100644 sysdeps/generic/inline-tagged-address.h
  

Comments

Joseph Myers Feb. 11, 2021, 8:28 p.m. UTC | #1
On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:

> An API for tagged address:

Please write a longer commit message, discussing what "tagged address" is, 
which architectures have such a thing (the API should try to cover 
whatever is common between architectures as far as possible - is this 
meant to relate to AArch64 MTE, how does it relate to the MTE code we 
already have in glibc, what corresponding features are involved on other 
architectures?) and what the API is intended to be useful for.  A new API 
also needs additions to the manual and NEWS - again, explaining things at 
the user level.

The header naming suggests installed headers, but you don't appear to be 
installing them.

I don't see anything architecture-specific about the build logic, so I'd 
expect all libc.abilist files to have the new functions rather than just a 
few.
  
H.J. Lu Feb. 11, 2021, 9:39 p.m. UTC | #2
On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
>
> > An API for tagged address:
>
> Please write a longer commit message, discussing what "tagged address" is,
> which architectures have such a thing (the API should try to cover
> whatever is common between architectures as far as possible - is this
> meant to relate to AArch64 MTE, how does it relate to the MTE code we

This API is for Intel LAM:

https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.

and ARM TBI:

https://en.wikichip.org/wiki/arm/tbi

> already have in glibc, what corresponding features are involved on other

ARM MTE is built on top of TBI.   My <sys/tagged-address.h> supports
LAM and TBI.  It can be used to:

1. Enable LAM/TBI.
2. Make memmove LAM/TBI compatible.
3. Enable LAM in HWASAN.

> architectures?) and what the API is intended to be useful for.  A new API
> also needs additions to the manual and NEWS - again, explaining things at
> the user level.
>
> The header naming suggests installed headers, but you don't appear to be
> installing them.

I will install them in the final version.  The complete LAM patch set is on
users/intel/lam/master branch at

https://gitlab.com/x86-glibc/glibc/-/tree/users/intel/lam/master

> I don't see anything architecture-specific about the build logic, so I'd
> expect all libc.abilist files to have the new functions rather than just a
> few.

The final version will update all libc.abilist files.

Thanks.
  
Florian Weimer Feb. 12, 2021, 9:43 a.m. UTC | #3
* H. J. Lu via Libc-alpha:

> On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
>>
>> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
>>
>> > An API for tagged address:
>>
>> Please write a longer commit message, discussing what "tagged address" is,
>> which architectures have such a thing (the API should try to cover
>> whatever is common between architectures as far as possible - is this
>> meant to relate to AArch64 MTE, how does it relate to the MTE code we
>
> This API is for Intel LAM:
>
> https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.
>
> and ARM TBI:
>
> https://en.wikichip.org/wiki/arm/tbi

Do the setters/getters change process or thread properties?

The interface assumes that the tag bits are uniform across pointer
types.  I think that's not true, at least from a historical perspective.

People complained that our protection key interfaces are too slow to be
useful.  Do we need to find a way to inline the tag/untag operations?

Thanks,
Florian
  
H.J. Lu Feb. 12, 2021, 1:06 p.m. UTC | #4
On Fri, Feb 12, 2021 at 1:43 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >>
> >> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
> >>
> >> > An API for tagged address:
> >>
> >> Please write a longer commit message, discussing what "tagged address" is,
> >> which architectures have such a thing (the API should try to cover
> >> whatever is common between architectures as far as possible - is this
> >> meant to relate to AArch64 MTE, how does it relate to the MTE code we
> >
> > This API is for Intel LAM:
> >
> > https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.
> >
> > and ARM TBI:
> >
> > https://en.wikichip.org/wiki/arm/tbi
>
> Do the setters/getters change process or thread properties?

On x86-64, mask is stored in TCB.   TCB will be updated.  It applies
to all threads.

> The interface assumes that the tag bits are uniform across pointer
> types.  I think that's not true, at least from a historical perspective.

This is true for LAM and TBI.

> People complained that our protection key interfaces are too slow to be
> useful.  Do we need to find a way to inline the tag/untag operations?
>

The internal interface is inlined.  We can also inline the public interface
if needed.  But set_tagged_address_mask isn't inlined at all.
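
For illustration, the tag and untag operations discussed here reduce to a few bit operations once the number of address bits is known, which is why inlining them is cheap. The following is a minimal sketch, not the code from the patch: the `sketch_` names are hypothetical, addresses are modelled as `uint64_t`, and it assumes the tag simply occupies the bits above the translated address bits, ignoring any architecture rules about which upper bits must stay clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask selecting the low BITS address bits.  */
static inline uint64_t
sketch_address_mask (unsigned int bits)
{
  assert (bits >= 1 && bits < 64);
  return (UINT64_C (1) << bits) - 1;
}

/* Hypothetical helper: place TAG in the bits above the address bits.  */
static inline uint64_t
sketch_tag_address (uint64_t addr, unsigned int tag, unsigned int bits)
{
  uint64_t mask = sketch_address_mask (bits);
  return (addr & mask) | ((uint64_t) tag << bits);
}

/* Hypothetical helper: clear everything above the address bits.  */
static inline uint64_t
sketch_untag_address (uint64_t addr, unsigned int bits)
{
  return addr & sketch_address_mask (bits);
}
```

With 56 address bits (the top byte used as the tag, as with TBI), tagging 0x1000 with 0xAB gives 0xAB00000000001000, and untagging that value restores 0x1000.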
  
H.J. Lu Feb. 17, 2021, 6:43 p.m. UTC | #5
On Fri, Feb 12, 2021 at 5:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 1:43 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > > On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > >>
> > >> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
> > >>
> > >> > An API for tagged address:
> > >>
> > >> Please write a longer commit message, discussing what "tagged address" is,
> > >> which architectures have such a thing (the API should try to cover
> > >> whatever is common between architectures as far as possible - is this
> > >> meant to relate to AArch64 MTE, how does it relate to the MTE code we
> > >
> > > This API is for Intel LAM:
> > >
> > > https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.
> > >
> > > and ARM TBI:
> > >
> > > https://en.wikichip.org/wiki/arm/tbi
> >
> > Do the setters/getters change process or thread properties?
>
> On x86-64, mask is stored in TCB.   TCB will be updated.  It applies
> to all threads.
>
> > The interface assumes that the tag bits are uniform across pointer
> > types.  I think that's not true, at least from a historical perspective.
>
> This is true for LAM and TBI.
>
> > People complained that our protection key interfaces are too slow to be
> > useful.  Do we need to find a way to inline the tag/untag operations?
> >
>
> The internal interface is inlined.  We can also inline the public interface
> if needed.  But set_tagged_address_mask isn't inlined at all.
>
> --
> H.J.

Add Ved and Kirill.
  
H.J. Lu Feb. 17, 2021, 9:58 p.m. UTC | #6
On Wed, Feb 17, 2021 at 10:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 5:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Fri, Feb 12, 2021 at 1:43 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > > On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > > >>
> > > >> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
> > > >>
> > > >> > An API for tagged address:
> > > >>
> > > >> Please write a longer commit message, discussing what "tagged address" is,
> > > >> which architectures have such a thing (the API should try to cover
> > > >> whatever is common between architectures as far as possible - is this
> > > >> meant to relate to AArch64 MTE, how does it relate to the MTE code we
> > > >
> > > > This API is for Intel LAM:
> > > >
> > > > https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.
> > > >
> > > > and ARM TBI:
> > > >
> > > > https://en.wikichip.org/wiki/arm/tbi
> > >
> > > Do the setters/getters change process or thread properties?
> >
> > On x86-64, mask is stored in TCB.   TCB will be updated.  It applies
> > to all threads.
> >
> > > The interface assumes that the tag bits are uniform across pointer
> > > types.  I think that's not true, at least from a historical perspective.
> >
> > This is true for LAM and TBI.
> >
> > > People complained that our protection key interfaces are too slow to be
> > > useful.  Do we need to find a way to inline the tag/untag operations?
> > >
> >
> > The internal interface is inlined.  We can also inline the public interface
> > if needed.  But set_tagged_address_mask isn't inlined at all.
> >
> > --
> > H.J.
>
> Add Ved and Kirill.
>

Kirill raised the question of what should happen when

/* Set the mask for address bits used in address translation.  Return 0
   on success.  Return -1 on error.  */
extern int set_tagged_address_mask (uintptr_t __mask);

was called in a thread.  It won't work when 2 threads have different address
masks.  I think set_tagged_address_mask should be disallowed in child
threads and in parent thread when there are any active child threads.
  
Szabolcs Nagy Feb. 18, 2021, 1:17 p.m. UTC | #7
The 02/12/2021 05:06, H.J. Lu via Libc-alpha wrote:
> On Fri, Feb 12, 2021 at 1:43 AM Florian Weimer <fweimer@redhat.com> wrote:
> > * H. J. Lu via Libc-alpha:
> > > On Thu, Feb 11, 2021 at 12:28 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > >> On Thu, 11 Feb 2021, H.J. Lu via Libc-alpha wrote:
> > >>
> > >> > An API for tagged address:
> > >>
> > >> Please write a longer commit message, discussing what "tagged address" is,
> > >> which architectures have such a thing (the API should try to cover
> > >> whatever is common between architectures as far as possible - is this
> > >> meant to relate to AArch64 MTE, how does it relate to the MTE code we
> > >
> > > This API is for Intel LAM:
> > >
> > > https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc#:~:text=Intel%20Linear%20Address%20Masking%20(LAM,bit%20linear%20addresses%20for%20metadata.&text=With%20LAM%20enabled%2C%20the%20processor,linear%20address%20to%20access%20memory.
> > >
> > > and ARM TBI:
> > >
> > > https://en.wikichip.org/wiki/arm/tbi
> >
> > Do the setters/getters change process or thread properties?
> 
> On x86-64, mask is stored in TCB.   TCB will be updated.  It applies
> to all threads.
> 
> > The interface assumes that the tag bits are uniform across pointer
> > types.  I think that's not true, at least from a historical perspective.
> 
> This is true for LAM and TBI.

on aarch64 TBI is always top byte (top 8bit).

there are separate data access and instruction access TBI, but
they are both on in linux userspace (there was an argument to
only use data access TBI so pointer authentication can use extra
8 bits for code pointers, however the TBI setting is slow to
context switch and we would need it to be per process if it's
an opt-in feature. it's not clear if anything would break by
disabling code address TBI, but for now we are stuck with the
original ABI).

when MTE is in use then 4bits (bit 56 .. 59) are used for memory
tagging, the other 4bits (top 4 bits) are technically still
available for application use. but i'm not yet sure if we want
to expose that. (e.g. in case of a fault on a tagged address
by default the tag is zeroed in siginfo.si_addr, but kept when
SA_EXPOSE_TAGBITS is set, except with an MTE tag check fault
the top 4 bits are never kept.)

originally in the linux syscall abi tag bytes must be 0, so in
libc calls the tag bytes must be 0 too: the architectural TBI
only works for memory accesses. but userspace can opt-in to a
tagged address abi that allows passing tagged addresses to the
kernel.
https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.rst

with the tagged address syscall abi tagged addresses are usable
in practice (e.g. for HWASAN, but also needed for MTE). this abi
is mostly backward compatible (i'm not sure if anything breaks
if it's unconditionally on, invalid syscalls may no longer fail).
https://www.kernel.org/doc/Documentation/arm64/tagged-address-abi.rst

we want the libc to decide early about the tagged syscall abi,
later there is no reliable way to do it for the entire process.
for this we may need to mark binaries that use tags (such
marking also tells us that the binary is not MTE compatible)
unfortunately HWASAN does not use any marking. (lack of marking
is an issue in other sanitizers too). currently the opt-in is
done based on a tunable that requests heap tagging.

> 
> > People complained that our protection key interfaces are too slow to be
> > useful.  Do we need to find a way to inline the tag/untag operations?
> >
> 
> The internal interface is inlined.  We can also inline the public interface
> if needed.  But set_tagged_address_mask isn't inlined at all.

i'm not sure how the set_tagged_address_mask api would work.
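
For reference, the opt-in to the tagged address syscall ABI described in this message is a single prctl call on Linux. This is a minimal sketch assuming a Linux target: the function name is hypothetical, the fallback definitions mirror the values in the kernel's <linux/prctl.h>, and on kernels or architectures without the feature the call is expected to fail.

```c
#include <sys/prctl.h>

#ifndef PR_SET_TAGGED_ADDR_CTRL
# define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
# define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/* Hypothetical wrapper: ask the kernel to accept tagged addresses in
   syscalls made by the calling thread.  Returns 0 on success, -1 with
   errno set on failure.  */
int
sketch_enable_tagged_addr_abi (void)
{
  return prctl (PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
		0UL, 0UL, 0UL);
}
```

According to the kernel documentation linked above, the setting is scoped to the calling thread and inherited by threads created afterwards, which fits the argument that libc has to make the decision early.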
  
Florian Weimer Feb. 18, 2021, 1:21 p.m. UTC | #8
* Szabolcs Nagy:

> we want the libc to decide early about the tagged syscall abi,
> later there is no reliable way to do it for the entire process.
> for this we may need to mark binaries that use tags (such
> marking also tells us that the binary is not MTE compatible)
> unfortunately HWASAN does not use any marking. (lack of marking
> is an issue in other sanitizers too). currently the opt-in is
> done based on a tunable that requests heap tagging.

Apparently x86 has a similar issue, see Kirill's comment that
H.J. relayed:

  <https://sourceware.org/pipermail/libc-alpha/2021-February/122775.html>

Thanks,
Florian
  
Szabolcs Nagy Feb. 18, 2021, 1:24 p.m. UTC | #9
The 02/17/2021 13:58, H.J. Lu via Libc-alpha wrote:
> On Wed, Feb 17, 2021 at 10:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Fri, Feb 12, 2021 at 5:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > On Fri, Feb 12, 2021 at 1:43 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > Do the setters/getters change process or thread properties?
> > >
> > > On x86-64, mask is stored in TCB.   TCB will be updated.  It applies
> > > to all threads.
> 
> Kirill raised the question of what should happen when
> 
> /* Set the mask for address bits used in address translation.  Return 0
>    on success.  Return -1 on error.  */
> extern int set_tagged_address_mask (uintptr_t __mask);
> 
> was called in a thread.  It won't work when 2 threads have different address
> masks.  I think set_tagged_address_mask should be disallowed in child
> threads and in parent thread when there are any active child threads.

i think this is the wrong api. currently the libc should set
things up early. api for user code is too late.

user code does not know if it runs single threaded or not
(although we have __libc_single_threaded now, i'm not sure if
we can use that for this purpose)
  
Florian Weimer Feb. 18, 2021, 1:28 p.m. UTC | #10
* Szabolcs Nagy:

>> was called in a thread.  It won't work when 2 threads have different address
>> masks.  I think set_tagged_address_mask should be disallowed in child
>> threads and in parent thread when there are any active child threads.
>
> i think this is the wrong api. currently the libc should set
> things up early. api for user code is too late.
>
> user code does not know if it runs single threaded or not
> (although we have __libc_single_threaded now, i'm not sure if
> we can use that for this purpose)

We could, but it's possible to launch threads from ELF constructors (and
I think some libraries do that).  So you could avoid the call or
diagnose a failure if single-threaded, but the tagged address feature
wouldn't compose well.

Some kernel interfaces have this problem (e.g., unshare), but they are
less general-purpose than tagged addresses.

Thanks,
Florian
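
To make the proposed restriction concrete, here is a minimal sketch of a setter that refuses to change the mask once the process may have more than one thread. Everything in it is hypothetical: the `sketch_` names, the process-wide variable, and the choice of EPERM are illustrative assumptions and are not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical process-wide mask; all bits translate by default.  */
static uintptr_t sketch_mask = UINTPTR_MAX;

/* Hypothetical setter: refuse to change the mask once the process may
   have more than one thread.  EPERM is an arbitrary choice here.  */
int
sketch_set_mask (uintptr_t mask, bool single_threaded)
{
  if (!single_threaded)
    {
      errno = EPERM;
      return -1;
    }
  sketch_mask = mask;
  return 0;
}

/* Read back the mask currently recorded.  */
uintptr_t
sketch_get_mask (void)
{
  return sketch_mask;
}
```

On glibc 2.32 or later a caller could pass `__libc_single_threaded` from <sys/single_threaded.h> as the second argument. As noted in this message, a thread started from an ELF constructor would make such a call fail before main even runs, which is the composability problem under discussion.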
  
Szabolcs Nagy Feb. 18, 2021, 1:50 p.m. UTC | #11
The 02/18/2021 14:28, Florian Weimer wrote:
> * Szabolcs Nagy:
> >> was called in a thread.  It won't work when 2 threads have different address
> >> masks.  I think set_tagged_address_mask should be disallowed in child
> >> threads and in parent thread when there are any active child threads.
> >
> > i think this is the wrong api. currently the libc should set
> > things up early. api for user code is too late.
> >
> > user code does not know if it runs single threaded or not
> > (although we have __libc_single_threaded now, i'm not sure if
> > we can use that for this purpose)
> 
> We could, but it's possible to launch threads from ELF constructors (and
> I think some libraries do that).  So you could avoid the call or
> diagnose a failure if single-threaded, but the tagged address feature
> wouldn't compose well.
> 
> Some kernel interfaces have this problem (e.g., unshare), but they are
> less general-purpose than tagged addresses.

it is possible to have a prctl flag that requests the abi
change for the entire process. (and then the kernel has
to do the sync across threads.)

the semantics is not entirely obvious wrt memory model
if we want to do this for e.g. MTE tag checks, but for
syscall abi it should be possible to define reasonably.

e.g. a use-case for changing mte/tag abi settings for an
entire process is a custom allocator that wants to use
heap tagging. by the time it can call the prctl the
process may be already multi-threaded.

there is also a problem with coordination between
concurrent callers. this is simple with the tagged
address abi which is a 1 bit state and we can say that
it always goes one way: no tag -> tag, but more complex
state like mte checking mode or mte tag exclusion set,
requires coordination, which tells me that there should
be a single owner: the libc.

but i don't know what requirements intel LAM has.
  
H.J. Lu Feb. 18, 2021, 10:32 p.m. UTC | #12
On Thu, Feb 18, 2021 at 5:50 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 02/18/2021 14:28, Florian Weimer wrote:
> > * Szabolcs Nagy:
> > >> was called in a thread.  It won't work when 2 threads have different address
> > >> masks.  I think set_tagged_address_mask should be disallowed in child
> > >> threads and in parent thread when there are any active child threads.
> > >
> > > i think this is the wrong api. currently the libc should set
> > > things up early. api for user code is too late.
> > >
> > > user code does not know if it runs single threaded or not
> > > (although we have __libc_single_threaded now, i'm not sure if
> > > we can use that for this purpose)
> >
> > We could, but it's possible to launch threads from ELF constructors (and
> > I think some libraries do that).  So you could avoid the call or
> > diagnose a failure if single-threaded, but the tagged address feature
> > wouldn't compose well.
> >
> > Some kernel interfaces have this problem (e.g., unshare), but they are
> > less general-purpose than tagged addresses.
>
> it is possible to have a prctl flag that requests the abi
> change for the entire process. (and then the kernel has
> to do the sync across threads.)
>
> the semantics is not entirely obvious wrt memory model
> if we want to do this for e.g. MTE tag checks, but for
> syscall abi it should be possible to define reasonably.
>
> e.g. a use-case for changing mte/tag abi settings for an
> entire process is a custom allocator that wants to use
> heap tagging. by the time it can call the prctl the
> process may be already multi-threaded.
>
> there is also a problem with coordination between
> concurrent callers. this is simple with the tagged
> address abi which is a 1 bit state and we can say that
> it always goes one way: no tag -> tag, but more complex
> state like mte checking mode or mte tag exclusion set,
> requires coordination, which tells me that there should
> be a single owner: the libc.
>
> but i don't know what requirements intel LAM has.

We are working to enable LAM in glibc and GCC (HWASAN).

0. LAM is disabled when the process starts.
1. Define GNU property markers for LAM compatibility.
2. Update ld.so to support LAM.
3. Make libc.so LAM compatible (memmove).
4. Provide an API to enable LAM.

We noticed a few issues:

1. HWASAN should use the glibc API to enable tagged address
since glibc must track the tagged address mask.
2. set_tagged_address_mask shouldn't be allowed after
pthread_create is called.
3. After set_tagged_address_mask is called, can it be called
again to change the tagged address mask?
  
Szabolcs Nagy Feb. 22, 2021, 8:27 a.m. UTC | #13
The 02/18/2021 14:32, H.J. Lu wrote:
> 
> We are working to enable LAM in glibc and GCC (HWASAN).
> 
> 0. LAM is disabled when the process starts.
> 1. Define GNU property markers for LAM compatibility.
> 2. Update ld.so to support LAM.
> 3. Make libc.so LAM compatible (memmove).

if pointers to the same object always have the same tag,
then memmove should work without changes i think.

if such pointers can have different tags then all pointer
comparisons are problematic, not just memmove.

> 4. Provide an API to enable LAM.
> 
> We noticed a few issues:
> 
> 1. HWASAN should use the glibc API to enable tagged address
> since glibc must track the tagged address mask.

how does that mask work?
is it possible to set it to different values or just on/off?

> 2. set_tagged_address_mask shouldn't be allowed after
> pthread_create is called.

such api breaks software composability.
i don't have a good solution (other than libc doing an early
decision on its own).

> 3. After set_tagged_address_mask is called, can it be called
> again to change the tagged address mask?

after tagged pointers escape it is unlikely that changing
settings works.
  
H.J. Lu Feb. 22, 2021, 1:57 p.m. UTC | #14
On Mon, Feb 22, 2021 at 12:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 02/18/2021 14:32, H.J. Lu wrote:
> >
> > We are working to enable LAM in glibc and GCC (HWASAN).
> >
> > 0. LAM is disabled when the process starts.
> > 1. Define GNU property markers for LAM compatibility.
> > 2. Update ld.so to support LAM.
> > 3. Make libc.so LAM compatible (memmove).
>
> if pointers to the same object always have the same tag,
> then memmove should work without changes i think.

string/memmove.c has

  /* This test makes the forward copying code be used whenever possible.
     Reduces the working set.  */
  if (dstp - srcp >= len) /* *Unsigned* compare!  */

This doesn't work when pointers have tags.

char *
inhibit_loop_to_libcall
simple_memmove (char *dst, const char *src, size_t n)
{
  char *ret = dst;
  if (src < dst)
    {
      dst += n;
      src += n;
      while (n--)
        *--dst = *--src;
    }
  else
    while (n--)
      *dst++ = *src++;
  return ret;
}

has the same issue.
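
The failure can be reproduced with plain integers. The sketch below is illustrative only: the `sketch_` names and the fixed 56-bit address mask are assumptions, and pointers are modelled as `uint64_t` values so that it runs without hardware tagging support.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: top byte carries the tag, as with a 56-bit address.  */
#define SKETCH_ADDR_MASK ((UINT64_C (1) << 56) - 1)

/* The test as written in string/memmove.c, applied to raw values.  */
static bool
sketch_forward_ok_raw (uint64_t dstp, uint64_t srcp, size_t len)
{
  return dstp - srcp >= len;	/* Unsigned compare.  */
}

/* The same test after stripping the tag bits from both operands.  */
static bool
sketch_forward_ok_untagged (uint64_t dstp, uint64_t srcp, size_t len)
{
  return (dstp & SKETCH_ADDR_MASK) - (srcp & SKETCH_ADDR_MASK) >= len;
}
```

Take a source at address 0x1000 carrying tag 0x01, an untagged destination at 0x1008, and a length of 16. The regions overlap, so a forward copy is unsafe, yet the raw test selects it because the tag difference dominates the subtraction; the untagged test rejects it. When both pointers carry the same tag, the two tests agree.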

> if such pointers can have different tags then all pointer
> comparisons are problematic, not just memmove.

We haven't found other pointer usages in glibc which are
incompatible with tags.

> > 4. Provide an API to enable LAM.
> >
> > We noticed a few issues:
> >
> > 1. HWASAN should use the glibc API to enable tagged address
> > since glibc must track the tagged address mask.
>
> how does that mask work?
> is it possible to set it to different values or just on/off?

It is a bit mask of uintptr_t.  The default is (uintptr_t) -1.

#ifdef __GNUC__
/* A mask for constant address BITS used in address translation.  */
# define TAGGED_ADDRESS_MASK(BITS) \
  (__extension__ \
    ({ \
       _Static_assert (TAGGED_ADDRESS_VALID_BITS (BITS), \
       "Tagged address bits must be valid"); \
       (((uintptr_t) 1) << (BITS)) - 1; \
     }))
#endif
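
As a worked illustration of how the bit count and the mask relate, here is a small runtime sketch. The `sketch_` names are hypothetical, values are modelled as `uint64_t`, and the full-width case is handled separately because shifting a 64-bit value by 64 is undefined in C, so the all-ones default cannot be produced by the shift expression alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical runtime counterpart of TAGGED_ADDRESS_MASK.  BITS equal
   to 64 gives the all-ones default; shifting by 64 would be undefined,
   so it is handled separately.  */
static uint64_t
sketch_mask_from_bits (unsigned int bits)
{
  if (bits >= 64)
    return UINT64_MAX;
  return (UINT64_C (1) << bits) - 1;
}

/* Hypothetical check: a mask of this form is a run of low one bits,
   so MASK + 1 is zero or a power of two.  */
static bool
sketch_mask_is_contiguous (uint64_t mask)
{
  uint64_t next = mask + 1;
  return mask != 0 && (next & mask) == 0;
}

/* Hypothetical inverse: number of low one bits in MASK.  */
static unsigned int
sketch_bits_from_mask (uint64_t mask)
{
  unsigned int bits = 0;
  while (mask & 1)
    {
      bits++;
      mask >>= 1;
    }
  return bits;
}
```

For example, 57 address bits give the mask 0x01FFFFFFFFFFFFFF, and the all-ones default maps back to 64 bits.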

> > 2. set_tagged_address_mask shouldn't be allowed after
> > pthread_create is called.
>
> such api breaks software composability.
> i don't have a good solution (other than libc doing an early
> decision on its own).

How does it work with HWASAN?

> > 3. After set_tagged_address_mask is called, can it be called
> > again to change the tagged address mask?
>
> after tagged pointers escape it is unlikely that changing
> settings works.
>
  

Patch

diff --git a/bits/tagged-address.h b/bits/tagged-address.h
new file mode 100644
index 0000000000..24f865b8af
--- /dev/null
+++ b/bits/tagged-address.h
@@ -0,0 +1,25 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _BITS_TAGGED_ADDRESS_H
+#define _BITS_TAGGED_ADDRESS_H 1
+
+#ifndef _SYS_TAGGED_ADDRESS_H
+# error "Never include this file directly.  Use <sys/tagged-address.h> instead"
+#endif
+
+#endif /* <bits/tagged-address.h> */
diff --git a/include/sys/tagged-address.h b/include/sys/tagged-address.h
new file mode 100644
index 0000000000..2e902f72b2
--- /dev/null
+++ b/include/sys/tagged-address.h
@@ -0,0 +1,9 @@ 
+#include <misc/sys/tagged-address.h>
+
+#ifndef _ISOMAC
+# include <inline-tagged-address.h>
+# define get_tagged_address_bits()	__get_tagged_address_bits ()
+# define get_tagged_address_mask()	__get_tagged_address_mask ()
+# define tag_address(addr, tag)		__tag_address ((addr), (tag))
+# define untag_address(addr)		__untag_address ((addr))
+#endif
diff --git a/misc/Makefile b/misc/Makefile
index b08d7c68ab..c5d8838486 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -73,7 +73,8 @@  routines := brk sbrk sstk ioctl \
 	    fgetxattr flistxattr fremovexattr fsetxattr getxattr \
 	    listxattr lgetxattr llistxattr lremovexattr lsetxattr \
 	    removexattr setxattr getauxval ifunc-impl-list makedev \
-	    allocate_once fd_to_filename single_threaded
+	    allocate_once fd_to_filename single_threaded \
+	    tagged-address set-tagged-address-mask
 
 generated += tst-error1.mtrace tst-error1-mem.out \
   tst-allocate_once.mtrace tst-allocate_once-mem.out
diff --git a/misc/Versions b/misc/Versions
index 95666f6548..9e9ded4d5c 100644
--- a/misc/Versions
+++ b/misc/Versions
@@ -164,6 +164,13 @@  libc {
   GLIBC_2.32 {
     __libc_single_threaded;
   }
+  GLIBC_2.34 {
+    get_tagged_address_bits;
+    get_tagged_address_mask;
+    set_tagged_address_mask;
+    tag_address;
+    untag_address;
+  }
   GLIBC_PRIVATE {
     __madvise;
     __mktemp;
diff --git a/misc/set-tagged-address-mask.c b/misc/set-tagged-address-mask.c
new file mode 100644
index 0000000000..89902b47c2
--- /dev/null
+++ b/misc/set-tagged-address-mask.c
@@ -0,0 +1,28 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/tagged-address.h>
+
+/* Set the mask for address bits used in address translation.  Return 0
+   on success.  Return -1 on error.  */
+
+int
+set_tagged_address_mask (uintptr_t mask)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/misc/sys/tagged-address.h b/misc/sys/tagged-address.h
new file mode 100644
index 0000000000..540db14dd2
--- /dev/null
+++ b/misc/sys/tagged-address.h
@@ -0,0 +1,56 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_TAGGED_ADDRESS_H
+#define _SYS_TAGGED_ADDRESS_H 1
+
+#include <features.h>
+#include <stdint.h>
+#include <bits/tagged-address.h>
+
+#if defined TAGGED_ADDRESS_VALID_BITS && defined __GNUC__
+/* A mask for constant address BITS used in address translation.  */
+# define TAGGED_ADDRESS_MASK(BITS)				\
+  (__extension__						\
+    ({								\
+       _Static_assert (TAGGED_ADDRESS_VALID_BITS (BITS),	\
+		       "Tagged address bits must be valid");	\
+       (((uintptr_t) 1) << (BITS)) - 1;				\
+     }))
+#endif
+
+__BEGIN_DECLS
+
+/* Get the number of address bits currently used in address translation.  */
+extern unsigned int get_tagged_address_bits (void);
+
+/* Get the current mask for address bits used in address translation.  */
+extern uintptr_t get_tagged_address_mask (void);
+
+/* Set the mask for address bits used in address translation.  Return 0
+   on success.  Return -1 on error.  */
+extern int set_tagged_address_mask (uintptr_t __mask);
+
+/* Return the tagged address of __ADDR with the tag value __TAG.  */
+extern void *tag_address (void *__addr, unsigned int __tag);
+
+/* Return the untagged address of __ADDR.  */
+extern void *untag_address (void *__addr);
+
+__END_DECLS
+
+#endif /* <sys/tagged-address.h> */
diff --git a/misc/tagged-address.c b/misc/tagged-address.c
new file mode 100644
index 0000000000..df474f3d0b
--- /dev/null
+++ b/misc/tagged-address.c
@@ -0,0 +1,55 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/tagged-address.h>
+
+#undef get_tagged_address_bits
+#undef get_tagged_address_mask
+#undef tag_address
+#undef untag_address
+
+/* Get the number of address bits currently used in address translation.  */
+
+unsigned int
+get_tagged_address_bits (void)
+{
+  return __get_tagged_address_bits ();
+}
+
+/* Get the current mask for address bits used in address translation.  */
+
+uintptr_t
+get_tagged_address_mask (void)
+{
+  return __get_tagged_address_mask ();
+}
+
+/* Return the tagged address of ADDR with the tag value TAG.  */
+
+void *
+tag_address (void *addr, unsigned int tag)
+{
+  return __tag_address (addr, tag);
+}
+
+/* Return the untagged address of ADDR.  */
+
+void *
+untag_address (void *addr)
+{
+  return __untag_address (addr);
+}
diff --git a/sysdeps/generic/inline-tagged-address.h b/sysdeps/generic/inline-tagged-address.h
new file mode 100644
index 0000000000..a016b16f21
--- /dev/null
+++ b/sysdeps/generic/inline-tagged-address.h
@@ -0,0 +1,43 @@ 
+/* Inline tagged address functions.  Generic version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+static inline unsigned int
+__get_tagged_address_bits (void)
+{
+  return sizeof (uintptr_t) * 8;
+}
+
+static inline uintptr_t
+__get_tagged_address_mask (void)
+{
+  return (uintptr_t) -1;
+}
+
+static inline void *
+__tag_address (void *addr, unsigned int tag)
+{
+  return addr;
+}
+
+static inline void *
+__untag_address (void *addr)
+{
+  return addr;
+}
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index ddc5837059..464c8af2ee 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2241,6 +2241,11 @@  GLIBC_2.33 mknod F
 GLIBC_2.33 mknodat F
 GLIBC_2.33 stat F
 GLIBC_2.33 stat64 F
+GLIBC_2.34 get_tagged_address_bits F
+GLIBC_2.34 get_tagged_address_mask F
+GLIBC_2.34 set_tagged_address_mask F
+GLIBC_2.34 tag_address F
+GLIBC_2.34 untag_address F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 2744bba4af..7486bf09bc 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2088,6 +2088,11 @@  GLIBC_2.33 mknod F
 GLIBC_2.33 mknodat F
 GLIBC_2.33 stat F
 GLIBC_2.33 stat64 F
+GLIBC_2.34 get_tagged_address_bits F
+GLIBC_2.34 get_tagged_address_mask F
+GLIBC_2.34 set_tagged_address_mask F
+GLIBC_2.34 tag_address F
+GLIBC_2.34 untag_address F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index ce2f4fb72b..d88202196f 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2185,3 +2185,8 @@  GLIBC_2.33 mknod F
 GLIBC_2.33 mknodat F
 GLIBC_2.33 stat F
 GLIBC_2.33 stat64 F
+GLIBC_2.34 get_tagged_address_bits F
+GLIBC_2.34 get_tagged_address_mask F
+GLIBC_2.34 set_tagged_address_mask F
+GLIBC_2.34 tag_address F
+GLIBC_2.34 untag_address F
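
For illustration only, here is a minimal sketch of the mask arithmetic that
a non-generic implementation of tag_address and untag_address might use.
It is not part of the patch.  All sketch_* names are hypothetical, the
functions work on uintptr_t values rather than pointers, and the sketch
ignores any architecture-specific rule about which high bits stay
significant.  When BITS covers the whole pointer it degenerates to the
generic version above, where tagging and untagging return the address
unchanged.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: width of a pointer in bits.  */
#define SKETCH_PTR_BITS (sizeof (uintptr_t) * CHAR_BIT)

/* Mask of the low BITS bits.  Returns all ones when BITS covers the
   whole pointer, which avoids an undefined full-width shift.  */
static uintptr_t
sketch_mask (unsigned int bits)
{
  if (bits >= SKETCH_PTR_BITS)
    return (uintptr_t) -1;
  return (((uintptr_t) 1) << bits) - 1;
}

/* Place TAG in the bits above the low BITS bits of ADDR.  With no
   spare bits, return ADDR unchanged, like the generic version.  */
static uintptr_t
sketch_tag (uintptr_t addr, unsigned int tag, unsigned int bits)
{
  uintptr_t mask = sketch_mask (bits);
  if (mask == (uintptr_t) -1)
    return addr;
  return (addr & mask) | ((uintptr_t) tag << bits);
}

/* Clear every bit above the low BITS bits of ADDR.  */
static uintptr_t
sketch_untag (uintptr_t addr, unsigned int bits)
{
  return addr & sketch_mask (bits);
}

/* Round trip: untagging a tagged address yields the original.  */
static int
sketch_self_check (void)
{
  uintptr_t addr = (uintptr_t) 0x123456;
  uintptr_t tagged = sketch_tag (addr, 5, 24);
  assert (sketch_untag (tagged, 24) == addr);
  return 0;
}
```

The explicit full-width check in sketch_mask is a design choice of this
sketch: shifting a uintptr_t by its own width is undefined in C, so a mask
helper that may be handed the generic bit count has to treat it specially.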