[6/6] aarch64: Add hp-timing.h

Message ID 1403735086-21797-7-git-send-email-rth@twiddle.net
State Superseded
Headers

Commit Message

Richard Henderson June 25, 2014, 10:24 p.m. UTC
  * sysdeps/aarch64/hp-timing.h: New file.
---
 sysdeps/aarch64/hp-timing.h | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 sysdeps/aarch64/hp-timing.h
  

Comments

Andreas Schwab July 21, 2014, 11:41 a.m. UTC | #1
Richard Henderson <rth@twiddle.net> writes:

> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
> +#define HP_TIMING_NOW(Var) \
> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))

According to https://bugs.launchpad.net/bugs/1344320 the generic timers
are not part of the kernel-to-userspace contract.

Andreas.
  
Andrew Pinski July 21, 2014, 11:48 a.m. UTC | #2
> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>> +#define HP_TIMING_NOW(Var) \
>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
> 
> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
> are not part of the kernel-to-userspace contract.


I think this is bogus for the kernel folks not allow a high precision timer in user space. Timers like this are needed for micro-benchmarking compiler changes along with other libc changes. 

Thanks,
Andrew

> 
> Andreas.
> 
> -- 
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
  
Andreas Schwab July 21, 2014, 12:01 p.m. UTC | #3
pinskia@gmail.com writes:

>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>> 
>> Richard Henderson <rth@twiddle.net> writes:
>> 
>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>> +#define HP_TIMING_NOW(Var) \
>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>> 
>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>> are not part of the kernel-to-userspace contract.
>
>
> I think this is bogus for the kernel folks not allow a high precision timer in user space. Timers like this are needed for micro-benchmarking compiler changes along with other libc changes. 

According to the qemu bug the timers are optional.

Andreas.
  
Marcus Shawcroft July 21, 2014, 6:31 p.m. UTC | #4
On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
> pinskia@gmail.com writes:
>
>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>
>>> Richard Henderson <rth@twiddle.net> writes:
>>>
>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>> +#define HP_TIMING_NOW(Var) \
>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>
>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>> are not part of the kernel-to-userspace contract.
>>
>>
>> I think this is bogus for the kernel folks not allow a high precision timer in user space. Timers like this are needed for micro-benchmarking compiler changes along with other libc changes.
>
> According to the qemu bug the timers are optional.
>

The generic timers are indeed optional, therefore we should not assume
they are always present.  I don't want this to ship in 2.20 as is, so
will back out the patch in the morning pending a better solution.

/Marcus
  
Richard Henderson July 21, 2014, 7:05 p.m. UTC | #5
On 07/21/2014 08:31 AM, Marcus Shawcroft wrote:
> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>> pinskia@gmail.com writes:
>>
>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>
>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>
>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>> +#define HP_TIMING_NOW(Var) \
>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>
>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>> are not part of the kernel-to-userspace contract.
>>>
>>>
>>> I think this is bogus for the kernel folks not allow a high precision timer in user space. Timers like this are needed for micro-benchmarking compiler changes along with other libc changes.
>>
>> According to the qemu bug the timers are optional.
>>
> 
> The generic timers are indeed optional, therefore we should not assume
> they are always present.  I don't want this to ship in 2.20 as is, so
> will back out the patch in the morning pending a better solution.

Do you know of any real hardware that doesn't have the timers?  I really can't
imagine them being elided.

It should be trivial to enable this for qemu linux-user mode; we certainly
support the i386 rdtsc instruction.


r~
  
Andrew Pinski July 21, 2014, 8:42 p.m. UTC | #6
On Mon, Jul 21, 2014 at 12:05 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 07/21/2014 08:31 AM, Marcus Shawcroft wrote:
>> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>>> pinskia@gmail.com writes:
>>>
>>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>>
>>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>>
>>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>>> +#define HP_TIMING_NOW(Var) \
>>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>>
>>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>>> are not part of the kernel-to-userspace contract.
>>>>
>>>>
>>>> I think this is bogus for the kernel folks not allow a high precision timer in user space. Timers like this are needed for micro-benchmarking compiler changes along with other libc changes.
>>>
>>> According to the qemu bug the timers are optional.
>>>
>>
>> The generic timers are indeed optional, therefore we should not assume
>> they are always present.  I don't want this to ship in 2.20 as is, so
>> will back out the patch in the morning pending a better solution.
>
> Do you know of any real hardware that doesn't have the timers?  I really can't
> imagine them being elided.

More than that, right now the Linux kernel requires the timings so it
is not optional for the linux kernel.
http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114155.html

That patch makes it sound like it is part of the kernel userland
ABI/API as it allows for a fast gettimeofday() (that does not have to
be in VDSO if I read the comment correctly).

Thanks,
Andrew Pinski

>
> It should be trivial to enable this for qemu linux-user mode; we certainly
> support the i386 rdtsc instruction.
>
>
> r~
>
  
Marcus Shawcroft July 22, 2014, 9:38 a.m. UTC | #7
> On Mon, Jul 21, 2014 at 12:05 PM, Richard Henderson <rth@twiddle.net> wrote:

> Do you know of any real hardware that doesn't have the timers?  I really can't
> imagine them being elided.

I'd like to see the generic timers none optional in cortex-a class
processors, however they are currently, unfortunately, defined as
optional. The design choices made in the first few instances of armv8
hardware are not necessarily a good basis for predicting the variety
of design choices that will follow in subsequent implementations.

> On 21 July 2014 21:42, Andrew Pinski <pinskia@gmail.com> wrote:

> More than that, right now the Linux kernel requires the timings so it
> is not optional for the linux kernel.
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114155.html
>
> That patch makes it sound like it is part of the kernel userland
> ABI/API as it allows for a fast gettimeofday() (that does not have to
> be in VDSO if I read the comment correctly).

The counter is for the benefit of the VDSO in order for the VDSO to
provide a fast gettimeofday on systems where the counter is present.

Cheers
/Marcus
  
Carlos O'Donell June 21, 2015, 2:20 a.m. UTC | #8
On 07/21/2014 02:31 PM, Marcus Shawcroft wrote:
> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>> pinskia@gmail.com writes:
>>
>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>
>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>
>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>> +#define HP_TIMING_NOW(Var) \
>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>
>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>> are not part of the kernel-to-userspace contract.
>>>
>>>
>>> I think this is bogus for the kernel folks not allow a high
>>> precision timer in user space. Timers like this are needed for
>>> micro-benchmarking compiler changes along with other libc
>>> changes.
>>
>> According to the qemu bug the timers are optional.
>>
> 
> The generic timers are indeed optional, therefore we should not assume
> they are always present.  I don't want this to ship in 2.20 as is, so
> will back out the patch in the morning pending a better solution.

Any solution for this yet?

I was updating the HP_TIMING code for RHEL7 and noticed AArch64
still lacks HP_TIMING support.

Cheers,
Carlos.
  
Andrew Pinski June 21, 2015, 2:30 a.m. UTC | #9
On Tue, Jul 22, 2014 at 2:38 AM, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
>> On Mon, Jul 21, 2014 at 12:05 PM, Richard Henderson <rth@twiddle.net> wrote:
>
>> Do you know of any real hardware that doesn't have the timers?  I really can't
>> imagine them being elided.
>
> I'd like to see the generic timers none optional in cortex-a class
> processors, however they are currently, unfortunately, defined as
> optional. The design choices made in the first few instances of armv8
> hardware are not necessarily a good basis for predicting the variety
> of design choices that will follow in subsequent implementations.
>
>> On 21 July 2014 21:42, Andrew Pinski <pinskia@gmail.com> wrote:
>
>> More than that, right now the Linux kernel requires the timings so it
>> is not optional for the linux kernel.
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/114155.html
>>
>> That patch makes it sound like it is part of the kernel userland
>> ABI/API as it allows for a fast gettimeofday() (that does not have to
>> be in VDSO if I read the comment correctly).
>
> The counter is for the benefit of the VDSO in order for the VDSO to
> provide a fast gettimeofday on systems where the counter is present.


Can there be a HWCAP that says if virtual timer is available or not then?

Thanks,
Andrew

>
> Cheers
> /Marcus
  
Andrew Pinski June 21, 2015, 3:09 a.m. UTC | #10
> On Jun 20, 2015, at 7:20 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 07/21/2014 02:31 PM, Marcus Shawcroft wrote:
>>> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>>> pinskia@gmail.com writes:
>>> 
>>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>> 
>>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>> 
>>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>>> +#define HP_TIMING_NOW(Var) \
>>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>> 
>>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>>> are not part of the kernel-to-userspace contract.
>>>> 
>>>> 
>>>> I think this is bogus for the kernel folks not allow a high
>>>> precision timer in user space. Timers like this are needed for
>>>> micro-benchmarking compiler changes along with other libc
>>>> changes.
>>> 
>>> According to the qemu bug the timers are optional.
>> 
>> The generic timers are indeed optional, therefore we should not assume
>> they are always present.  I don't want this to ship in 2.20 as is, so
>> will back out the patch in the morning pending a better solution.
> 
> Any solution for this yet?
> 
> I was updating the HP_TIMING code for RHEL7 and noticed AArch64
> still lacks HP_TIMING support.

Doesn't the server standard for armv8 require gti?  If so we can declare the standard glibc only for server base standard. And have an option later on for the other base standards. 

Thanks,
Andrew

> 
> Cheers,
> Carlos.
>
  
Andrew Pinski June 22, 2015, 7:08 a.m. UTC | #11
On Sat, Jun 20, 2015 at 8:09 PM,  <pinskia@gmail.com> wrote:
>
>
>
>
>> On Jun 20, 2015, at 7:20 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> On 07/21/2014 02:31 PM, Marcus Shawcroft wrote:
>>>> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>>>> pinskia@gmail.com writes:
>>>>
>>>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>>>
>>>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>>>
>>>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>>>> +#define HP_TIMING_NOW(Var) \
>>>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>>>
>>>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>>>> are not part of the kernel-to-userspace contract.
>>>>>
>>>>>
>>>>> I think this is bogus for the kernel folks not allow a high
>>>>> precision timer in user space. Timers like this are needed for
>>>>> micro-benchmarking compiler changes along with other libc
>>>>> changes.
>>>>
>>>> According to the qemu bug the timers are optional.
>>>
>>> The generic timers are indeed optional, therefore we should not assume
>>> they are always present.  I don't want this to ship in 2.20 as is, so
>>> will back out the patch in the morning pending a better solution.
>>
>> Any solution for this yet?
>>
>> I was updating the HP_TIMING code for RHEL7 and noticed AArch64
>> still lacks HP_TIMING support.
>
> Doesn't the server standard for armv8 require gti?  If so we can declare the standard glibc only for server base standard. And have an option later on for the other base standards.

The answer to my question is yes it is a requirement for Server Base
System Architecture Level 0 which means all server class systems have
it.  It also means nobody is not going to implement one processor
without GTI that is also going to use glibc.  So I think we should
just enable it by default for glibc and then when the day comes (if it
comes but I doubt it) have an option to disable the support for the
virtual timer.

Thanks,
Andrew



>
> Thanks,
> Andrew
>
>>
>> Cheers,
>> Carlos.
>>
  
Marcus Shawcroft Sept. 3, 2015, 1:17 p.m. UTC | #12
On 22 June 2015 at 08:08, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Jun 20, 2015 at 8:09 PM,  <pinskia@gmail.com> wrote:
>>
>>
>>
>>
>>> On Jun 20, 2015, at 7:20 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>>> On 07/21/2014 02:31 PM, Marcus Shawcroft wrote:
>>>>> On 21 July 2014 13:01, Andreas Schwab <schwab@suse.de> wrote:
>>>>> pinskia@gmail.com writes:
>>>>>
>>>>>>> On Jul 21, 2014, at 4:41 AM, Andreas Schwab <schwab@suse.de> wrote:
>>>>>>>
>>>>>>> Richard Henderson <rth@twiddle.net> writes:
>>>>>>>
>>>>>>>> +/* Sync the instruction stream, and read from the virtual cycle counter.  */
>>>>>>>> +#define HP_TIMING_NOW(Var) \
>>>>>>>> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
>>>>>>>
>>>>>>> According to https://bugs.launchpad.net/bugs/1344320 the generic timers
>>>>>>> are not part of the kernel-to-userspace contract.
>>>>>>
>>>>>>
>>>>>> I think this is bogus for the kernel folks not allow a high
>>>>>> precision timer in user space. Timers like this are needed for
>>>>>> micro-benchmarking compiler changes along with other libc
>>>>>> changes.
>>>>>
>>>>> According to the qemu bug the timers are optional.
>>>>
>>>> The generic timers are indeed optional, therefore we should not assume
>>>> they are always present.  I don't want this to ship in 2.20 as is, so
>>>> will back out the patch in the morning pending a better solution.
>>>
>>> Any solution for this yet?
>>>
>>> I was updating the HP_TIMING code for RHEL7 and noticed AArch64
>>> still lacks HP_TIMING support.
>>
>> Doesn't the server standard for armv8 require gti?  If so we can declare the standard glibc only for server base standard. And have an option later on for the other base standards.
>
> The answer to my question is yes it is a requirement for Server Base
> System Architecture Level 0 which means all server class systems have
> it.  It also means nobody is not going to implement one processor
> without GTI that is also going to use glibc.  So I think we should
> just enable it by default for glibc and then when the day comes (if it
> comes but I doubt it) have an option to disable the support for the
> virtual timer.

The "optionality" argument against the use on cntvct_el0 no longer
holds,  the  ARMv8-A Reference Manual Issue >= A.d does not describe
the generic timer as optional.  The other  remaining issue is whether
the kernel  exposes  cntvct_el0 to user space for general use or
whether that register is considered to be exposed for the benefit of
the VDSO only.

There are two ways forward:

1) Kernel folk agree to expose cntvct_el0 for the benefit of general user space.

2) We add a HWCAP bit to indicate if cntvct_el0 is exposed.

Taking the proposed patch as it stands in the absence of a commitment
from the kernel doesn't look sensible to me.  Since the loader uses hp
timers in normal operation we will end up with a dynamic loader that
faults if a future kernel disables visibility of cntvct_el0.

Catalin could you comment on options 1) and 2) above ?

Cheers
/Marcus
  
Peter Maydell Sept. 3, 2015, 1:49 p.m. UTC | #13
On 3 September 2015 at 14:17, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
> There are two ways forward:
>
> 1) Kernel folk agree to expose cntvct_el0 for the benefit of general user space.
>
> 2) We add a HWCAP bit to indicate if cntvct_el0 is exposed.
>
> Taking the proposed patch as it stands in the absence of a commitment
> from the kernel doesn't look sensible to me.  Since the loader uses hp
> timers in normal operation we will end up with a dynamic loader that
> faults if a future kernel disables visibility of cntvct_el0.

You need to have the dynamic loader check whether the counter
is available somehow anyway, otherwise it won't work on:
 * older kernels that don't export the counter to userspace
 * QEMU usermode emulation
 * valgrind
 * possibly other similar JIT tools

Obviously valgrind/QEMU/etc can be improved to add the counter
support, but there are still going to be old versions out in the
wild.

thanks
-- PMM
  
Catalin Marinas Sept. 9, 2015, 11:14 a.m. UTC | #14
Hi Marcus,

On Thu, Sep 03, 2015 at 02:17:16PM +0100, Marcus Shawcroft wrote:
> On 22 June 2015 at 08:08, Andrew Pinski <pinskia@gmail.com> wrote:
> > On Sat, Jun 20, 2015 at 8:09 PM,  <pinskia@gmail.com> wrote:
> >>> I was updating the HP_TIMING code for RHEL7 and noticed AArch64
> >>> still lacks HP_TIMING support.
> >>
> >> Doesn't the server standard for armv8 require gti?  If so we can
> >> declare the standard glibc only for server base standard. And have
> >> an option later on for the other base standards.

I'm definitely against glibc flavours based on whether it's server or
mobile. And SBSA is just a recommendation, not some enforced standard.

> > The answer to my question is yes it is a requirement for Server Base
> > System Architecture Level 0 which means all server class systems have
> > it.  It also means nobody is not going to implement one processor
> > without GTI that is also going to use glibc.  So I think we should
> > just enable it by default for glibc and then when the day comes (if it
> > comes but I doubt it) have an option to disable the support for the
> > virtual timer.
> 
> The "optionality" argument against the use on cntvct_el0 no longer
> holds,  the  ARMv8-A Reference Manual Issue >= A.d does not describe
> the generic timer as optional.  The other  remaining issue is whether
> the kernel  exposes  cntvct_el0 to user space for general use or
> whether that register is considered to be exposed for the benefit of
> the VDSO only.

The ARM ARM may mandate the presence of the CNTVCT_EL0 register (IOW it
won't undef when read) but to be of any use it also needs to be wired up
on the SoC so that it counts something. That's outside the scope of the
ARM ARM, so we cannot guarantee this (it may simply be a case of
hardware bugs where the counter is not the same on all CPUs, so we
better disable it altogether; we've seen this before).

What are the glibc expectations here for HP_TIMING? Is it fine if it
always returns 0? Does its frequency need to be known? If yes, how do we
expose such frequency to user space?

> There are two ways forward:
> 
> 1) Kernel folk agree to expose cntvct_el0 for the benefit of general
> user space.

See above.

> 2) We add a HWCAP bit to indicate if cntvct_el0 is exposed.

And who's going to check this bit? From my experience, we had HWCAP_SWP
before, we removed it but no-one was checking it so we had to emulate
the instruction.

IIUC here it's a simple HP_TIMING_NOW macro which translates to an mrs.
Do you plan to add further checks for HWCAP here?

> Taking the proposed patch as it stands in the absence of a commitment
> from the kernel doesn't look sensible to me.  Since the loader uses hp
> timers in normal operation we will end up with a dynamic loader that
> faults if a future kernel disables visibility of cntvct_el0.

Is exposing a reader function via VDSO be feasible here? Or it's too
early for the dynamic loader?
  
Richard Henderson Sept. 10, 2015, 12:33 a.m. UTC | #15
On 09/09/2015 04:14 AM, Catalin Marinas wrote:
> What are the glibc expectations here for HP_TIMING? Is it fine if it
> always returns 0? Does its frequency need to be known? If yes, how do we
> expose such frequency to user space?

Yes it's fine if it returns 0.  You just won't get useful timing info when 
using LD_DEBUG=statistics.

No, we don't require a known frequency.  Even on x86 we simply print "cycles" 
rather than a scaled time.

I think the only real requirement is not crashing due to sigill while 
collecting the statistics. ;-)

> IIUC here it's a simple HP_TIMING_NOW macro which translates to an mrs.

I did that simply because it worked for me.  You'll notice the significant 
push-back that patch elicited.  ;-)

> Do you plan to add further checks for HWCAP here?

If required.  It ought to be easy.

> Is exposing a reader function via VDSO be feasible here? Or it's too
> early for the dynamic loader?

Good question... it's on the edge, right after self-relocation of ld.so, but 
before proper relocation of libc and re-relocation of ld.so (e.g. to use libc 
symbols like malloc instead of the bootstrapping stubs).

Finding the VDSO is certainly easy, since its pointer is in auxv, and we will 
have stored that.  But how much effort it is to look up a symbol by name for 
that at that point in the bootstrap, I don't know.

Would the kernel be likely to provide an implementation that wasn't simply the 
mrs insn?

If not, I don't know that's any better than the HWCAP bit.  Either way we need 
a test-and-branch.  Whether that's checking for a null pointer because the 
kernel didn't provide the vdso entry point, or checking hwcap for the bit, it's 
just about the same.


r~
  
Catalin Marinas Sept. 11, 2015, 1:14 p.m. UTC | #16
On Wed, Sep 09, 2015 at 05:33:23PM -0700, Richard Henderson wrote:
> On 09/09/2015 04:14 AM, Catalin Marinas wrote:
> >What are the glibc expectations here for HP_TIMING? Is it fine if it
> >always returns 0? Does its frequency need to be known? If yes, how do we
> >expose such frequency to user space?
> 
> Yes it's fine if it returns 0.  You just won't get useful timing info when
> using LD_DEBUG=statistics.
> 
> No, we don't require a known frequency.  Even on x86 we simply print
> "cycles" rather than a scaled time.
> 
> I think the only real requirement is not crashing due to sigill while
> collecting the statistics. ;-)
> 
> >IIUC here it's a simple HP_TIMING_NOW macro which translates to an mrs.
> 
> I did that simply because it worked for me.  You'll notice the significant
> push-back that patch elicited.  ;-)
> 
> >Do you plan to add further checks for HWCAP here?
> 
> If required.  It ought to be easy.
> 
> >Is exposing a reader function via VDSO be feasible here? Or it's too
> >early for the dynamic loader?
> 
> Good question... it's on the edge, right after self-relocation of ld.so, but
> before proper relocation of libc and re-relocation of ld.so (e.g. to use
> libc symbols like malloc instead of the bootstrapping stubs).
> 
> Finding the VDSO is certainly easy, since its pointer is in auxv, and we
> will have stored that.  But how much effort it is to look up a symbol by
> name for that at that point in the bootstrap, I don't know.
> 
> Would the kernel be likely to provide an implementation that wasn't simply
> the mrs insn?
> 
> If not, I don't know that's any better than the HWCAP bit.  Either way we
> need a test-and-branch.  Whether that's checking for a null pointer because
> the kernel didn't provide the vdso entry point, or checking hwcap for the
> bit, it's just about the same.

Let's start with a HWCAP bit. But someone needs to write a patch and
post it on linux-arm-kernel ;) (being an ABI request, it needs raising
it on the kernel list).
  

Patch

diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h
new file mode 100644
index 0000000..18cb54f
--- /dev/null
+++ b/sysdeps/aarch64/hp-timing.h
@@ -0,0 +1,38 @@ 
+/* High precision, low overhead timing functions.  AArch64 version.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _HP_TIMING_H
+#define _HP_TIMING_H	1
+
+/* We always assume having the cycle counter register.  */
+#define HP_TIMING_AVAIL		(1)
+
+/* We indeed have inlined functions.  */
+#define HP_TIMING_INLINE	(1)
+
+/* We use 64bit values for the times.  */
+typedef unsigned long long int hp_timing_t;
+
+/* Sync the instruction stream, and read from the virtual cycle counter.  */
+#define HP_TIMING_NOW(Var) \
+  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (Var))
+
+#include <hp-timing-common.h>
+
+#endif	/* hp-timing.h */