AArch64: Add hp-timing.h

Message ID VE1PR08MB5599E9E405FE180BD3ACE2F283039@VE1PR08MB5599.eurprd08.prod.outlook.com
State Committed
Commit 6a34c928c2ac9fce926b1348d61dae34262e3f77
Headers
Series AArch64: Add hp-timing.h |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Wilco Dijkstra June 28, 2021, 10:11 a.m. UTC
  Add hp-timing.h using the cntcvt_el0 counter. Return timing in nanoseconds
so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
in the dynamic linker since very old kernels may not handle cntcvt correctly.

---
  

Comments

Florian Weimer June 28, 2021, 10:20 a.m. UTC | #1
* Wilco Dijkstra via Libc-alpha:

> Add hp-timing.h using the cntcvt_el0 counter. Return timing in nanoseconds
> so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
> in the dynamic linker since very old kernels may not handle cntcvt correctly.

Is the MSR called cntcvt_el0 or cntvct_el0?

I'm clearly missing something: If this is broken on some kernels, how is
this disabled?

Thanks,
Florian
  
Wilco Dijkstra June 28, 2021, 10:53 a.m. UTC | #2
Hi Florian,

> Is the MSR called cntcvt_el0 or cntvct_el0?

cntvct_el0 - it's just a random combination of characters!

> I'm clearly missing something: If this is broken on some kernels, how is
> this disabled?

It's not enabled just to be extremely conservative - however I don't believe
this is an issue in practice since it works fine even on the first AArch64 boards
and servers.

Cheers
Wilco
  
Florian Weimer June 28, 2021, 11:06 a.m. UTC | #3
* Wilco Dijkstra:

> Hi Florian,
>
>> Is the MSR called cntcvt_el0 or cntvct_el0?
>
> cntvct_el0 - it's just a random combination of characters!

Okay, then please fix the other occurence that is a typo.

>> I'm clearly missing something: If this is broken on some kernels, how is
>> this disabled?
>
> It's not enabled just to be extremely conservative - however I don't
> believe this is an issue in practice since it works fine even on the
> first AArch64 boards and servers.

I meant: How is this disabled in glibc on older kernels?  Or is this
code just dormant as of now?

Thanks,
Florian
  
Wilco Dijkstra June 28, 2021, 11:35 a.m. UTC | #4
Hi Florian,

> I meant: How is this disabled in glibc on older kernels?  Or is this
> code just dormant as of now?

As is, the timer is not used at all in the GLIBC installation - it is only used for timing
in benchtests. So if you somehow managed to install a new distro on an obsolete
kernel, then there still wouldn't be any issues.

Cheers,
Wilco
  
Florian Weimer June 28, 2021, 11:39 a.m. UTC | #5
* Wilco Dijkstra via Libc-alpha:

> Hi Florian,
>
>> I meant: How is this disabled in glibc on older kernels?  Or is this
>> code just dormant as of now?
>
> As is, the timer is not used at all in the GLIBC installation - it is
> only used for timing in benchtests. So if you somehow managed to
> install a new distro on an obsolete kernel, then there still wouldn't
> be any issues.

Ah, thanks, that is what I was missing.

Florian
  
Szabolcs Nagy June 28, 2021, 11:49 a.m. UTC | #6
The 06/28/2021 13:06, Florian Weimer wrote:
> * Wilco Dijkstra:
> > Hi Florian,
> >> Is the MSR called cntcvt_el0 or cntvct_el0?
> >
> > cntvct_el0 - it's just a random combination of characters!
> 
> Okay, then please fix the other occurence that is a typo.
> 
> >> I'm clearly missing something: If this is broken on some kernels, how is
> >> this disabled?
> >
> > It's not enabled just to be extremely conservative - however I don't
> > believe this is an issue in practice since it works fine even on the
> > first AArch64 boards and servers.
> 
> I meant: How is this disabled in glibc on older kernels?  Or is this
> code just dormant as of now?

iirc linux devs asked that it is only used in vdso so
that it is under the kernel's control in case there
are implementation issues (vdso is easy to disable
and on some system it got disabled for timer reasons).

is hp-timing.h for some internal benchmarks only?

if no, then i think we should get the opinion of
linux maintainers before we use this in glibc.
  
Adhemerval Zanella Netto June 28, 2021, 11:55 a.m. UTC | #7
On 28/06/2021 08:49, Szabolcs Nagy via Libc-alpha wrote:
> The 06/28/2021 13:06, Florian Weimer wrote:
>> * Wilco Dijkstra:
>>> Hi Florian,
>>>> Is the MSR called cntcvt_el0 or cntvct_el0?
>>>
>>> cntvct_el0 - it's just a random combination of characters!
>>
>> Okay, then please fix the other occurence that is a typo.
>>
>>>> I'm clearly missing something: If this is broken on some kernels, how is
>>>> this disabled?
>>>
>>> It's not enabled just to be extremely conservative - however I don't
>>> believe this is an issue in practice since it works fine even on the
>>> first AArch64 boards and servers.
>>
>> I meant: How is this disabled in glibc on older kernels?  Or is this
>> code just dormant as of now?
> 
> iirc linux devs asked that it is only used in vdso so
> that it is under the kernel's control in case there
> are implementation issues (vdso is easy to disable
> and on some system it got disabled for timer reasons).
> 
> is hp-timing.h for some internal benchmarks only?
> 
> if no, then i think we should get the opinion of
> linux maintainers before we use this in glibc.
> 

The patch sets HP_TIMING_INLINE to 0, so it is only used on benchmarks.
  
Wilco Dijkstra June 28, 2021, 12:10 p.m. UTC | #8
Hi Szabolcs,

> iirc linux devs asked that it is only used in vdso so
> that it is under the kernel's control in case there
> are implementation issues (vdso is easy to disable
> and on some system it got disabled for timer reasons).

No, the kernel must still emulate cntvct_el0 in that case because of SBSA
which requires that the timer is always available. This is just as easy as
disabling vDSO - and we know this works and is surprisingly fast as I
mentioned.

Cheers,
Wilco
  
Adhemerval Zanella Netto June 28, 2021, 1:15 p.m. UTC | #9
On 28/06/2021 09:10, Wilco Dijkstra via Libc-alpha wrote:
> Hi Szabolcs,
> 
>> iirc linux devs asked that it is only used in vdso so
>> that it is under the kernel's control in case there
>> are implementation issues (vdso is easy to disable
>> and on some system it got disabled for timer reasons).
> 
> No, the kernel must still emulate cntvct_el0 in that case because of SBSA
> which requires that the timer is always available. This is just as easy as
> disabling vDSO - and we know this works and is surprisingly fast as I
> mentioned.

Any particular reason why you haven't enabled HP_TIMING_INLINE to 1?
  
Wilco Dijkstra June 28, 2021, 2:24 p.m. UTC | #10
Hi Adhemerval,

> Any particular reason why you haven't enabled HP_TIMING_INLINE to 1?

Well, what exactly is it used for? I've never quite understood the reasoning behind
hp-timing. It seems to me it just slows down ld.so with extra timing calls which are
unused 100% of the time. So I don't see a reason to enable it on AArch64. However
with the header in place, it's easy to set it to 1 and rebuild if needed.

Cheers,
Wilco
  
Adhemerval Zanella Netto June 28, 2021, 2:46 p.m. UTC | #11
On 28/06/2021 11:24, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>> Any particular reason why you haven't enabled HP_TIMING_INLINE to 1?
> 
> Well, what exactly is it used for? I've never quite understood the reasoning behind
> hp-timing. It seems to me it just slows down ld.so with extra timing calls which are
> unused 100% of the time. So I don't see a reason to enable it on AArch64. However
> with the header in place, it's easy to set it to 1 and rebuild if needed.

It is used for LD_DEBUG=statistics to give better profiling numbers
(theoretically). However, the profiling is still measured in some
parts even LD_DEBUG is not set unfortunately.  It assumes that reading
the clock is as fast as one instructions, that why it is only enabled
for HP_TIMING_INLINE.

Not sure if it is worth to enable on aarch64 indeed.
  
Wilco Dijkstra June 28, 2021, 7:16 p.m. UTC | #12
Hi Adhemerval,

> It is used for LD_DEBUG=statistics to give better profiling numbers
> (theoretically). However, the profiling is still measured in some
> parts even LD_DEBUG is not set unfortunately.  It assumes that reading
> the clock is as fast as one instructions, that why it is only enabled
> for HP_TIMING_INLINE.

If emulated, it's over 600 cycles per use, not something you want to
accidentally do by default...

> Not sure if it is worth to enable on aarch64 indeed.

It would be alright if the timing was done conditionally so that there is no
performance hit and there can be no concern that ld.so might crash on an
ancient kernel.

Cheers,
Wilco
  
Szabolcs Nagy June 30, 2021, 8:46 a.m. UTC | #13
The 06/28/2021 11:11, Wilco Dijkstra wrote:
> Add hp-timing.h using the cntcvt_el0 counter. Return timing in nanoseconds
> so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
> in the dynamic linker since very old kernels may not handle cntcvt correctly.

can you please add a comment here that this
is only used in internal benchmarks and not
in installed glibc binaries. with that it's
ok to commit.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>



> diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..f9822e725c227b3b57d508dc7d44cd7d2d04faf4
> --- /dev/null
> +++ b/sysdeps/aarch64/hp-timing.h
> @@ -0,0 +1,47 @@
> +/* High precision, low overhead timing functions.  AArch64 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _HP_TIMING_H
> +#define _HP_TIMING_H   1
> +
> +#include <time.h>
> +#include <stdint.h>
> +#include <hp-timing-common.h>
> +#include <libc-symbols.h>
> +
> +/* Don't use inline timer in ld.so.  */
> +#if IS_IN(rtld)
> +# define HP_TIMING_INLINE      (0)
> +#else
> +# define HP_TIMING_INLINE      (1)
> +#endif
> +
> +typedef uint64_t hp_timing_t;
> +
> +#define HP_TIMING_NOW(var) \
> +  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (var))
> +
> +/* Compute elapsed time in nanoseconds.  */
> +#undef HP_TIMING_DIFF
> +#define HP_TIMING_DIFF(Diff, Start, End)                       \
> +({  hp_timing_t freq;                                          \
> +    __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (freq)); \
> +   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq); \
> +})
> +
> +#endif /* hp-timing.h */
  
Florian Weimer June 30, 2021, 8:55 a.m. UTC | #14
* Szabolcs Nagy via Libc-alpha:

> The 06/28/2021 11:11, Wilco Dijkstra wrote:
>> Add hp-timing.h using the cntcvt_el0 counter. Return timing in nanoseconds
>> so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
>> in the dynamic linker since very old kernels may not handle cntcvt correctly.
>
> can you please add a comment here that this
> is only used in internal benchmarks and not
> in installed glibc binaries. with that it's
> ok to commit.

This commit would not be entirely accurate.  We install and ship the
benchtests, so they need to use a stable ISA (and kernel ABI).

If the benchtests are like the testsuite and are not intended to be
installed and distributed, we probably need to make downstream changes.

Thanks,
Florian
  
Siddhesh Poyarekar June 30, 2021, 9:33 a.m. UTC | #15
On 6/30/21 2:25 PM, Florian Weimer via Libc-alpha wrote:
> * Szabolcs Nagy via Libc-alpha:
> 
>> The 06/28/2021 11:11, Wilco Dijkstra wrote:
>>> Add hp-timing.h using the cntcvt_el0 counter. Return timing in nanoseconds
>>> so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
>>> in the dynamic linker since very old kernels may not handle cntcvt correctly.
>>
>> can you please add a comment here that this
>> is only used in internal benchmarks and not
>> in installed glibc binaries. with that it's
>> ok to commit.
> 
> This commit would not be entirely accurate.  We install and ship the
> benchtests, so they need to use a stable ISA (and kernel ABI).

That doesn't seem like a reasonable limitation for benchtests.

> If the benchtests are like the testsuite and are not intended to be
> installed and distributed, we probably need to make downstream changes.

There's value in having the benchtest binaries in the distribution too, 
so maybe we need to figure out a way to do both, i.e. not be bound by 
ABI restrictions and at the same time, be deployable to do performance 
comparison.

Siddhesh
  
Wilco Dijkstra July 1, 2021, 10:21 a.m. UTC | #16
Hi Florian,

> This commit would not be entirely accurate.  We install and ship the
> benchtests, so they need to use a stable ISA (and kernel ABI).
>
> If the benchtests are like the testsuite and are not intended to be
> installed and distributed, we probably need to make downstream changes.

Is this something distro specific? I don't see it in the GLIBC install directory after
"make install" or anywhere on my system. 

In any case, there is no ABI change here - cntvct_el0 is always allowed in userspace.

Cheers,
Wilco
  
Szabolcs Nagy July 1, 2021, 1:48 p.m. UTC | #17
The 07/01/2021 11:21, Wilco Dijkstra wrote:
> Hi Florian,
> 
> > This commit would not be entirely accurate.  We install and ship the
> > benchtests, so they need to use a stable ISA (and kernel ABI).
> >
> > If the benchtests are like the testsuite and are not intended to be
> > installed and distributed, we probably need to make downstream changes.
> 
> Is this something distro specific? I don't see it in the GLIBC install directory after
> "make install" or anywhere on my system.
> 
> In any case, there is no ABI change here - cntvct_el0 is always allowed in userspace.

then just say the commit only affects the glibc benchmark code.

i'm happy to accept the commit with that.

if we want to use it in the dynamic linker then we should
dig into the potential kernel issues (what versions are
affected etc)
  
Wilco Dijkstra July 1, 2021, 3:33 p.m. UTC | #18
Hi Szabolcs,

> then just say the commit only affects the glibc benchmark code.
>
> i'm happy to accept the commit with that.

I've changed the log to:

    Add hp-timing.h using the cntvct_el0 counter. Return timing in nanoseconds
    so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
    in the dynamic linker since it adds unnecessary overheads and some ancient
    kernels may not handle emulating cntcvt correctly. Currently cntvct_el0 is
    only used for timing in the benchtests.

> if we want to use it in the dynamic linker then we should
> dig into the potential kernel issues (what versions are
> affected etc)

Agreed. Plus which machines might be affected - not all CPUs have
trouble incrementing!

Cheers,
Wilco
  
Florian Weimer July 6, 2021, 11:33 a.m. UTC | #19
* Wilco Dijkstra:

> Hi Florian,
>
>> This commit would not be entirely accurate.  We install and ship the
>> benchtests, so they need to use a stable ISA (and kernel ABI).
>>
>> If the benchtests are like the testsuite and are not intended to be
>> installed and distributed, we probably need to make downstream changes.
>
> Is this something distro specific? I don't see it in the GLIBC install
> directory after "make install" or anywhere on my system.

Looks like it.

> In any case, there is no ABI change here - cntvct_el0 is always
> allowed in userspace.

Good to know.  It's just that the performance could be problematic,
right?

I'm still puzzled why we can't leave the decision to the vDSO.  Generic
hp-timing support should do that.

Thanks,
Florian
  

Patch

diff --git a/sysdeps/aarch64/hp-timing.h b/sysdeps/aarch64/hp-timing.h
new file mode 100644
index 0000000000000000000000000000000000000000..f9822e725c227b3b57d508dc7d44cd7d2d04faf4
--- /dev/null
+++ b/sysdeps/aarch64/hp-timing.h
@@ -0,0 +1,47 @@ 
+/* High precision, low overhead timing functions.  AArch64 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _HP_TIMING_H
+#define _HP_TIMING_H	1
+
+#include <time.h>
+#include <stdint.h>
+#include <hp-timing-common.h>
+#include <libc-symbols.h>
+
+/* Don't use inline timer in ld.so.  */
+#if IS_IN(rtld)
+# define HP_TIMING_INLINE	(0)
+#else
+# define HP_TIMING_INLINE	(1)
+#endif
+
+typedef uint64_t hp_timing_t;
+
+#define HP_TIMING_NOW(var) \
+  __asm__ __volatile__ ("isb; mrs %0, cntvct_el0" : "=r" (var))
+
+/* Compute elapsed time in nanoseconds.  */
+#undef HP_TIMING_DIFF
+#define HP_TIMING_DIFF(Diff, Start, End)			\
+({  hp_timing_t freq;						\
+    __asm__ __volatile__ ("mrs %0, cntfrq_el0" : "=r" (freq));	\
+   (Diff) = ((End) - (Start)) * (UINT64_C(1000000000) / freq);	\
+})
+
+#endif	/* hp-timing.h */