benchtests: Restore the clock_gettime option
Commit Message
commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date: Tue Jan 29 17:43:45 2019 +0000
Add generic hp-timing support
removed the clock_gettime option. On x86, fewer cycles doesn't
necessarily mean faster execution due to frequency drop. We should
restore the clock_gettime option.
---
benchtests/Makefile | 6 ++++++
benchtests/README | 7 ++++++-
benchtests/bench-timing.h | 6 +++++-
3 files changed, 17 insertions(+), 2 deletions(-)
Comments
On Tue, 19 May 2020, H.J. Lu via Libc-alpha wrote:
> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
> Add generic hp-timing support
>
> removed the clock_gettime option. On x86, fewer cycles doesn't
> necessarily mean faster execution due to frequency drop. We should
> restore the clock_gettime option.
Can you please elaborate on which x86 CPUs you have in mind here?
Since Nehalem (2008), when invariant TSC was introduced, the rdtsc(p)
instructions count at a fixed rate rather than at the CPU clock rate.
And before 2008, there were no turbo frequencies and no AVX frequency drop.
Alexander
On Tue, May 19, 2020 at 2:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> Can you please elaborate on which x86 CPUs you have in mind here?
> Since Nehalem (2008), when invariant TSC was introduced, the rdtsc(p)
> instructions count at a fixed rate rather than at the CPU clock rate.
> And before 2008, there were no turbo frequencies and no AVX frequency drop.
https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency
On Tue, 19 May 2020, H.J. Lu wrote:
> https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency
I am well aware. Again: rdtsc does not count CPU cycles on recent Intel CPUs.
RDTSC reads a register that increments at a fixed rate. So its increment is
proportional to wall clock time. When a workload is causing a reduction in
actual CPU frequency, RDTSC increment frequency is not affected and so it
remains suitable for measuring the actual wall-clock time.
Alexander
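One way to check the fixed-rate claim on a particular machine is to compare TSC deltas against clock_gettime (CLOCK_MONOTONIC) deltas. The following is a minimal sketch, not glibc code: the helper names monotonic_ns, raw_ticks and ticks_per_ns are hypothetical, and on non-x86 targets raw_ticks simply falls back to the monotonic clock so that the sketch still compiles.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

#if defined (__x86_64__) || defined (__i386__)
# include <x86intrin.h>
#endif

/* Hypothetical helper: CLOCK_MONOTONIC time in nanoseconds.  */
static uint64_t
monotonic_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * UINT64_C (1000000000) + (uint64_t) ts.tv_nsec;
}

/* Hypothetical helper: raw tick count.  On x86 this reads the TSC;
   elsewhere it falls back to monotonic nanoseconds (an assumption made
   only to keep the sketch portable).  */
static uint64_t
raw_ticks (void)
{
#if defined (__x86_64__) || defined (__i386__)
  return __rdtsc ();
#else
  return monotonic_ns ();
#endif
}

/* Busy-wait for at least INTERVAL_NS nanoseconds (must be nonzero) and
   return the observed number of ticks per nanosecond.  */
static double
ticks_per_ns (uint64_t interval_ns)
{
  uint64_t t0 = monotonic_ns ();
  uint64_t c0 = raw_ticks ();
  uint64_t t1;

  do
    t1 = monotonic_ns ();
  while (t1 - t0 < interval_ns);

  uint64_t c1 = raw_ticks ();
  return (double) (c1 - c0) / (double) (t1 - t0);
}
```

If the TSC is invariant, calling ticks_per_ns repeatedly (for example once while idle and once while a frequency-lowering workload runs on the same core) should give roughly the same ratio each time. A ratio that tracks the core frequency would instead suggest the counter is not invariant on that CPU.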
On Tue, May 19, 2020 at 3:16 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> I am well aware. Again: rdtsc does not count CPU cycles on recent Intel CPUs.
> RDTSC reads a register that increments at a fixed rate. So its increment is
> proportional to wall clock time. When a workload is causing a reduction in
> actual CPU frequency, RDTSC increment frequency is not affected and so it
> remains suitable for measuring the actual wall-clock time.
>
> Alexander
We'd like to have it as an option on x86.
On Wed, 20 May 2020, H.J. Lu wrote:
> We'd like to have it as an option on x86.
Then I would suggest rewording the proposed commit message to explain that.
As written, the commit message gives a misleading/wrong motivation.
Alexander
* Alexander Monakov via Libc-alpha:
> I am well aware. Again: rdtsc does not count CPU cycles on recent
> Intel CPUs.
H.J. probably has a different view on what those “recent Intel CPUs”
are. 8-) I have not reviewed the mechanics of the patch, but if we need
this for some CPUs, we should make the change.
Thanks,
Florian
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -132,11 +132,17 @@ endif
CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC
+# Use clock_gettime to measure performance of functions. The default is
+# to use the architecture-specific high precision timing instructions.
+ifdef USE_CLOCK_GETTIME
+CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME
+else
# On x86 processors, use RDTSCP, instead of RDTSC, to measure performance
# of functions. All x86 processors since 2010 support RDTSCP instruction.
ifdef USE_RDTSCP
CPPFLAGS-nonlib += -DUSE_RDTSCP
endif
+endif
DETAILED_OPT :=
--- a/benchtests/README
+++ b/benchtests/README
@@ -27,7 +27,12 @@ BENCH_DURATION.
The benchmark suite does function call measurements using architecture-specific
high precision timing instructions whenever available. When such support is
-not available, it uses clock_gettime (CLOCK_MONOTONIC).
+not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the
+benchmark to use clock_gettime by invoking make as follows:
+
+ $ make USE_CLOCK_GETTIME=1 bench
+
+Again, one must run `make bench-clean' before changing the measurement method.
On x86 processors, RDTSCP instruction provides more precise timing data
than RDTSC instruction. All x86 processors since 2010 support RDTSCP
--- a/benchtests/bench-timing.h
+++ b/benchtests/bench-timing.h
@@ -19,7 +19,11 @@
#undef attribute_hidden
#define attribute_hidden
#define __clock_gettime clock_gettime
-#include <hp-timing.h>
+#ifdef USE_CLOCK_GETTIME
+# include <sysdeps/generic/hp-timing.h>
+#else
+# include <hp-timing.h>
+#endif
#include <stdint.h>
#define GL(x) _##x
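
The compile-time switch in the bench-timing.h hunk can be illustrated with a standalone sketch. This is not the glibc implementation: the SKETCH_* macros, sketch_timing_t, sketch_bench and sketch_work are hypothetical names, and the non-x86 fallback to clock_gettime is an assumption that only loosely mirrors the README's description of the default behaviour.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

typedef uint64_t sketch_timing_t;

#if defined (USE_CLOCK_GETTIME) \
    || !(defined (__x86_64__) || defined (__i386__))
/* Portable path: nanoseconds read from CLOCK_MONOTONIC.  */
# define SKETCH_TIMING_NOW(var) \
  do                                                            \
    {                                                           \
      struct timespec ts_;                                      \
      clock_gettime (CLOCK_MONOTONIC, &ts_);                    \
      (var) = (uint64_t) ts_.tv_sec * UINT64_C (1000000000)     \
              + (uint64_t) ts_.tv_nsec;                         \
    }                                                           \
  while (0)
# define SKETCH_TIMING_UNIT "ns"
#else
/* x86 path: raw time-stamp counter ticks.  */
# include <x86intrin.h>
# define SKETCH_TIMING_NOW(var) ((var) = __rdtsc ())
# define SKETCH_TIMING_UNIT "TSC ticks"
#endif

/* Time ITERS calls of FN (ITERS must be nonzero) and return the mean
   cost per call in SKETCH_TIMING_UNIT.  */
static double
sketch_bench (void (*fn) (void), unsigned int iters)
{
  sketch_timing_t start, end;

  SKETCH_TIMING_NOW (start);
  for (unsigned int i = 0; i < iters; i++)
    fn ();
  SKETCH_TIMING_NOW (end);

  return (double) (end - start) / iters;
}

/* Hypothetical workload: a volatile increment the compiler cannot
   optimize away.  */
static volatile unsigned int sketch_sink;

static void
sketch_work (void)
{
  sketch_sink++;
}
```

Building the same file once with -DUSE_CLOCK_GETTIME and once without changes only the unit of the reported number, which is why the two builds should not be mixed in one set of results.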