V2 [PATCH] benchtests: Restore the clock_gettime option
Commit Message
On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Alexander Monakov via Libc-alpha:
>
> > I am well aware. Again: rdtsc does not count CPU cycles on recent
> > Intel CPUs.
>
> H.J. probably has a different view on what those “recent Intel CPUs”
> are. 8-) I have not reviewed the mechanics of the patch, but if we need
> this for some CPUs, we should make the change.
>
Here the patch with updated commit message:
commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date: Tue Jan 29 17:43:45 2019 +0000
Add generic hp-timing support
removed the clock_gettime option. Restore the clock_gettime option for
some x86 CPUs on which value from RDTSC may not be incremented at a fixed
rate.
OK for master?
Thanks.
Comments
* H. J. Lu:
> On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Alexander Monakov via Libc-alpha:
>>
>> > I am well aware. Again: rdtsc does not count CPU cycles on recent
>> > Intel CPUs.
>>
>> H.J. probably has a different view on what those “recent Intel CPUs”
>> are. 8-) I have not reviewed the mechanics of the patch, but if we need
>> this for some CPUs, we should make the change.
>>
>
> Here the patch with updated commit message:
>
> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
> Add generic hp-timing support
>
> removed the clock_gettime option. Restore the clock_gettime option for
> some x86 CPUs on which value from RDTSC may not be incremented at a fixed
> rate.
>
> OK for master?
Patch looks okay to me.
Thanks,
Florian
On 20/05/2020 15:14, H.J. Lu via Libc-alpha wrote:
> On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Alexander Monakov via Libc-alpha:
>>
>>> I am well aware. Again: rdtsc does not count CPU cycles on recent
>>> Intel CPUs.
>>
>> H.J. probably has a different view on what those “recent Intel CPUs”
>> are. 8-) I have not reviewed the mechanics of the patch, but if we need
>> this for some CPUs, we should make the change.
>>
>
> Here the patch with updated commit message:
>
> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
> Add generic hp-timing support
>
> removed the clock_gettime option. Restore the clock_gettime option for
> some x86 CPUs on which value from RDTSC may not be incremented at a fixed
> rate.
>
> OK for master?
What kind of result discrepancies are you seeing using hp-timing.h on x86?
The clock_gettime help in what exactly here (it was not clear from
discussion, neither from patch submission)?
I am asking because we rely on hp-timing.h to get the loader profiling,
so if this does provide accurate information in some cases it might be
the case to disable on ld.so/libc.so as well.
On Wed, May 20, 2020 at 11:38 AM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
>
> On 20/05/2020 15:14, H.J. Lu via Libc-alpha wrote:
> > On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Alexander Monakov via Libc-alpha:
> >>
> >>> I am well aware. Again: rdtsc does not count CPU cycles on recent
> >>> Intel CPUs.
> >>
> >> H.J. probably has a different view on what those “recent Intel CPUs”
> >> are. 8-) I have not reviewed the mechanics of the patch, but if we need
> >> this for some CPUs, we should make the change.
> >>
> >
> > Here the patch with updated commit message:
> >
> > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> > Date: Tue Jan 29 17:43:45 2019 +0000
> >
> > Add generic hp-timing support
> >
> > removed the clock_gettime option. Restore the clock_gettime option for
> > some x86 CPUs on which value from RDTSC may not be incremented at a fixed
> > rate.
> >
> > OK for master?
>
> What kind of result discrepancies are you seeing using hp-timing.h on x86?
> The clock_gettime help in what exactly here (it was not clear from
> discussion, neither from patch submission)?
>
> I am asking because we rely on hp-timing.h to get the loader profiling,
> so if this does provide accurate information in some cases it might be
> the case to disable on ld.so/libc.so as well.
We are using glibc benchmarks for tuning memory/string functions.
We want to compare results between RDTSC and clock_gettime.
On 5/20/20 2:14 PM, H.J. Lu via Libc-alpha wrote:
> On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Alexander Monakov via Libc-alpha:
>>
>>> I am well aware. Again: rdtsc does not count CPU cycles on recent
>>> Intel CPUs.
>>
>> H.J. probably has a different view on what those “recent Intel CPUs”
>> are. 8-) I have not reviewed the mechanics of the patch, but if we need
>> this for some CPUs, we should make the change.
>>
>
> Here the patch with updated commit message:
>
> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
> Add generic hp-timing support
>
> removed the clock_gettime option. Restore the clock_gettime option for
> some x86 CPUs on which value from RDTSC may not be incremented at a fixed
> rate.
>
> OK for master?
OK for master.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> Thanks.
>
> From 7e48f7adbd53c18df7ab5fdd488bfcc134627480 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Mon, 18 May 2020 17:28:42 -0700
> Subject: [PATCH] benchtests: Restore the clock_gettime option
>
> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
> Add generic hp-timing support
>
> removed the clock_gettime option. Restore the clock_gettime option for
> some x86 CPUs on which value from RDTSC may not be incremented at a fixed
> rate.
> ---
> benchtests/Makefile | 6 ++++++
> benchtests/README | 7 ++++++-
> benchtests/bench-timing.h | 6 +++++-
> 3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 335d643ecb..99e90d17a0 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -132,11 +132,17 @@ endif
>
> CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC
>
> +# Use clock_gettime to measure performance of functions. The default is
> +# to use the architecture-specific high precision timing instructions.
> +ifdef USE_CLOCK_GETTIME
> +CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME
> +else
> # On x86 processors, use RDTSCP, instead of RDTSC, to measure performance
> # of functions. All x86 processors since 2010 support RDTSCP instruction.
> ifdef USE_RDTSCP
> CPPFLAGS-nonlib += -DUSE_RDTSCP
> endif
> +endif
>
> DETAILED_OPT :=
>
> diff --git a/benchtests/README b/benchtests/README
> index c4f03fd872..f440f3295a 100644
> --- a/benchtests/README
> +++ b/benchtests/README
> @@ -27,7 +27,12 @@ BENCH_DURATION.
>
> The benchmark suite does function call measurements using architecture-specific
> high precision timing instructions whenever available. When such support is
> -not available, it uses clock_gettime (CLOCK_MONOTONIC).
> +not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the
> +benchmark to use clock_gettime by invoking make as follows:
> +
> + $ make USE_CLOCK_GETTIME=1 bench
> +
> +Again, one must run `make bench-clean' before changing the measurement method.
>
> On x86 processors, RDTSCP instruction provides more precise timing data
> than RDTSC instruction. All x86 processors since 2010 support RDTSCP
> diff --git a/benchtests/bench-timing.h b/benchtests/bench-timing.h
> index 5b9a8384bb..844a7727c9 100644
> --- a/benchtests/bench-timing.h
> +++ b/benchtests/bench-timing.h
> @@ -19,7 +19,11 @@
> #undef attribute_hidden
> #define attribute_hidden
> #define __clock_gettime clock_gettime
> -#include <hp-timing.h>
> +#ifdef USE_CLOCK_GETTIME
> +# include <sysdeps/generic/hp-timing.h>
> +#else
> +# include <hp-timing.h>
> +#endif
> #include <stdint.h>
>
> #define GL(x) _##x
> --
> 2.26.2
>
From 7e48f7adbd53c18df7ab5fdd488bfcc134627480 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 18 May 2020 17:28:42 -0700
Subject: [PATCH] benchtests: Restore the clock_gettime option
commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date: Tue Jan 29 17:43:45 2019 +0000
Add generic hp-timing support
removed the clock_gettime option. Restore the clock_gettime option for
some x86 CPUs on which value from RDTSC may not be incremented at a fixed
rate.
---
benchtests/Makefile | 6 ++++++
benchtests/README | 7 ++++++-
benchtests/bench-timing.h | 6 +++++-
3 files changed, 17 insertions(+), 2 deletions(-)
@@ -132,11 +132,17 @@ endif
CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC
+# Use clock_gettime to measure performance of functions. The default is
+# to use the architecture-specific high precision timing instructions.
+ifdef USE_CLOCK_GETTIME
+CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME
+else
# On x86 processors, use RDTSCP, instead of RDTSC, to measure performance
# of functions. All x86 processors since 2010 support RDTSCP instruction.
ifdef USE_RDTSCP
CPPFLAGS-nonlib += -DUSE_RDTSCP
endif
+endif
DETAILED_OPT :=
@@ -27,7 +27,12 @@ BENCH_DURATION.
The benchmark suite does function call measurements using architecture-specific
high precision timing instructions whenever available. When such support is
-not available, it uses clock_gettime (CLOCK_MONOTONIC).
+not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the
+benchmark to use clock_gettime by invoking make as follows:
+
+ $ make USE_CLOCK_GETTIME=1 bench
+
+Again, one must run `make bench-clean' before changing the measurement method.
On x86 processors, RDTSCP instruction provides more precise timing data
than RDTSC instruction. All x86 processors since 2010 support RDTSCP
@@ -19,7 +19,11 @@
#undef attribute_hidden
#define attribute_hidden
#define __clock_gettime clock_gettime
-#include <hp-timing.h>
+#ifdef USE_CLOCK_GETTIME
+# include <sysdeps/generic/hp-timing.h>
+#else
+# include <hp-timing.h>
+#endif
#include <stdint.h>
#define GL(x) _##x
--
2.26.2