Improve math benchmark infrastructure

Message ID	AM5PR0802MB26105ACEDEC600EF2C4510BD83C00@AM5PR0802MB2610.eurprd08.prod.outlook.com
State	Superseded
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
	Precedence: bulk
	Sender: libc-alpha-owner@sourceware.org
	From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
	To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
	CC: nd <nd@arm.com>
	Subject: [PATCH] Improve math benchmark infrastructure
	Date: Thu, 15 Jun 2017 12:17:12 +0000
	Message-ID: <AM5PR0802MB26105ACEDEC600EF2C4510BD83C00@AM5PR0802MB2610.eurprd08.prod.outlook.com>
	nodisclaimer: True
	spamdiagnosticoutput: 1:99
	spamdiagnosticmetadata: NSPM
	Content-Type: text/plain; charset="iso-8859-1"
	Content-Transfer-Encoding: quoted-printable
	MIME-Version: 1.0

Commit Message

Wilco Dijkstra June 15, 2017, 12:17 p.m. UTC

Improve support for math function benchmarking.  This patch adds
a feature that allows accurate benchmarking of traces extracted
from real workloads.  This is done by iterating over all samples
rather than repeating each sample many times (which completely
ignores branch prediction and cache effects).  A trace can be
added to existing math function inputs via "## name: bench",
followed by the trace.
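
The difference between the two loop orders can be sketched in isolation.  This is a minimal illustration, not code from bench-skeleton.c: bench_func, call_log, run_per_sample and run_trace are hypothetical names, and the stand-in function only records the sample index it was called with so that the call order is visible.

```c
#include <stddef.h>

/* Hypothetical stand-in for the function under test.  It only logs
   the sample index so the call order can be inspected.  */
#define MAX_CALLS 64
static size_t call_log[MAX_CALLS];
static size_t num_calls;

static void
bench_func (size_t sample)
{
  if (num_calls < MAX_CALLS)
    call_log[num_calls++] = sample;
}

/* Per-sample order: each sample is repeated ITERS times before the
   next one is tried, so the same input is replayed back to back.  */
static void
run_per_sample (size_t num_samples, size_t iters)
{
  for (size_t i = 0; i < num_samples; i++)
    for (size_t k = 0; k < iters; k++)
      bench_func (i);
}

/* Trace order: the whole trace is walked once, in order, before it is
   repeated, so consecutive calls see different inputs.  */
static void
run_trace (size_t num_samples, size_t iters)
{
  for (size_t k = 0; k < iters; k++)
    for (size_t i = 0; i < num_samples; i++)
      bench_func (i);
}
```

With three samples and two iterations, run_per_sample calls the function with 0, 0, 1, 1, 2, 2, while run_trace calls it with 0, 1, 2, 0, 1, 2.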

OK for commit?

ChangeLog:
2017-06-15  Wilco Dijkstra  <wdijkstr@arm.com>

        * benchtests/bench-skeleton.c (main): Add support for
        benchmarking traces from workloads.
--

Comments

Siddhesh Poyarekar June 19, 2017, 4:26 a.m. UTC | #1

On Thursday 15 June 2017 05:47 PM, Wilco Dijkstra wrote:
> Improve support for math function benchmarking.  This patch adds
> a feature that allows accurate benchmarking of traces extracted
> from real workloads.  This is done by iterating over all samples
> rather than repeating each sample many times (which completely 
> ignores branch prediction and cache effects).  A trace can be
> added to existing math function inputs via "## name: bench",
> followed by the trace.

There is no easy way to actually replicate the cache effects of a real
workload without replicating the program in some form, but this is a
good start.  Long term we should consider doing things like invalidating
cache at specific points in the program in an attempt to emulate the
workload a little more closely.  This may be more relevant for string
benchmarks than for math though.

Please make a couple of minor changes and repost.

> diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
> index 09eb78df1bce2d9f5e410e3e82821eb9b271e70d..48001763e8481592182fd8d67948ccb4078a48cc 100644
> --- a/benchtests/bench-skeleton.c
> +++ b/benchtests/bench-skeleton.c
> @@ -68,34 +68,50 @@ main (int argc, char **argv)
>        clock_gettime (CLOCK_MONOTONIC_RAW, &runtime);
>        runtime.tv_sec += DURATION;
>  
> +      bool is_bench = strcmp (VARIANT (v), "bench") == 0;

Use the name convention workload-* to better reflect what the dataset
is.  So for your wrf powf trace, call it workload-spec2006.wrf.  Also,
please document this feature in benchtests/README.
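
One possible way to recognise the workload-* convention is a prefix match instead of an exact strcmp.  This is only a sketch of the idea; is_workload_variant is a hypothetical helper and is not part of the posted patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: treat any variant whose name starts with
   "workload-" as a trace, for example an inputs section introduced by
   "## name: workload-spec2006.wrf".  */
static bool
is_workload_variant (const char *name)
{
  static const char prefix[] = "workload-";
  /* sizeof includes the terminating NUL, so subtract one.  */
  return strncmp (name, prefix, sizeof prefix - 1) == 0;
}
```

In the skeleton this would presumably be called as is_workload_variant (VARIANT (v)), which lets one inputs file carry several named traces.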

>        double d_total_i = 0;
>        timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
>        int64_t c = 0;
> +      uint64_t cur;
>        while (1)
>  	{
> -	  for (i = 0; i < NUM_SAMPLES (v); i++)
> +	  if (is_bench)
>  	    {
> -	      uint64_t cur;
> +	      /* Benchmark a real trace of calls - all samples are iterated
> +		 over once before repeating.  This models actual use more
> +		 accurately than repeating the same sample many times.  */
>  	      TIMING_NOW (start);
>  	      for (k = 0; k < iters; k++)
> -		BENCH_FUNC (v, i);
> +		for (i = 0; i < NUM_SAMPLES (v); i++)
> +		  BENCH_FUNC (v, i);
>  	      TIMING_NOW (end);
> -
>  	      TIMING_DIFF (cur, start, end);
> +	      TIMING_ACCUM (total, cur);
> +	      d_total_i += iters;
> +	    }
> +	  else
> +	    for (i = 0; i < NUM_SAMPLES (v); i++)
> +	      {
> +		TIMING_NOW (start);
> +		for (k = 0; k < iters; k++)
> +		  BENCH_FUNC (v, i);
> +		TIMING_NOW (end);
>  
> -	      if (cur > max)
> -		max = cur;
> +		TIMING_DIFF (cur, start, end);
>  
> -	      if (cur < min)
> -		min = cur;
> +		if (cur > max)
> +		  max = cur;
>  
> -	      TIMING_ACCUM (total, cur);
> -	      /* Accumulate timings for the value.  In the end we will divide
> -	         by the total iterations.  */
> -	      RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
> +		if (cur < min)
> +		  min = cur;
>  
> -	      d_total_i += iters;
> -	    }
> +		TIMING_ACCUM (total, cur);
> +		/* Accumulate timings for the value.  In the end we will divide
> +		   by the total iterations.  */
> +		RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
> +
> +		d_total_i += iters;
> +	      }
>  	  c++;
>  	  struct timespec curtime;
>  
> @@ -117,11 +133,14 @@ main (int argc, char **argv)
>  
>        json_attr_double (&json_ctx, "duration", d_total_s);
>        json_attr_double (&json_ctx, "iterations", d_total_i);
> -      json_attr_double (&json_ctx, "max", max / d_iters);
> -      json_attr_double (&json_ctx, "min", min / d_iters);
> +      if (!is_bench)
> +	{
> +	  json_attr_double (&json_ctx, "max", max / d_iters);
> +	  json_attr_double (&json_ctx, "min", min / d_iters);
> +	}
>        json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
>  
> -      if (detailed)
> +      if (detailed && !is_bench)
>  	{
>  	  json_array_begin (&json_ctx, "timings");
>  
>

Patch

diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index 09eb78df1bce2d9f5e410e3e82821eb9b271e70d..48001763e8481592182fd8d67948ccb4078a48cc 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -68,34 +68,50 @@  main (int argc, char **argv)
       clock_gettime (CLOCK_MONOTONIC_RAW, &runtime);
       runtime.tv_sec += DURATION;
 
+      bool is_bench = strcmp (VARIANT (v), "bench") == 0;
       double d_total_i = 0;
       timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
       int64_t c = 0;
+      uint64_t cur;
       while (1)
 	{
-	  for (i = 0; i < NUM_SAMPLES (v); i++)
+	  if (is_bench)
 	    {
-	      uint64_t cur;
+	      /* Benchmark a real trace of calls - all samples are iterated
+		 over once before repeating.  This models actual use more
+		 accurately than repeating the same sample many times.  */
 	      TIMING_NOW (start);
 	      for (k = 0; k < iters; k++)
-		BENCH_FUNC (v, i);
+		for (i = 0; i < NUM_SAMPLES (v); i++)
+		  BENCH_FUNC (v, i);
 	      TIMING_NOW (end);
-
 	      TIMING_DIFF (cur, start, end);
+	      TIMING_ACCUM (total, cur);
+	      d_total_i += iters;
+	    }
+	  else
+	    for (i = 0; i < NUM_SAMPLES (v); i++)
+	      {
+		TIMING_NOW (start);
+		for (k = 0; k < iters; k++)
+		  BENCH_FUNC (v, i);
+		TIMING_NOW (end);
 
-	      if (cur > max)
-		max = cur;
+		TIMING_DIFF (cur, start, end);
 
-	      if (cur < min)
-		min = cur;
+		if (cur > max)
+		  max = cur;
 
-	      TIMING_ACCUM (total, cur);
-	      /* Accumulate timings for the value.  In the end we will divide
-	         by the total iterations.  */
-	      RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
+		if (cur < min)
+		  min = cur;
 
-	      d_total_i += iters;
-	    }
+		TIMING_ACCUM (total, cur);
+		/* Accumulate timings for the value.  In the end we will divide
+		   by the total iterations.  */
+		RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
+
+		d_total_i += iters;
+	      }
 	  c++;
 	  struct timespec curtime;
 
@@ -117,11 +133,14 @@  main (int argc, char **argv)
 
       json_attr_double (&json_ctx, "duration", d_total_s);
       json_attr_double (&json_ctx, "iterations", d_total_i);
-      json_attr_double (&json_ctx, "max", max / d_iters);
-      json_attr_double (&json_ctx, "min", min / d_iters);
+      if (!is_bench)
+	{
+	  json_attr_double (&json_ctx, "max", max / d_iters);
+	  json_attr_double (&json_ctx, "min", min / d_iters);
+	}
       json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
 
-      if (detailed)
+      if (detailed && !is_bench)
 	{
 	  json_array_begin (&json_ctx, "timings");
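
A note on the reported "mean" for traces, as a sketch under stated assumptions.  In the trace branch above, each timed region runs iters passes over all NUM_SAMPLES (v) inputs, while d_total_i grows by iters per region.  Read that way, d_total_s / d_total_i is the time per pass over the trace, not the time per call.  The helpers below are hypothetical and only show the arithmetic for both normalisations.

```c
/* Hypothetical helpers, not part of the patch.  REGIONS is the number
   of timed regions, i.e. the number of trips round the outer loop.  */

/* Time per full pass over the trace: matches counting ITERS per
   timed region.  */
static double
mean_per_pass (double total_time, double regions, double iters)
{
  return total_time / (regions * iters);
}

/* Time per individual call: additionally divides by the number of
   samples in the trace.  */
static double
mean_per_call (double total_time, double regions, double iters,
	       double num_samples)
{
  return total_time / (regions * iters * num_samples);
}
```

For a total of 1000 timing units over 2 regions of 5 iterations on a 10-sample trace, the per-pass mean is 100 and the per-call mean is 10.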