[v3] Add math benchmark latency test

Message ID	DB6PR0801MB2053511A303179877C15394A83830@DB6PR0801MB2053.eurprd08.prod.outlook.com
State	New, archived
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: Siddhesh Poyarekar <siddhesh@gotplt.org>, Alexander Monakov <amonakov@ispras.ru>, Arjan van de Ven <arjan@linux.intel.com> CC: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, nd <nd@arm.com> Subject: Re: [PATCH v3] Add math benchmark latency test Date: Thu, 17 Aug 2017 13:08:46 +0000 Message-ID: <DB6PR0801MB2053511A303179877C15394A83830@DB6PR0801MB2053.eurprd08.prod.outlook.com> References: <0e008f2e-f41a-1bb8-803c-2f798e2c3541@gotplt.org> <d8612b78-7a02-40e8-d8bd-9b221eff4b7b@linux.intel.com> <a3d0f45c-d283-8144-c770-96f434aef7b4@gotplt.org> <b1cfe590-d5af-bba4-0d9a-c639165854a9@linux.intel.com> <alpine.LNX.2.20.13.1708161737010.2420@monopod.intra.ispras.ru> <DB6PR0801MB205326EE1564C219082724A983820@DB6PR0801MB2053.eurprd08.prod.outlook.com>, <28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org> In-Reply-To: <28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org> nodisclaimer: True received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0

Message ID

DB6PR0801MB2053511A303179877C15394A83830@DB6PR0801MB2053.eurprd08.prod.outlook.com

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: Siddhesh Poyarekar <siddhesh@gotplt.org>, Alexander Monakov
	<amonakov@ispras.ru>, Arjan van de Ven <arjan@linux.intel.com>
CC: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, nd <nd@arm.com>
Subject: Re: [PATCH v3] Add math benchmark latency test
Date: Thu, 17 Aug 2017 13:08:46 +0000
Message-ID: <DB6PR0801MB2053511A303179877C15394A83830@DB6PR0801MB2053.eurprd08.prod.outlook.com>
References: <0e008f2e-f41a-1bb8-803c-2f798e2c3541@gotplt.org>
	<d8612b78-7a02-40e8-d8bd-9b221eff4b7b@linux.intel.com>
	<a3d0f45c-d283-8144-c770-96f434aef7b4@gotplt.org>
	<b1cfe590-d5af-bba4-0d9a-c639165854a9@linux.intel.com>
	<alpine.LNX.2.20.13.1708161737010.2420@monopod.intra.ispras.ru>
	<DB6PR0801MB205326EE1564C219082724A983820@DB6PR0801MB2053.eurprd08.prod.outlook.com>,
	<28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org>
In-Reply-To: <28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org>
nodisclaimer: True
received-spf: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 17 Aug 2017 13:08:46.5253
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB6PR0801MB2053

Commit Message

Wilco Dijkstra Aug. 17, 2017, 1:08 p.m. UTC

  OK, here is version 3:

This patch further improves math function benchmarking by adding a latency
test in addition to throughput.  This enables more accurate comparisons of the
math functions. The latency test works by creating a dependency on the previous
iteration: func_res = F (func_res * zero + input[i]). The multiply by zero avoids
changing the input.

It reports reciprocal throughput and latency in nanoseconds (depending on the
timing header used) and max/min throughput in iterations per second:

   "workload-spec2006.wrf": {
    "reciprocal-throughput": 100,
    "latency": 200,
    "max-throughput": 1.0e+07,
    "min-throughput": 5.0e+06
   }

OK for commit?

ChangeLog:
2017-08-17  Wilco Dijkstra  <wdijkstr@arm.com>
  
        * benchtests/bench-skeleton.c (main): Add support for
        latency benchmarking.
        * benchtests/scripts/bench.py: Add support for latency benchmarking.
--

Comments

Siddhesh Poyarekar Aug. 17, 2017, 2:01 p.m. UTC | #1

On Thursday 17 August 2017 06:38 PM, Wilco Dijkstra wrote:
> OK, here is version 3:
> 
> This patch further improves math function benchmarking by adding a latency
> test in addition to throughput.  This enables more accurate comparisons of the
> math functions. The latency test works by creating a dependency on the previous
> iteration: func_res = F (func_res * zero + input[i]). The multiply by zero avoids
> changing the input.
> 
> It reports reciprocal throughput and latency in nanoseconds (depending on the
> timing header used) and max/min throughput in iterations per second:
> 
>    "workload-spec2006.wrf": {
>     "reciprocal-throughput": 100,
>     "latency": 200,
>     "max-throughput": 1.0e+07,
>     "min-throughput": 5.0e+06
>    }

This looks good, please commit.

Siddhesh

> 
> OK for commit?
> 
> ChangeLog:
> 2017-08-17  Wilco Dijkstra  <wdijkstr@arm.com>
>   
>         * benchtests/bench-skeleton.c (main): Add support for
>         latency benchmarking.
>         * benchtests/scripts/bench.py: Add support for latency benchmarking.
> --
> 
> diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
> index 3c6dad705594ac0a53edcb4e09686252c13127cf..955b2e1d2125775c6ffc10a004184a9564496e46 100644
> --- a/benchtests/bench-skeleton.c
> +++ b/benchtests/bench-skeleton.c
> @@ -71,8 +71,10 @@ main (int argc, char **argv)
>        bool is_bench = strncmp (VARIANT (v), "workload-", 9) == 0;
>        double d_total_i = 0;
>        timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
> +      timing_t throughput = 0, latency = 0;
>        int64_t c = 0;
>        uint64_t cur;
> +      BENCH_VARS;
>        while (1)
>  	{
>  	  if (is_bench)
> @@ -86,7 +88,16 @@ main (int argc, char **argv)
>  		  BENCH_FUNC (v, i);
>  	      TIMING_NOW (end);
>  	      TIMING_DIFF (cur, start, end);
> -	      TIMING_ACCUM (total, cur);
> +	      TIMING_ACCUM (throughput, cur);
> +
> +	      TIMING_NOW (start);
> +	      for (k = 0; k < iters; k++)
> +		for (i = 0; i < NUM_SAMPLES (v); i++)
> +		  BENCH_FUNC_LAT (v, i);
> +	      TIMING_NOW (end);
> +	      TIMING_DIFF (cur, start, end);
> +	      TIMING_ACCUM (latency, cur);
> +
>  	      d_total_i += iters * NUM_SAMPLES (v);
>  	    }
>  	  else
> @@ -131,12 +142,20 @@ main (int argc, char **argv)
>        /* Begin variant.  */
>        json_attr_object_begin (&json_ctx, VARIANT (v));
>  
> -      json_attr_double (&json_ctx, "duration", d_total_s);
> -      json_attr_double (&json_ctx, "iterations", d_total_i);
>        if (is_bench)
> -	json_attr_double (&json_ctx, "throughput", d_total_s / d_total_i);
> +	{
> +	  json_attr_double (&json_ctx, "reciprocal-throughput",
> +			    throughput / d_total_i);
> +	  json_attr_double (&json_ctx, "latency", latency / d_total_i);
> +	  json_attr_double (&json_ctx, "max-throughput",
> +			    d_total_i / throughput * 1000000000.0);
> +	  json_attr_double (&json_ctx, "min-throughput",
> +			    d_total_i / latency * 1000000000.0);
> +	}
>        else
>  	{
> +	  json_attr_double (&json_ctx, "duration", d_total_s);
> +	  json_attr_double (&json_ctx, "iterations", d_total_i);
>  	  json_attr_double (&json_ctx, "max", max / d_iters);
>  	  json_attr_double (&json_ctx, "min", min / d_iters);
>  	  json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
> diff --git a/benchtests/scripts/bench.py b/benchtests/scripts/bench.py
> index 8c1c9eeb2bc67a16cb8a8e010fd2b8a2ef8ab6df..b7ccb7c8c2bf1822202a2377dfb0675516115cc5 100755
> --- a/benchtests/scripts/bench.py
> +++ b/benchtests/scripts/bench.py
> @@ -45,7 +45,7 @@ DEFINES_TEMPLATE = '''
>  # variant is represented by the _VARIANT structure.  The ARGS structure
>  # represents a single set of arguments.
>  STRUCT_TEMPLATE = '''
> -#define CALL_BENCH_FUNC(v, i) %(func)s (%(func_args)s)
> +#define CALL_BENCH_FUNC(v, i, x) %(func)s (x %(func_args)s)
>  
>  struct args
>  {
> @@ -84,7 +84,9 @@ EPILOGUE = '''
>  #define RESULT(__v, __i) (variants[(__v)].in[(__i)].timing)
>  #define RESULT_ACCUM(r, v, i, old, new) \\
>          ((RESULT ((v), (i))) = (RESULT ((v), (i)) * (old) + (r)) / ((new) + 1))
> -#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j);})
> +#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, );})
> +#define BENCH_FUNC_LAT(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, %(latarg)s);})
> +#define BENCH_VARS %(defvar)s
>  #define FUNCNAME "%(func)s"
>  #include "bench-skeleton.c"'''
>  
> @@ -122,17 +124,22 @@ def gen_source(func, directives, all_vals):
>      # If we have a return value from the function, make sure it is
>      # assigned to prevent the compiler from optimizing out the
>      # call.
> +    getret = ''
> +    latarg = ''
> +    defvar = ''
> +
>      if directives['ret']:
>          print('static %s volatile ret;' % directives['ret'])
> -        getret = 'ret = '
> -    else:
> -        getret = ''
> +        print('static %s zero __attribute__((used)) = 0;' % directives['ret'])
> +        getret = 'ret = func_res = '
> +        latarg = 'func_res * zero +'
> +        defvar = '%s func_res = 0;' % directives['ret']
>  
>      # Test initialization.
>      if directives['init']:
>          print('#define BENCH_INIT %s' % directives['init'])
>  
> -    print(EPILOGUE % {'getret': getret, 'func': func})
> +    print(EPILOGUE % {'getret': getret, 'func': func, 'latarg': latarg, 'defvar': defvar })
>  
>  
>  def _print_arg_data(func, directives, all_vals):
>

diff mbox

Patch

diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index 3c6dad705594ac0a53edcb4e09686252c13127cf..955b2e1d2125775c6ffc10a004184a9564496e46 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -71,8 +71,10 @@  main (int argc, char **argv)
       bool is_bench = strncmp (VARIANT (v), "workload-", 9) == 0;
       double d_total_i = 0;
       timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
+      timing_t throughput = 0, latency = 0;
       int64_t c = 0;
       uint64_t cur;
+      BENCH_VARS;
       while (1)
 	{
 	  if (is_bench)
@@ -86,7 +88,16 @@  main (int argc, char **argv)
 		  BENCH_FUNC (v, i);
 	      TIMING_NOW (end);
 	      TIMING_DIFF (cur, start, end);
-	      TIMING_ACCUM (total, cur);
+	      TIMING_ACCUM (throughput, cur);
+
+	      TIMING_NOW (start);
+	      for (k = 0; k < iters; k++)
+		for (i = 0; i < NUM_SAMPLES (v); i++)
+		  BENCH_FUNC_LAT (v, i);
+	      TIMING_NOW (end);
+	      TIMING_DIFF (cur, start, end);
+	      TIMING_ACCUM (latency, cur);
+
 	      d_total_i += iters * NUM_SAMPLES (v);
 	    }
 	  else
@@ -131,12 +142,20 @@  main (int argc, char **argv)
       /* Begin variant.  */
       json_attr_object_begin (&json_ctx, VARIANT (v));
 
-      json_attr_double (&json_ctx, "duration", d_total_s);
-      json_attr_double (&json_ctx, "iterations", d_total_i);
       if (is_bench)
-	json_attr_double (&json_ctx, "throughput", d_total_s / d_total_i);
+	{
+	  json_attr_double (&json_ctx, "reciprocal-throughput",
+			    throughput / d_total_i);
+	  json_attr_double (&json_ctx, "latency", latency / d_total_i);
+	  json_attr_double (&json_ctx, "max-throughput",
+			    d_total_i / throughput * 1000000000.0);
+	  json_attr_double (&json_ctx, "min-throughput",
+			    d_total_i / latency * 1000000000.0);
+	}
       else
 	{
+	  json_attr_double (&json_ctx, "duration", d_total_s);
+	  json_attr_double (&json_ctx, "iterations", d_total_i);
 	  json_attr_double (&json_ctx, "max", max / d_iters);
 	  json_attr_double (&json_ctx, "min", min / d_iters);
 	  json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
diff --git a/benchtests/scripts/bench.py b/benchtests/scripts/bench.py
index 8c1c9eeb2bc67a16cb8a8e010fd2b8a2ef8ab6df..b7ccb7c8c2bf1822202a2377dfb0675516115cc5 100755
--- a/benchtests/scripts/bench.py
+++ b/benchtests/scripts/bench.py
@@ -45,7 +45,7 @@  DEFINES_TEMPLATE = '''
 # variant is represented by the _VARIANT structure.  The ARGS structure
 # represents a single set of arguments.
 STRUCT_TEMPLATE = '''
-#define CALL_BENCH_FUNC(v, i) %(func)s (%(func_args)s)
+#define CALL_BENCH_FUNC(v, i, x) %(func)s (x %(func_args)s)
 
 struct args
 {
@@ -84,7 +84,9 @@  EPILOGUE = '''
 #define RESULT(__v, __i) (variants[(__v)].in[(__i)].timing)
 #define RESULT_ACCUM(r, v, i, old, new) \\
         ((RESULT ((v), (i))) = (RESULT ((v), (i)) * (old) + (r)) / ((new) + 1))
-#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j);})
+#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, );})
+#define BENCH_FUNC_LAT(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, %(latarg)s);})
+#define BENCH_VARS %(defvar)s
 #define FUNCNAME "%(func)s"
 #include "bench-skeleton.c"'''
 
@@ -122,17 +124,22 @@  def gen_source(func, directives, all_vals):
     # If we have a return value from the function, make sure it is
     # assigned to prevent the compiler from optimizing out the
     # call.
+    getret = ''
+    latarg = ''
+    defvar = ''
+
     if directives['ret']:
         print('static %s volatile ret;' % directives['ret'])
-        getret = 'ret = '
-    else:
-        getret = ''
+        print('static %s zero __attribute__((used)) = 0;' % directives['ret'])
+        getret = 'ret = func_res = '
+        latarg = 'func_res * zero +'
+        defvar = '%s func_res = 0;' % directives['ret']
 
     # Test initialization.
     if directives['init']:
         print('#define BENCH_INIT %s' % directives['init'])
 
-    print(EPILOGUE % {'getret': getret, 'func': func})
+    print(EPILOGUE % {'getret': getret, 'func': func, 'latarg': latarg, 'defvar': defvar })
 
 
 def _print_arg_data(func, directives, all_vals):