[1/4] Make bench.out in json format

Message ID 20140325102859.GA1850@spoyarek.pnq.redhat.com
State Committed

Commit Message

Siddhesh Poyarekar March 25, 2014, 10:28 a.m. UTC
  Hi,

This patch changes the output format of the main benchmark output file
(bench.out) to an extensible format.  I chose JSON over XML because, in
addition to being extensible, it is not too verbose.  It also has good
support in Python.

The significant functional change is that timing information is now a
JSON attribute rather than part of a string; to support this, a
separate program prints out a JSON snippet naming the timing type in
use (hp_timing or clock_gettime).  The mean timing has also changed
from iterations per unit time to actual time per iteration.

Siddhesh

	* benchtests/Makefile (timing-type): New binary.
	(bench-clean): Also remove bench-timing-type.
	(bench): New target for timing-type.
	(bench-func): Print output in JSON format.
	* benchtests/bench-skeleton.c (main): Print output in JSON
	format.
	* benchtests/bench-timing-type.c: New file.
	* benchtests/bench-timing.h (TIMING_TYPE): New macro.
	(TIMING_PRINT_STATS): Remove.
	* benchtests/scripts/bench.py (_print_arg_data): Store variant
	name separately.

---
 benchtests/Makefile            | 23 +++++++++++++++++++----
 benchtests/bench-skeleton.c    | 18 +++++++++++++++---
 benchtests/bench-timing-type.c | 27 +++++++++++++++++++++++++++
 benchtests/bench-timing.h      | 14 ++++----------
 benchtests/scripts/bench.py    |  2 +-
 5 files changed, 66 insertions(+), 18 deletions(-)
 create mode 100644 benchtests/bench-timing-type.c
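Since each benchmark program emits an independent JSON fragment, the
assembled bench.out can be consumed by any JSON-capable tool.  As a
hedged sketch (the sample data below is invented; only its shape, a
"timing-type" attribute plus a nested "functions" object of per-variant
stats, follows what the patch emits), a Python consumer might look like:

```python
import json

# Invented sample data; only the structure mirrors the patch's output.
sample = """
{
  "timing-type": "clock_gettime",
  "functions": {
    "pow": {
      "240.0000/120.0000": {
        "duration": 1e9, "iterations": 2e6,
        "max": 508.0, "min": 434.0, "mean": 472.0
      }
    }
  }
}
"""

data = json.loads(sample)
print(data["timing-type"])

# Walk every function and variant, reporting the mean time per iteration.
for func, variants in data["functions"].items():
    for variant, stats in variants.items():
        print(func, variant, stats["mean"])
```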
  

Comments

Will Newton March 25, 2014, 11:48 a.m. UTC | #1
On 25 March 2014 10:28, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> Hi,
>
> This patch changes the output format of the main benchmark output file
> (bench.out) to an extensible format.  I chose JSON over XML because in
> addition to being extensible, it is also not too verbose.
> Additionally it has good support in python.
>
> The significant change I have made in terms of functionality is to put
> timing information as an attribute in JSON instead of a string and to
> do that, there is a separate program that prints out a JSON snippet
> mentioning the type of timing (hp_timing or clock_gettime).  The mean
> timing has now changed from iterations per unit to actual timing per
> iteration.
>
> Siddhesh
>
>         * benchtests/Makefile (timing-type): New binary.
>         (bench-clean): Also remove bench-timing-type.
>         (bench): New target for timing-type.
>         (bench-func): Print output in JSON format.
>         * benchtests/bench-skeleton.c (main): Print output in JSON
>         format.
>         * benchtests/bench-timing-type.c: New file.
>         * benchtests/bench-timing.h (TIMING_TYPE): New macro.
>         (TIMING_PRINT_STATS): Remove.
>         * benchtests/scripts/bench.py (_print_arg_data): Store variant
>         name separately.
>
> ---
>  benchtests/Makefile            | 23 +++++++++++++++++++----
>  benchtests/bench-skeleton.c    | 18 +++++++++++++++---
>  benchtests/bench-timing-type.c | 27 +++++++++++++++++++++++++++
>  benchtests/bench-timing.h      | 14 ++++----------
>  benchtests/scripts/bench.py    |  2 +-
>  5 files changed, 66 insertions(+), 18 deletions(-)
>  create mode 100644 benchtests/bench-timing-type.c

In principle this seems ok to me, although it does seem to make the
output quite ugly for the human reader. Would it be possible to get
the layout a bit tidier? i.e. indentation and newlines.
  
Siddhesh Poyarekar March 25, 2014, 11:58 a.m. UTC | #2
On Tue, Mar 25, 2014 at 11:48:24AM +0000, Will Newton wrote:
> In principle this seems ok to me, although it does seem to make the
> output quite ugly for the human reader. Would it be possible to get
> the layout a bit tidier? i.e. indentation and newlines.

I half-heartedly tried to do it initially and gave up midway.  I could
do a more deliberate pass on top of these patches to make the
formatting easier to read than it currently is.

Of course, the more important question here is whether there's anyone
who hates the idea of using JSON as the preferred format.  Going
forward I think we ought to use JSON formats to describe the string
test outputs as well.

Siddhesh
  
Will Newton March 25, 2014, 12:01 p.m. UTC | #3
On 25 March 2014 11:58, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Tue, Mar 25, 2014 at 11:48:24AM +0000, Will Newton wrote:
>> In principle this seems ok to me, although it does seem to make the
>> output quite ugly for the human reader. Would it be possible to get
>> the layout a bit tidier? i.e. indentation and newlines.
>
> I half-heartedly tried to do it initially and gave up midway.  I could
> do a more deliberate pass on top of these patches to get the
> formatting cleaner to read than it currently is.
>
> Of course, the more important question here is whether there's anyone
> who hates the idea of using JSON as the preferred format.  Going
> forward I think we ought to use JSON formats to describe the string
> test outputs as well.

I currently have scripts that parse the output of the string tests and
it's not too bad, but a properly defined format would probably make it
a bit easier.  However, I think it is useful to be able to eyeball/diff
test results to sanity-check them as well as parse them
programmatically.

I agree with your choice not to use XML. ;-)
  
Rich Felker March 25, 2014, 3:41 p.m. UTC | #4
On Tue, Mar 25, 2014 at 05:28:13PM +0530, Siddhesh Poyarekar wrote:
> On Tue, Mar 25, 2014 at 11:48:24AM +0000, Will Newton wrote:
> > In principle this seems ok to me, although it does seem to make the
> > output quite ugly for the human reader. Would it be possible to get
> > the layout a bit tidier? i.e. indentation and newlines.
> 
> I half-heartedly tried to do it initially and gave up midway.  I could
> do a more deliberate pass on top of these patches to get the
> formatting cleaner to read than it currently is.

It should be possible to 'pretty-print' JSON with a few lines of C (or
if you prefer, awk or perl or something) that just track the nesting
depth and add the appropriate indentation.

> Of course, the more important question here is whether there's anyone
> who hates the idea of using JSON as the preferred format.  Going
> forward I think we ought to use JSON formats to describe the string
> test outputs as well.

While you're at it, can you change that hideous malloc_info interface
that outputs xml to use json? ;-)

Rich
  

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index b331d1a..be11708 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -98,11 +98,14 @@  run-bench = $(test-wrapper-env) \
 	    GCONV_PATH=$(common-objpfx)iconvdata LC_ALL=C \
 	    $($*-ENV) $(rtld-prefix) $${run}
 
+timing-type := $(objpfx)bench-timing-type
+
 bench-clean:
 	rm -f $(binaries-bench) $(addsuffix .o,$(binaries-bench))
 	rm -f $(binaries-benchset) $(addsuffix .o,$(binaries-benchset))
+	rm -f $(timing-type) $(addsuffix .o,$(timing-type))
 
-bench: bench-set bench-func
+bench: $(timing-type) bench-set bench-func
 
 bench-set: $(binaries-benchset)
 	for run in $^; do \
@@ -110,17 +113,29 @@  bench-set: $(binaries-benchset)
 	  $(run-bench) > $${run}.out; \
 	done
 
+# Build and execute the benchmark functions.  This target generates JSON
+# formatted bench.out.  Each of the programs produce independent JSON output,
+# so one could even execute them individually and process it using any JSON
+# capable language or tool.
 bench-func: $(binaries-bench)
-	{ for run in $^; do \
+	{ echo "{"; \
+	$(timing-type); \
+	echo "  ,\"functions\": {"; \
+	for run in $^; do \
+	  if ! [ "x$${run}" = "x$<" ]; then \
+	    echo ","; \
+	  fi; \
 	  echo "Running $${run}" >&2; \
 	  $(run-bench); \
-	done; } > $(objpfx)bench.out-tmp; \
+	done; \
+	echo "  }"; \
+	echo "}"; } > $(objpfx)bench.out-tmp; \
 	if [ -f $(objpfx)bench.out ]; then \
 	  mv -f $(objpfx)bench.out $(objpfx)bench.out.old; \
 	fi; \
 	mv -f $(objpfx)bench.out-tmp $(objpfx)bench.out
 
-$(binaries-bench) $(binaries-benchset): %: %.o \
+$(timing-type) $(binaries-bench) $(binaries-benchset): %: %.o \
   $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
   $(addprefix $(csu-objpfx),start.o) $(+preinit) $(+postinit)
 	$(+link)
diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index 4290e76..faef7eb 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -59,8 +59,13 @@  main (int argc, char **argv)
 
   iters = 1000 * res;
 
+  /* Begin function.  */
+  printf ("\"%s\": {\n", FUNCNAME);
+
   for (int v = 0; v < NUM_VARIANTS; v++)
     {
+      if (v)
+	putc (',', stdout);
       /* Run for approximately DURATION seconds.  */
       clock_gettime (CLOCK_MONOTONIC_RAW, &runtime);
       runtime.tv_sec += DURATION;
@@ -86,7 +91,6 @@  main (int argc, char **argv)
 		min = cur;
 
 	      TIMING_ACCUM (total, cur);
-
 	      d_total_i += iters;
 	    }
 	  struct timespec curtime;
@@ -104,9 +108,17 @@  main (int argc, char **argv)
       d_total_s = total;
       d_iters = iters;
 
-      TIMING_PRINT_STATS (VARIANT (v), d_total_s, d_iters, d_total_i, max,
-			  min);
+      printf ("\"%s\": {\n", VARIANT (v));
+      printf ("\"duration\": %g, \"iterations\": %g, "
+	      "\"max\": %g, \"min\": %g, \"mean\": %g\n",
+	      d_total_s, d_total_i, max / d_iters, min / d_iters,
+	      d_total_s / d_total_i);
+
+      puts ("}");
     }
 
+  /* End function.  */
+  puts ("}");
+
   return 0;
 }
diff --git a/benchtests/bench-timing-type.c b/benchtests/bench-timing-type.c
new file mode 100644
index 0000000..903a61f
--- /dev/null
+++ b/benchtests/bench-timing-type.c
@@ -0,0 +1,27 @@ 
+/* Print out the timing type used by the benchmark run.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "bench-timing.h"
+#include <stdio.h>
+
+int
+main (int argc, char **argv)
+{
+  printf ("\"timing-type\": \"%s\"\n", TIMING_TYPE);
+  return 0;
+}
diff --git a/benchtests/bench-timing.h b/benchtests/bench-timing.h
index 13fc946..ccde601 100644
--- a/benchtests/bench-timing.h
+++ b/benchtests/bench-timing.h
@@ -25,6 +25,8 @@ 
 hp_timing_t _dl_hp_timing_overhead;
 typedef hp_timing_t timing_t;
 
+# define TIMING_TYPE "hp_timing"
+
 # define TIMING_INIT(res) \
 ({									      \
   HP_TIMING_DIFF_INIT();						      \
@@ -35,16 +37,13 @@  typedef hp_timing_t timing_t;
 # define TIMING_DIFF(diff, start, end) HP_TIMING_DIFF ((diff), (start), (end))
 # define TIMING_ACCUM(sum, diff) HP_TIMING_ACCUM_NT ((sum), (diff))
 
-# define TIMING_PRINT_STATS(func, d_total_s, d_iters, d_total_i, max, min) \
-  printf ("%s: ITERS:%g: TOTAL:%gMcy, MAX:%gcy, MIN:%gcy, %g calls/Mcy\n",    \
-	  (func), (d_total_i), (d_total_s) * 1e-6, (max) / (d_iters),	      \
-	  (min) / (d_iters), 1e6 * (d_total_i) / (d_total_s));
-
 #else
 
 #include <time.h>
 typedef uint64_t timing_t;
 
+# define TIMING_TYPE "clock_gettime"
+
 /* Measure the resolution of the clock so we can scale the number of
    benchmark iterations by this value.  */
 # define TIMING_INIT(res) \
@@ -64,11 +63,6 @@  typedef uint64_t timing_t;
 # define TIMING_DIFF(diff, start, end) (diff) = (end) - (start)
 # define TIMING_ACCUM(sum, diff) (sum) += (diff)
 
-# define TIMING_PRINT_STATS(func, d_total_s, d_iters, d_total_i, max, min) \
-  printf ("%s: ITERS:%g: TOTAL:%gs, MAX:%gns, MIN:%gns, %g iter/s\n",	      \
-	  (func), (d_total_i), (d_total_s) * 1e-9, (max) / (d_iters),		      \
-	  (min) / (d_iters), 1e9 * (d_total_i) / (d_total_s))
-
 #endif
 
 #define TIMING_PRINT_MEAN(d_total_s, d_iters) \
diff --git a/benchtests/scripts/bench.py b/benchtests/scripts/bench.py
index e500a33..90317b5 100755
--- a/benchtests/scripts/bench.py
+++ b/benchtests/scripts/bench.py
@@ -172,7 +172,7 @@  def _print_arg_data(func, directives, all_vals):
 
         # Members for the variants structure list that we will
         # print later.
-        variants.append('  {"%s(%s)", %d, in%d},' % (func, k, len(vals), i))
+        variants.append('  {"%s", %d, in%d},' % (k, len(vals), i))
         print(ARGS_TEMPLATE % {'argnum': i, 'num_args': len(vals),
                                'args': '\n'.join(out)})