[4/4] Validate bench.out against a JSON schema

Message ID 20140325103017.GD1850@spoyarek.pnq.redhat.com
State Rejected
Headers

Commit Message

Siddhesh Poyarekar March 25, 2014, 10:30 a.m. UTC
  Hi,

The first patch in this series converted the benchmark output into
JSON format.  This patch adds a JSON schema that describes the format
of the benchmark output.  There is also a script that validates the
generated bench.out against the schema at the end of `make bench`.

Siddhesh

	* benchtests/scripts/benchout.schema.json: New file.
	* benchtests/scripts/validate_benchout.py: New script.
	* benchtests/Makefile (bench-func): Call it.

---
 benchtests/Makefile                     |  2 +
 benchtests/scripts/benchout.schema.json | 57 ++++++++++++++++++++++++
 benchtests/scripts/validate_benchout.py | 78 +++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 benchtests/scripts/benchout.schema.json
 create mode 100755 benchtests/scripts/validate_benchout.py
  

Comments

Will Newton March 25, 2014, 2:49 p.m. UTC | #1
On 25 March 2014 10:30, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> Hi,
>
> The first patch in this series converted the benchmark output into
> JSON format.  This patch adds a JSON schema that describes the format
> of the benchmark output.  There is also a script that validates the
> generated bench.out against the schema at the end of `make bench`.
>
> Siddhesh
>
>         * benchtests/scripts/benchout.schema.json: New file.
>         * benchtests/scripts/validate_benchout.py: New script.
>         * benchtests/Makefile (bench-func): Call it.

I had a problem validating the results of modf:

  File "/usr/lib/python2.7/site-packages/jsonschema.py", line 311, in validate
    raise error
jsonschema.ValidationError: Additional properties are not allowed
(u'modf()' was unexpected)

Failed validating u'additionalProperties' in
schema[u'properties'][u'functions'][u'patternProperties'][u'^[_a-zA-Z][_a-zA-Z0-9]+$']:
    {u'additionalProperties': False,
     u'patternProperties': {u'^[_a-zA-Z0-9]*$': {u'additionalProperties': False,
                                                 u'properties':
{u'duration': {u'type': u'number'},

u'iterations': {u'type': u'number'},

u'max': {u'type': u'number'},

u'mean': {u'type': u'number'},

u'min': {u'type': u'number'},

u'timings': {u'items': {u'type': u'number'},

       u'type': u'array'}},
                                                 u'required': [u'duration',
                                                               u'iterations',
                                                               u'max',
                                                               u'min',
                                                               u'mean'],
                                                 u'title': u'Function variants',
                                                 u'type': u'object'}},
     u'title': u'Function names',
     u'type': u'object'}

On instance[u'functions'][u'modf']:
    {u'modf()': {u'duration': 22571700000.0,
                 u'iterations': 587498000.0,
                 u'max': 1061.83,
                 u'mean': 38.4201,
                 u'min': 37.375}}

Although now I have a problem building the modf test, so it may be I
was just out of date:

/home/linaro/glibc-build/benchtests/bench-modf.o: In function `main':
/home/linaro/glibc/benchtests/bench-skeleton.c:102: undefined
reference to `RESULT_ACCUM'
/home/linaro/glibc/benchtests/bench-skeleton.c:135: undefined
reference to `RESULT'
/home/linaro/glibc/benchtests/bench-skeleton.c:135: undefined
reference to `RESULT'
collect2: error: ld returned 1 exit status


> ---
>  benchtests/Makefile                     |  2 +
>  benchtests/scripts/benchout.schema.json | 57 ++++++++++++++++++++++++
>  benchtests/scripts/validate_benchout.py | 78 +++++++++++++++++++++++++++++++++
>  3 files changed, 137 insertions(+)
>  create mode 100644 benchtests/scripts/benchout.schema.json
>  create mode 100755 benchtests/scripts/validate_benchout.py
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1669b6f..4b5b568 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -143,6 +143,8 @@ bench-func: $(binaries-bench)
>           mv -f $(objpfx)bench.out $(objpfx)bench.out.old; \
>         fi; \
>         mv -f $(objpfx)bench.out-tmp $(objpfx)bench.out
> +       $(.)scripts/validate_benchout.py $(objpfx)bench.out \
> +               $(.)scripts/benchout.schema.json
>
>  $(timing-type) $(binaries-bench) $(binaries-benchset): %: %.o \
>    $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
> diff --git a/benchtests/scripts/benchout.schema.json b/benchtests/scripts/benchout.schema.json
> new file mode 100644
> index 0000000..3f39823
> --- /dev/null
> +++ b/benchtests/scripts/benchout.schema.json
> @@ -0,0 +1,57 @@
> +{
> +  "title": "benchmark",
> +  "type": "object",
> +  "properties": {
> +    "machine": {
> +      "type": "object",
> +      "properties": {
> +        "vendor-id": {"type": "string"},
> +        "cpu-family": {"type": "number"},
> +        "model": {"type": "number"},
> +        "cpu-cores": {"type": "number"},
> +        "processors": {"type": "number"},
> +        "cache-size": {"type": "number"},
> +        "memory": {"type": "number"},
> +        "swap": {"type": "number"}
> +      },
> +      "required": ["vendor-id", "cpu-family", "model", "cpu-cores",
> +                   "processors", "cache-size", "memory", "swap"]

This isn't going to work for non-Intel architectures. On ARM I get:

{
  "memory": 2068232,
  "swap": 0,
  "processors": 2
}

I'm not sure what the best solution is, /proc/cpuinfo seems pretty
much architecture specific:

Processor    : ARMv7 Processor rev 4 (v7l)
processor    : 0
BogoMIPS    : 1694.10

processor    : 1
BogoMIPS    : 1694.10

Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
vfpv4 idiva idivt
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x0
CPU part    : 0xc0f
CPU revision    : 4

Hardware    : SAMSUNG EXYNOS5 (Flattened Device Tree)
Revision    : 0000
Serial        : 0000000000000000


> +    },
> +    "timing-type": {
> +      "type": "string"
> +    },
> +    "functions": {
> +      "title": "Associative array of functions",
> +      "type": "object",
> +      "patternProperties": {
> +        "^[_a-zA-Z][_a-zA-Z0-9]+$": {
> +          "title": "Function names",
> +          "type": "object",
> +          "patternProperties": {
> +            "^[_a-zA-Z0-9]*$": {
> +              "title": "Function variants",
> +              "type": "object",
> +              "properties": {
> +                "duration": {"type": "number"},
> +                "iterations": {"type": "number"},
> +                "max": {"type": "number"},
> +                "min": {"type": "number"},
> +                "mean": {"type": "number"},
> +                "timings": {
> +                  "type": "array",
> +                  "items": {"type": "number"}
> +                }
> +              },
> +              "required": ["duration", "iterations", "max", "min", "mean"],
> +              "additionalProperties": false
> +            }
> +          },
> +          "additionalProperties": false
> +        }
> +      },
> +      "minProperties": 1
> +    }
> +  },
> +  "required": ["machine", "timing-type", "functions"],
> +  "additionalProperties": false
> +}
> diff --git a/benchtests/scripts/validate_benchout.py b/benchtests/scripts/validate_benchout.py
> new file mode 100755
> index 0000000..b6a34e2
> --- /dev/null
> +++ b/benchtests/scripts/validate_benchout.py
> @@ -0,0 +1,78 @@
> +#!/usr/bin/python
> +# Copyright (C) 2014 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <http://www.gnu.org/licenses/>.
> +"""Benchmark output validator
> +
> +Given a benchmark output file in json format and a benchmark schema file,
> +validate the output against the schema.
> +"""
> +
> +from __future__ import print_function
> +import jsonschema

This module is not part of the standard Python install so it should
probably be imported inside a try block so we can catch if it is
missing and print a message. Otherwise make bench fails with a
non-zero exit which I don't think is necessarily desirable given this
validation will generally not be required.

> +import json
> +import sys
> +import os
> +
> +
> +def validate_bench(benchfile, schemafile):
> +    """Validate benchmark file
> +
> +    Validate a benchmark output file against a JSON schema.
> +
> +    Args:
> +        benchfile: The file name of the bench.out file.
> +        schemafile: The file name of the JSON schema file to validate
> +        bench.out against.
> +
> +    Exceptions:
> +        jsonschema.ValidationError: When bench.out is not valid
> +        jsonschema.SchemaError: When the JSON schema is not valid
> +        IOError: If any of the files are not found.
> +    """
> +    with open(benchfile, 'r') as bfile:
> +        with open(schemafile, 'r') as sfile:
> +            bench = json.load(bfile)
> +            schema = json.load(sfile)
> +            jsonschema.validate(bench, schema)
> +
> +    # If we reach here, we're all good.
> +    print("Benchmark output in %s is valid." % benchfile)
> +
> +
> +def main(args):
> +    """Main entry point
> +
> +    Args:
> +        args: The command line arguments to the program
> +
> +    Returns:
> +        0 on success or a non-zero failure code
> +
> +    Exceptions:
> +        Exceptions thrown by validate_bench
> +    """
> +    if len(args) != 2:
> +        print("Usage: %s <bench.out file> <bench.out schema>" % sys.argv[0],
> +                file=sys.stderr)
> +        return os.EX_USAGE
> +
> +    validate_bench(args[0], args[1])
> +    return os.EX_OK
> +
> +
> +if __name__ == '__main__':
> +    sys.exit(main(sys.argv[1:]))
> --
> 1.8.3.1
>
  
Siddhesh Poyarekar March 25, 2014, 3:57 p.m. UTC | #2
On Tue, Mar 25, 2014 at 02:49:18PM +0000, Will Newton wrote:
> On 25 March 2014 10:30, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> > Hi,
> >
> > The first patch in this series converted the benchmark output into
> > JSON format.  This patch adds a JSON schema that describes the format
> > of the benchmark output.  There is also a script that validates the
> > generated bench.out against the schema at the end of `make bench`.
> >
> > Siddhesh
> >
> >         * benchtests/scripts/benchout.schema.json: New file.
> >         * benchtests/scripts/validate_benchout.py: New script.
> >         * benchtests/Makefile (bench-func): Call it.
> 
> I had a problem validating the results of modf:
> 
>   File "/usr/lib/python2.7/site-packages/jsonschema.py", line 311, in validate
>     raise error
> jsonschema.ValidationError: Additional properties are not allowed
> (u'modf()' was unexpected)
> 

Ahh, you need this patch:

https://patchwork.siddhesh.in/patch/206/

> > +      "required": ["vendor-id", "cpu-family", "model", "cpu-cores",
> > +                   "processors", "cache-size", "memory", "swap"]
> 
> This isn't going to work for non-Intel architectures. On ARM I get:
> 
> {
>   "memory": 2068232,
>   "swap": 0,
>   "processors": 2
> }
> 
> I'm not sure what the best solution is, /proc/cpuinfo seems pretty
> much architecture specific:
> 
> Processor    : ARMv7 Processor rev 4 (v7l)
> processor    : 0
> BogoMIPS    : 1694.10
> 
> processor    : 1
> BogoMIPS    : 1694.10
> 
> Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
> vfpv4 idiva idivt
> CPU implementer    : 0x41
> CPU architecture: 7
> CPU variant    : 0x0
> CPU part    : 0xc0f
> CPU revision    : 4
> 
> Hardware    : SAMSUNG EXYNOS5 (Flattened Device Tree)
> Revision    : 0000
> Serial        : 0000000000000000

Thanks, I didn't know this.  Would it be useful to just have a
free-form machine information to begin with, i.e. arbitrary name-value
pairs without a strict set of rules?  The main use I see for machine
information is for benchmark comparisons, i.e. where we want to know
what it is we're comparing.  The secondary goal from that would be to
ensure that benchmark comparions are done across comparable
configurations.

> > +from __future__ import print_function
> > +import jsonschema
> 
> This module is not part of the standard Python install so it should
> probably be imported inside a try block so we can catch if it is
> missing and print a message. Otherwise make bench fails with a
> non-zero exit which I don't think is necessarily desirable given this
> validation will generally not be required.

Ack, will fix.  I also should be adding a note in the README about the
dependency.

Thanks,
Siddhesh
  

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 1669b6f..4b5b568 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -143,6 +143,8 @@  bench-func: $(binaries-bench)
 	  mv -f $(objpfx)bench.out $(objpfx)bench.out.old; \
 	fi; \
 	mv -f $(objpfx)bench.out-tmp $(objpfx)bench.out
+	$(.)scripts/validate_benchout.py $(objpfx)bench.out \
+		$(.)scripts/benchout.schema.json
 
 $(timing-type) $(binaries-bench) $(binaries-benchset): %: %.o \
   $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
diff --git a/benchtests/scripts/benchout.schema.json b/benchtests/scripts/benchout.schema.json
new file mode 100644
index 0000000..3f39823
--- /dev/null
+++ b/benchtests/scripts/benchout.schema.json
@@ -0,0 +1,57 @@ 
+{
+  "title": "benchmark",
+  "type": "object",
+  "properties": {
+    "machine": {
+      "type": "object",
+      "properties": {
+        "vendor-id": {"type": "string"},
+        "cpu-family": {"type": "number"},
+        "model": {"type": "number"},
+        "cpu-cores": {"type": "number"},
+        "processors": {"type": "number"},
+        "cache-size": {"type": "number"},
+        "memory": {"type": "number"},
+        "swap": {"type": "number"}
+      },
+      "required": ["vendor-id", "cpu-family", "model", "cpu-cores",
+                   "processors", "cache-size", "memory", "swap"]
+    },
+    "timing-type": {
+      "type": "string"
+    },
+    "functions": {
+      "title": "Associative array of functions",
+      "type": "object",
+      "patternProperties": {
+        "^[_a-zA-Z][_a-zA-Z0-9]+$": {
+          "title": "Function names",
+          "type": "object",
+          "patternProperties": {
+            "^[_a-zA-Z0-9]*$": {
+              "title": "Function variants",
+              "type": "object",
+              "properties": {
+                "duration": {"type": "number"},
+                "iterations": {"type": "number"},
+                "max": {"type": "number"},
+                "min": {"type": "number"},
+                "mean": {"type": "number"},
+                "timings": {
+                  "type": "array",
+                  "items": {"type": "number"}
+                }
+              },
+              "required": ["duration", "iterations", "max", "min", "mean"],
+              "additionalProperties": false
+            }
+          },
+          "additionalProperties": false
+        }
+      },
+      "minProperties": 1
+    }
+  },
+  "required": ["machine", "timing-type", "functions"],
+  "additionalProperties": false
+}
diff --git a/benchtests/scripts/validate_benchout.py b/benchtests/scripts/validate_benchout.py
new file mode 100755
index 0000000..b6a34e2
--- /dev/null
+++ b/benchtests/scripts/validate_benchout.py
@@ -0,0 +1,78 @@ 
+#!/usr/bin/python
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+"""Benchmark output validator
+
+Given a benchmark output file in json format and a benchmark schema file,
+validate the output against the schema.
+"""
+
+from __future__ import print_function
+import jsonschema
+import json
+import sys
+import os
+
+
+def validate_bench(benchfile, schemafile):
+    """Validate benchmark file
+
+    Validate a benchmark output file against a JSON schema.
+
+    Args:
+        benchfile: The file name of the bench.out file.
+        schemafile: The file name of the JSON schema file to validate
+        bench.out against.
+
+    Exceptions:
+        jsonschema.ValidationError: When bench.out is not valid
+        jsonschema.SchemaError: When the JSON schema is not valid
+        IOError: If any of the files are not found.
+    """
+    with open(benchfile, 'r') as bfile:
+        with open(schemafile, 'r') as sfile:
+            bench = json.load(bfile)
+            schema = json.load(sfile)
+            jsonschema.validate(bench, schema)
+
+    # If we reach here, we're all good.
+    print("Benchmark output in %s is valid." % benchfile)
+
+
+def main(args):
+    """Main entry point
+
+    Args:
+        args: The command line arguments to the program
+
+    Returns:
+        0 on success or a non-zero failure code
+
+    Exceptions:
+        Exceptions thrown by validate_bench
+    """
+    if len(args) != 2:
+        print("Usage: %s <bench.out file> <bench.out schema>" % sys.argv[0],
+                file=sys.stderr)
+        return os.EX_USAGE
+
+    validate_bench(args[0], args[1])
+    return os.EX_OK
+
+
+if __name__ == '__main__':
+    sys.exit(main(sys.argv[1:]))