[v2,3/5] benchtests: Add a script to convert benchout string JSON to CSV

Message ID 20210720063540.362366-1-naohirot@fujitsu.com
State Deferred
Series benchtests: Add memset zero fill benchmark tests

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Naohiro Tamura July 20, 2021, 6:35 a.m. UTC
This patch adds a "benchout_string2csv.sh" script that converts benchout
string JSON to CSV, so that performance data can be visualized with any
spreadsheet application, such as MS Excel or Google Sheets.

Usage: benchout_string2csv.sh
  read benchout string JSON from standard input
  write CSV to standard output
ex:
  $ cat bench-memset.out | benchout_string2csv.sh > bench-memset.csv
---
 benchtests/scripts/benchout_string2csv.sh | 44 +++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100755 benchtests/scripts/benchout_string2csv.sh
  

Comments

Naohiro Tamura via Libc-alpha July 21, 2021, 2:41 a.m. UTC | #1
This is a self-review.

> From: Naohiro Tamura <naohirot@fujitsu.com>
> Sent: Tuesday, July 20, 2021 3:36 PM
 
> This patch adds a "benchout_string2csv.sh" script that converts benchout
> string JSON to CSV, so that performance data can be visualized with any
> spreadsheet application, such as MS Excel or Google Sheets.
> 
> Usage: benchout_string2csv.sh
>   read benchout string JSON from standard input
>   write CSV to standard output
> ex:
>   $ cat bench-memset.out | benchout_string2csv.sh > bench-memset.csv
> ---
>  benchtests/scripts/benchout_string2csv.sh | 44 +++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>  create mode 100755 benchtests/scripts/benchout_string2csv.sh
> 
> diff --git a/benchtests/scripts/benchout_string2csv.sh b/benchtests/scripts/benchout_string2csv.sh
> new file mode 100755
> index 000000000000..045870fed162
> --- /dev/null
> +++ b/benchtests/scripts/benchout_string2csv.sh
> @@ -0,0 +1,44 @@
> +#!/bin/bash
> +# Copyright (C) 2021 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +# Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
Oops! I'll remove the line above.

> +
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +#
> +# Convert benchout string JSON to CSV
> +#
> +if [[ $1 == "-h" ]] || [[ $# != 0 ]]; then
> +  echo "Usage: ${0##*/}"
> +  echo "  read benchout string JSON from standard input"
> +  echo "  write CSV to standard output"
> +  echo "ex:"
> +  echo "  $ cat bench-memset.out | ${0##*/} > bench-memset.csv"
> +  exit 1
> +fi
> +
> +jq -r '
> +  . as $root |
> +  . as {$functions} |
> +  $functions | to_entries | .[0].value as $func_value |
> +  $func_value as {$_, $ifuncs, $results} |
> +  (["timing_type", $root.timing_type] | @csv),
> +  (["functions", ($functions | keys | .[0]),
> +    "bench-variant", $func_value."bench-variant"] | @csv),
> +  ($results[0] | to_entries | map([.key]) | flatten | @csv),
> +  ($results[0] | reduce range(1; . | length) as $_ ([]; . + [""])
> +    + $ifuncs | @csv),
> +  ($results[] | to_entries | map([.value]) | flatten | @csv)
> +'
> --
> 2.17.1
  
Joseph Myers July 27, 2021, 8:17 p.m. UTC | #2
On Tue, 20 Jul 2021, Naohiro Tamura via Libc-alpha wrote:

> +jq -r '

I don't think introducing a use of a new tool like that (not mentioned in 
install.texi) is a particularly good idea.  I'd suggest implementing this 
conversion in Python, given that the Python standard library supports both 
JSON and CSV and is already used for various purposes in glibc scripts.
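As a rough illustration of this suggestion, the jq filter in the patch could be ported to the standard library's json and csv modules along the lines below. This is a sketch and does not come from the thread. It assumes the benchout layout that the jq filter relies on: a top-level "timing_type", and a "functions" object whose first entry holds "bench-variant", "ifuncs" and "results", with "timings" as the last key of each result. The function names are hypothetical. As in the jq version, only the first function is converted.

```python
# Hypothetical Python port of benchout_string2csv.sh (sketch only).
# Reads benchout string JSON on stdin and writes CSV to stdout.
import csv
import json
import sys


def flatten(values):
    """Expand list values (e.g. "timings") into separate cells."""
    row = []
    for value in values:
        if isinstance(value, list):
            row.extend(value)
        else:
            row.append(value)
    return row


def benchout_to_rows(bench):
    """Return CSV rows for the first function in BENCH, like the jq filter."""
    name, func = next(iter(bench["functions"].items()))
    results = func["results"]
    keys = list(results[0])
    rows = [
        ["timing_type", bench["timing_type"]],
        ["functions", name, "bench-variant", func["bench-variant"]],
        keys,
        # Pad so the ifunc names sit above the timing columns; this assumes
        # "timings" is the last key of each result, as the jq version does.
        [""] * (len(keys) - 1) + func["ifuncs"],
    ]
    rows.extend(flatten(result.values()) for result in results)
    return rows


def main():
    if len(sys.argv) != 1:
        print(f"Usage: {sys.argv[0]} < bench-memset.out > bench-memset.csv",
              file=sys.stderr)
        sys.exit(1)
    # QUOTE_NONNUMERIC quotes strings but not numbers, similar to jq's @csv.
    writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    writer.writerows(benchout_to_rows(json.load(sys.stdin)))
```

To run this as a script, append the usual `if __name__ == "__main__": main()` guard. It could then be invoked as `python3 benchout_string2csv.py < bench-memset.out > bench-memset.csv`, where the file name is hypothetical. Keeping the row construction separate from the I/O also makes it easy to reuse the rows elsewhere, for example behind an output-format flag.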
  
Naohiro Tamura via Libc-alpha July 29, 2021, 1:56 a.m. UTC | #3
Hi Joseph,

Thanks for the comment.

> > +jq -r '
> 
> I don't think introducing a use of a new tool like that (not mentioned in
> install.texi) is a particularly good idea.  I'd suggest implementing this
> conversion in Python, given that the Python standard library supports both
> JSON and CSV and is already used for various purposes in glibc scripts.

I'm having a hard time analyzing string benchmark results.
I chose 'jq' just to get my job done quickly, because processing JSON
with jq comes more naturally to me than doing it in Python.

I believe that most people who have tried to improve string ifuncs
have developed similar tools locally and have not shared them.
Those people are probably in the same situation: they cannot spend
time porting such tools to another language or polishing them for
other people, because that is a side job rather than their primary job.

It would be nice if we could stop each developer from writing similar
tools again and again.

So is there any chance that these trivial tools could be accepted and
shared if install.texi is updated?

Thanks
Naohiro
  
Siddhesh Poyarekar July 29, 2021, 4:42 a.m. UTC | #4
On 7/29/21 7:26 AM, naohirot--- via Libc-alpha wrote:
> I'm having a hard time analyzing string benchmark results.
> I chose 'jq' just to get my job done quickly, because processing JSON
> with jq comes more naturally to me than doing it in Python.
> 
> I believe that most people who have tried to improve string ifuncs
> have developed similar tools locally and have not shared them.
> Those people are probably in the same situation: they cannot spend
> time porting such tools to another language or polishing them for
> other people, because that is a side job rather than their primary job.
> 
> It would be nice if we could stop each developer from writing similar
> tools again and again.

Most people in the community who work in string function improvements  
tend to use (and improve wherever it is lacking)  
benchtests/scripts/compare_strings.py for their result analysis.  Adding  
a flag to dump csv to that script ought to be trivial if that's what you  
need.

The script is under-documented though, so perhaps a wiki page describing  
what the script does and various example uses would go a very long way.

> So is there any chance that these trivial tools could be accepted and
> shared if install.texi is updated?

The reason for emitting json is precisely to allow developers to  
implement their own analysis tools around them when their use cases are  
niche.  Your specific use case is not niche and could be added as a flag  
to compare_strings.py if needed.  You only need a new flag --csv (or -o  
csv, tab, etc.) to print in csv instead of the current output, which is  
meant for reading on the terminal.
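An output-format flag of the kind suggested here might look roughly like the sketch below. It is hypothetical and is not the actual code of compare_strings.py. The `-o/--output-format` option name, its choices, and the helper functions are assumptions made for illustration.

```python
import argparse
import csv
import io


def parse_args(argv):
    """Parse a hypothetical -o/--output-format option."""
    parser = argparse.ArgumentParser(description="Compare string benchmark results")
    parser.add_argument("-o", "--output-format", choices=("text", "csv", "tab"),
                        default="text", help="output format (default: text)")
    return parser.parse_args(argv)


def format_rows(rows, fmt):
    """Render ROWS as aligned terminal text, CSV, or tab-separated values."""
    if fmt == "text":
        # Right-align every column to its widest cell for terminal reading.
        ncols = max(len(row) for row in rows)
        widths = [max(len(str(row[i])) for row in rows if i < len(row))
                  for i in range(ncols)]
        lines = ["  ".join(str(cell).rjust(widths[i])
                           for i, cell in enumerate(row))
                 for row in rows]
        return "\n".join(lines) + "\n"
    # csv and tab share the csv module; only the delimiter differs.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="," if fmt == "csv" else "\t",
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

A caller would do something like `print(format_rows(rows, parse_args(sys.argv[1:]).output_format), end="")`. Because the terminal layout remains the default, existing users would see no change unless they ask for CSV or TSV.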

Siddhesh
  
Naohiro Tamura via Libc-alpha July 30, 2021, 7:05 a.m. UTC | #5
Hi Siddhesh,

Thank you for the advice!

> Most people in the community who work in string function improvements
> tend to use (and improve wherever it is lacking)
> benchtests/scripts/compare_strings.py for their result analysis.  Adding
> a flag to dump csv to that script ought to be trivial if that's what you
> need.

I see. I haven't been using compare_strings.py day to day; I've been using plot_strings.py instead.

> The script is under-documented though, so perhaps a wiki page describing
> what the script does and various example uses would go a very long way.

I found the wiki page.
https://sourceware.org/glibc/wiki/benchmarking/benchmarks

> The reason for emitting json is precisely to allow developers to
> implement their own analysis tools around them when their use cases are
> niche.  Your specific use case is not niche and could be added as a flag
> to compare_strings.py if needed.  You only need a new flag --csv (or -o
> csv, tab, etc.) to print in csv instead of the current output, which is
> meant for reading on the terminal.

Yes, converting to CSV in Python would be easy.
But directly comparing two string benchout results, "before" and "after",
was not so easy, AFAIK.
And creating graphs manually in a spreadsheet is tolerable a few times, but not
when it has to be done frequently.
That's the reason I created another 'jq' script, "merge_strings4graph.sh".
Do most people compare the two results indirectly through a common base ifunc,
using the "--base" option of compare_strings.py or the "--baseline" option of
plot_strings.py?
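A direct before/after comparison of two benchout files can also be written in a few lines of Python. The helper below is hypothetical. It is not the merge_strings4graph.sh script mentioned here, whose behavior is not shown in the thread. It assumes that both files contain the same function and the same ifunc, and that a result's keys other than "timings" identify its input.

```python
def merge_before_after(before, after, ifunc):
    """Pair the timings of IFUNC from two benchout string results.

    Returns CSV-ready rows: a header, then one row per input present in
    both runs, with the before/after timings and their ratio.
    """
    def index(bench):
        # Map each input (all result keys except "timings") to IFUNC's time.
        func = next(iter(bench["functions"].values()))
        col = func["ifuncs"].index(ifunc)
        return {tuple((k, v) for k, v in result.items() if k != "timings"):
                result["timings"][col]
                for result in func["results"]}

    old, new = index(before), index(after)
    rows = []
    for params, t_old in old.items():
        t_new = new.get(params)
        if t_new is None:
            continue  # Input measured in only one of the two runs.
        if not rows:
            rows.append([k for k, _ in params] + ["before", "after", "after/before"])
        rows.append([v for _, v in params] + [t_old, t_new, t_new / t_old])
    return rows
```

The rows can be written with `csv.writer(...).writerows()` and charted in a spreadsheet. If lower timings are better, a ratio below 1 means the "after" build is faster for that input.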

Thanks.
Naohiro
  
Siddhesh Poyarekar July 31, 2021, 10:47 a.m. UTC | #6
On 7/30/21 12:35 PM, naohirot@fujitsu.com wrote:
> Hi Siddhesh,
> 
> Thank you for the advice!
> 
>> Most people in the community who work in string function improvements
>> tend to use (and improve wherever it is lacking)
>> benchtests/scripts/compare_strings.py for their result analysis.  Adding
>> a flag to dump csv to that script ought to be trivial if that's what you
>> need.
> 
> I see. I haven't been using compare_strings.py day to day; I've been using plot_strings.py instead.
> 
>> The script is under-documented though, so perhaps a wiki page describing
>> what the script does and various example uses would go a very long way.
> 
> I found the wiki page.
> https://sourceware.org/glibc/wiki/benchmarking/benchmarks

Yeah that needs to improve :/

>> The reason for emitting json is precisely to allow developers to
>> implement their own analysis tools around them when their use cases are
>> niche.  Your specific use case is not niche and could be added as a flag
>> to compare_strings.py if needed.  You only need a new flag --csv (or -o
>> csv, tab, etc.) to print in csv instead of the current output, which is
>> meant for reading on the terminal.
> 
> Yes, converting to CSV in Python would be easy.
> But directly comparing two string benchout results, "before" and "after",
> was not so easy, AFAIK.
> And creating graphs manually in a spreadsheet is tolerable a few times, but not
> when it has to be done frequently.
> That's the reason I created another 'jq' script, "merge_strings4graph.sh".
> Do most people compare the two results indirectly through a common base ifunc,
> using the "--base" option of compare_strings.py or the "--baseline" option of
> plot_strings.py?

If you need to compare two results then just set one as the base using
the --base/--baseline option and see how the other compares.  That's usually
sufficient to justify addition of new variants to glibc.
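The baseline approach described here amounts to expressing each ifunc's timing relative to one chosen ifunc in the same run. The sketch below shows that computation. The function name is hypothetical. The percentage formula, (t - t_base) * 100 / t_base, is an assumption and may differ from what compare_strings.py actually prints.

```python
def relative_to_base(bench, base):
    """Percent difference of each ifunc's timing versus the BASE ifunc.

    Negative values mean faster than the base, assuming lower timings
    are better. Returns one dict per result, keyed by ifunc name.
    """
    func = next(iter(bench["functions"].values()))
    ifuncs = func["ifuncs"]
    b = ifuncs.index(base)
    rows = []
    for result in func["results"]:
        t = result["timings"]
        rows.append({name: (t[i] - t[b]) * 100.0 / t[b]
                     for i, name in enumerate(ifuncs) if i != b})
    return rows
```

In this sketch, the "before" and "after" implementations must both appear as separate ifuncs in the same results file so that they share one set of inputs.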

Siddhesh
  

Patch

diff --git a/benchtests/scripts/benchout_string2csv.sh b/benchtests/scripts/benchout_string2csv.sh
new file mode 100755
index 000000000000..045870fed162
--- /dev/null
+++ b/benchtests/scripts/benchout_string2csv.sh
@@ -0,0 +1,44 @@ 
+#!/bin/bash
+# Copyright (C) 2021 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+# Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+#
+# Convert benchout string JSON to CSV
+#
+if [[ $1 == "-h" ]] || [[ $# != 0 ]]; then
+  echo "Usage: ${0##*/}"
+  echo "  read benchout string JSON from standard input"
+  echo "  write CSV to standard output"
+  echo "ex:"
+  echo "  $ cat bench-memset.out | ${0##*/} > bench-memset.csv"
+  exit 1
+fi
+
+jq -r '
+  . as $root |
+  . as {$functions} |
+  $functions | to_entries | .[0].value as $func_value |
+  $func_value as {$_, $ifuncs, $results} |
+  (["timing_type", $root.timing_type] | @csv),
+  (["functions", ($functions | keys | .[0]),
+    "bench-variant", $func_value."bench-variant"] | @csv),
+  ($results[0] | to_entries | map([.key]) | flatten | @csv),
+  ($results[0] | reduce range(1; . | length) as $_ ([]; . + [""])
+    + $ifuncs | @csv),
+  ($results[] | to_entries | map([.value]) | flatten | @csv)
+'