[PING] Wordexp benchtest: good, bad and unreliable.

  On Tue, Jun 16, 2015 at 03:24:03PM -0400, Carlos O'Donell wrote:
> On 06/16/2015 01:32 PM, Ondřej Bílka wrote:
> >> As I mentioned before for wordexp performance depends on too many
> >> parameters. So here is microbenchmark that shows some improvements and
> >> some regressions. There is lot of noise. I ran these three times and on
> >> one * and ffffffffoo pattern become twice slower for no reason.
> 
> There is a reason, but we just don't know it yet.
>
My guess that difference was caused by stat variance but I couldn't
verify that. 

> >> So how do you decide with conflicted data like this?
> 
> You can't make an "automatic" or "automated" decision, but as a baseline
> the statistics do help. With noisy benchmarks you can still look at
> potential trends in the noise. We don't want the bad patterns to get 5x
> slower.
> 

Thats for most of functions simply unavoidable, what I submitted is classical example that you cannot avoid some paterns being 5 times slower as you are shifting worst case.

Here with my wordexp change patterns that didn't used strchr(ifs,x)
would get five times slower when user uses big IFS.

On other hand patterns that checked strchr(ifs,x) will become five times
faster with this change.

These tradeoffs are actually quite routine, for example most of string
routines tend to be slow when inputs are always at page boundary. You
cannot avoid that as alternatives move worst case to much more probable
scenarios like empirical probability ~1/8 that 16 bytes cross 64 byte boundary.

I could improve performance by optimizing that case aggresively. However
it would likely be net loss as cross-page is most of time cold path and
I try to optimize it to size.

> Similarly we will have conflicting workloads modeled as benchmarks, and
> the hard part is that as an expert we have to make a difficult choice.
> We have to understand the workload and decide "Yes, we ignore this performance
> decrease because we expect fewer people will suffer." However, in order
> to do that we need comments about exactly what workload we're trying to model.
>
Thats problem that you need to collect lot of domain knowledge to be
usable. Problem is that I didn't find yet application that uses it.

> For most of our microbenchmarks the workloads are "raw performance of function
> call" and that's very simplistic, it's also very generic.
> 
Thats quite problem when there are many variants that need to be tried
what should you priorize, as this migth be improved or not depending on
average ifs size and average size of pattern.

> This group of tests should have a comment about what's it's testing.
> 
> What workload are we modeling?
>
None in particular, just trying to measure performance change.

> >> +
> >> +
> >> +  do_test ("*", 0);
> >> +  setenv("IFS","                                                  \
> >> +                                               ", 1);
> >> +
> >> +  do_test ("foo", 0);
> >> +  do_test ("ffoo", 0);
> >> +  do_test ("ffffoo", 0);
> >> +  do_test ("ffffffffoo", 0);
> >> +  do_test ("ffffffffffffffffoo", 0);
> >> +  do_test ("ffffffffffffffffffffffffffffffffoo", 0);
> >> +do_test ("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo", 0);
> >> +  do_test ("*", 0);
> >> +
> 
> Similarly with the IFS change. This is not a normal IFS change for an average workload.
> 
> Usually one changes IFS to be a series of characters, or just tab, or just newline etc.
> 
These are worst case for my change. There is constant overhead per call
to parse IFS, then my patch saves some cycles per character depending on
which path we take.

On paths of f...ffoo form you save cost of conversion
 strchr ("\n|&;<>(){}", ch)
per character.

I read code more and I could also measure saving of
strchr(ifs, ch) 
per character for patterns ?fff...fffo.

However measuring exact performance is bit difficult again due to noise
as wordexp calls glob which takes ~70000 cycles. Thats lot compared with
savings in wordexp itself, difference between longest pattern was 4500
which this change reduced to 500 cycles. I am not completely sure about
that numbers as I got that estimate by subtracting performance of glob
without ifs.

old one:

                       	wordexp
Pattern foo flags 0:	1131.86
Pattern ffoo flags 0:	629.5
Pattern ffffoo flags 0:	709.562
Pattern ffffffffoo flags 0:	866.359
Pattern ffffffffffffffffoo flags 0:	1115.53
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	1591.81
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	2608.45
Pattern ?foo flags 0:	74733.9
Pattern ?ffoo flags 0:	71044.9
Pattern ?ffffoo flags 0:	71165.1
Pattern ?ffffffffoo flags 0:	71009
Pattern ?ffffffffffffffffoo flags 0:	71283.4
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	71748.2
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	73446.5
Pattern * flags 0:	135284
Hundred byte IFS
Pattern foo flags 0:	1067.41
Pattern ffoo flags 0:	1103.17
Pattern ffffoo flags 0:	1164.02
Pattern ffffffffoo flags 0:	1449.36
Pattern ffffffffffffffffoo flags 0:	2375.23
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	2170.95
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	3120.31
Pattern ?foo flags 0:	72397
Pattern ?ffoo flags 0:	71573.4
Pattern ?ffffoo flags 0:	71768.1
Pattern ?ffffffffoo flags 0:	71925.9
Pattern ?ffffffffffffffffoo flags 0:	72591.7
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	73172.4
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	78188.5
Pattern * flags 0:	134651

new one:

                       	wordexp
Pattern foo flags 0:	1056.98
Pattern ffoo flags 0:	680.562
Pattern ffffoo flags 0:	690.219
Pattern ffffffffoo flags 0:	842.516
Pattern ffffffffffffffffoo flags 0:	1016.67
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	1411.91
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	2174.77
Pattern ?foo flags 0:	81948.1
Pattern ?ffoo flags 0:	71048.2
Pattern ?ffffoo flags 0:	71345
Pattern ?ffffffffoo flags 0:	71098.2
Pattern ?ffffffffffffffffoo flags 0:	71314.9
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	71669
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	72204.1
Pattern * flags 0:	137048
Hundred byte IFS
Pattern foo flags 0:	1568.56
Pattern ffoo flags 0:	1581.34
Pattern ffffoo flags 0:	1627.89
Pattern ffffffffoo flags 0:	1924.27
Pattern ffffffffffffffffoo flags 0:	2109.28
Pattern ffffffffffffffffffffffffffffffffoo flags 0:	2456.67
Pattern ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	3204.42
Pattern ?foo flags 0:	74253.8
Pattern ?ffoo flags 0:	72561.5
Pattern ?ffffoo flags 0:	72630.3
Pattern ?ffffffffoo flags 0:	72794.3
Pattern ?ffffffffffffffffoo flags 0:	72951
Pattern ?ffffffffffffffffffffffffffffffffoo flags 0:	73194.2
Pattern ?ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffoo flags 0:	74068.2
Pattern * flags 0:	137750

[PING] Wordexp benchtest: good, bad and unreliable.

Commit Message

Comments

Patch