Message ID | 20150512210946.GB24057@domone |
---|---|
State | Not applicable |
Headers |
Received: (qmail 40452 invoked by alias); 12 May 2015 21:09:56 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 40442 invoked by uid 89); 12 May 2015 21:09:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, SPF_NEUTRAL autolearn=no version=3.3.2 X-HELO: popelka.ms.mff.cuni.cz Date: Tue, 12 May 2015 23:09:46 +0200 From: =?utf-8?B?T25kxZllaiBCw61sa2E=?= <neleai@seznam.cz> To: libc-alpha@sourceware.org Subject: strstr benchtest. Message-ID: <20150512210946.GB24057@domone> References: <20150512205311.GA24057@domone> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20150512205311.GA24057@domone> User-Agent: Mutt/1.5.20 (2009-06-14) |
Commit Message
Ondrej Bilka
May 12, 2015, 9:09 p.m. UTC
On Tue, May 12, 2015 at 10:53:11PM +0200, Ondřej Bílka wrote: > Hi, > As I mentioned strstr benchmarking I used following. Current benchmark is as useless as rest of string benchmarks. To compile you need to delete benchtest/bench-strstr.o as this is tested by header inclusion and benchtest Makefile doesn't detect that strstr.c changed. Main problem of current benchtest was very biased input. I replaced that with randomly generated ones that are closer to average case. A problem with periodic input with period 16 is that it would skew branch prediction a lot. It also creates periodic pattern for branches and they would be predicted perfectly while in reality performance would be magnitude worse. Second problem is that it was a worst-case scenario for this heuristic instead average as first three characters form searched triplet instead randomly distributed when characters are selected independently. I couldn't get definite answer. As I said before this aproach is good when string has high entropy and problematic on lower-entropy ones. Selection which is better depends on entropy of strings used by given application. It isn't always better but benefits outweigth the cost.
Comments
On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > Main problem of current benchtest was very biased input. I replaced that > with randomly generated ones that are closer to average case. But strstr is not run on random strings. Is there some easy way to generated traces of actual strstr arguments from a running system, perhaps using Systemtap?
On Thu, May 14, 2015 at 11:33:58AM +0200, Florian Weimer wrote: > On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > > Main problem of current benchtest was very biased input. I replaced that > > with randomly generated ones that are closer to average case. > > But strstr is not run on random strings. > > Is there some easy way to generated traces of actual strstr arguments > from a running system, perhaps using Systemtap? > Use my profiler here. http://kam.mff.cuni.cz/~ondra/benchmark_string/strstr_profile250813.tar.bz2 But it won't help you much with this case. Haystacks are really short, 61% of calls end with match in first 16 bytes. See: http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/results/result.html strstr function or http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/strstr_profile/results_gcc/result.html While its extremely valuable information and you should focus to optimize header it would probably report little change as both implementations just do initial strchr. Longer strings where you could see improvement don't exist in practice. So you are left with generating some data.
On Thu, May 14, 2015 at 11:33:58AM +0200, Florian Weimer wrote: > On 05/12/2015 11:09 PM, Ondřej Bílka wrote: > > Main problem of current benchtest was very biased input. I replaced that > > with randomly generated ones that are closer to average case. > > But strstr is not run on random strings. > > Is there some easy way to generated traces of actual strstr arguments > from a running system, perhaps using Systemtap? > To be able to rerun these I wrote a dryrun, I send a previous version of it some time ago. As systemtap problem is that it offers only problems with zero benefit as you need just write strings to disk which is easier just in pure c without unnecessary wrapping of that, I put current version here: http://kam.mff.cuni.cz/~ondra/dryrun.tar.bz2 You should record data with LD_PRELOAD=path/record_strstr.so bash These could be replayed with LD_PRELOADed suitable alternative implementation with LD_PRELOAD=new_strstr.so ./bench_strstr Then you can inspect data with ./show_strstr and finally summaries from these are created by ./summary #for brief one or ./summary_strstr -u # to show only unique binaries or ./summary_strstr -a # to also show repeated recordings of same binary For me ./summary_strstr -u shows following and I wont submit raw data as it contains at least list of my mail headers. replaying mc average size 4.62 needle size 1.01 comparisons 6.62 digraphs 0.02 trigraphs 0.00 calls 2440 succeed 99.7% latencies -18.7 -18.6 s1 aligned to 4 bytes 99.9% aligned to 8 bytes 94.5% aligned to 16 bytes 94.5% s2 aligned to 4 bytes 99.8% aligned to 8 bytes 99.8% aligned to 16 bytes 99.7% needle found in n bytes: n <= 0: 0.2% n <= 1: 17.0% n <= 2: 66.4% n <= 3: 69.1% n <= 4: 71.0% n <= 8: 81.4% n <= 16: 95.6% n <= 32: 99.8% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 99.8% n <= 2: 99.8% n <= 3: 99.8% n <= 4: 99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying mutt average size 20.60 needle size 2.01 comparisons 20.90 digraphs 0.08 trigraphs 0.00 calls 3505 succeed 8.4% latencies -97.4 -88.6 s1 aligned to 4 bytes 99.7% aligned to 8 bytes 99.5% aligned to 16 bytes 99.5% s2 aligned to 4 bytes 1.4% aligned to 8 bytes 0.7% aligned to 16 bytes 0.1% needle found in n bytes: n <= 0: 8.0% n <= 1: 8.3% n <= 2: 8.3% n <= 3: 8.9% n <= 4: 9.0% n <= 8: 12.2% n <= 16: 68.9% n <= 32: 84.1% n <= 64: 94.8% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 99.3% n <= 3: 99.9% n <= 4: 99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/bin/python average size 53.38 needle size 8.55 comparisons 54.57 digraphs 0.10 trigraphs 0.00 calls 60 succeed 0.0% latencies -73.0 -68.5 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 26.7% aligned to 8 bytes 26.7% aligned to 16 bytes 26.7% needle found in n bytes: n <= 0: 1.7% n <= 1: 1.7% n <= 2: 1.7% n <= 3: 1.7% n <= 4: 1.7% n <= 8: 1.7% n <= 16: 1.7% n <= 32: 21.7% n <= 64: 60.0% needle size: n <= 0: 1.7% n <= 1: 1.7% n <= 2: 1.7% n <= 3: 1.7% n <= 4: 26.7% n <= 8: 26.7% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying /opt/google/chrome/chrome average size 6.95 needle size 1.35 comparisons 8.63 digraphs 0.05 trigraphs 0.00 calls 1990 succeed 79.8% latencies -44.7 -46.4 s1 aligned to 4 bytes 96.7% aligned to 8 bytes 95.5% aligned to 16 bytes 95.0% s2 aligned to 4 bytes 90.5% aligned to 8 bytes 89.6% aligned to 16 bytes 75.4% needle found in n bytes: n <= 0: 0.2% n <= 1: 0.2% n <= 2: 0.3% n <= 3: 0.4% n <= 4: 46.0% n <= 8: 78.3% n <= 16: 96.6% n <= 32: 99.7% n <= 64: 99.9% needle size: n <= 0: 0.1% n <= 1: 90.6% n <= 2: 95.0% n <= 3: 95.7% n <= 4: 95.7% n <= 8: 99.0% n <= 16: 99.8% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/lib/kde4/libexec/kdm_greet average size 11.38 needle size 8.99 comparisons 11.41 digraphs 0.00 trigraphs 0.00 calls 23221 succeed 0.2% latencies -14.4 -13.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.1% aligned to 8 bytes 0.0% aligned to 16 bytes 0.0% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 99.8% n <= 32: 99.8% n <= 64: 99.8% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.2% n <= 8: 0.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying su average size 40.09 needle size 9.60 comparisons 45.20 digraphs 0.29 trigraphs 0.00 calls 91 succeed 0.0% latencies -2.1 -0.4 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 51.6% aligned to 16 bytes 51.6% s2 aligned to 4 bytes 1.1% aligned to 8 bytes 1.1% aligned to 16 bytes 1.1% needle found in n bytes: n <= 0: 1.1% n <= 1: 1.1% n <= 2: 1.1% n <= 3: 1.1% n <= 4: 1.1% n <= 8: 4.4% n <= 16: 26.4% n <= 32: 28.6% n <= 64: 100.0% needle size: n <= 0: 1.1% n <= 1: 1.1% n <= 2: 1.1% n <= 3: 1.1% n <= 4: 1.1% n <= 8: 1.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying make average size 32.87 needle size 1.20 comparisons 34.73 digraphs 0.01 trigraphs 0.01 calls 277 succeed 91.7% latencies -12.5 -10.5 s1 aligned to 4 bytes 31.0% aligned to 8 bytes 17.3% aligned to 16 bytes 13.4% s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 98.6% needle found in n bytes: n <= 0: 0.4% n <= 1: 4.3% n <= 2: 4.7% n <= 3: 6.9% n <= 4: 7.6% n <= 8: 9.7% n <= 16: 14.8% n <= 32: 35.0% n <= 64: 99.6% needle size: n <= 0: 0.4% n <= 1: 95.3% n <= 2: 95.7% n <= 3: 97.1% n <= 4: 97.1% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0% replaying gdb average size 14.69 needle size 6.77 comparisons 15.46 digraphs 0.06 trigraphs 0.01 calls 17206 succeed 5.0% latencies 8.4 10.2 s1 aligned to 4 bytes 52.6% aligned to 8 bytes 45.0% aligned to 16 bytes 41.3% s2 aligned to 4 bytes 9.6% aligned to 8 bytes 9.1% aligned to 16 bytes 9.1% needle found in n bytes: n <= 0: 0.2% n <= 1: 1.0% n <= 2: 4.3% n <= 3: 5.3% n <= 4: 7.6% n <= 8: 32.9% n <= 16: 81.9% n <= 32: 92.1% n <= 64: 96.7% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 27.9% n <= 3: 62.5% n <= 4: 64.8% n <= 8: 65.1% n <= 16: 92.7% n <= 32: 100.0% n <= 64: 100.0% replaying /usr/bin/urxvt average size 10.04 needle size 9.01 comparisons 10.05 digraphs 0.00 trigraphs 0.00 calls 11674 succeed 0.0% latencies -5.5 -0.7 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.6% aligned to 8 bytes 0.3% aligned to 16 bytes 0.2% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 99.8% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.3% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0% replaying man average size 7.91 needle size 29.43 comparisons 10.99 digraphs 0.24 trigraphs 0.24 calls 74 succeed 33.8% latencies 0.0 0.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 63.5% aligned to 8 bytes 63.5% aligned to 16 bytes 62.2% needle found in n bytes: n <= 0: 47.3% n <= 1: 47.3% n <= 2: 47.3% n <= 3: 51.4% n <= 4: 51.4% n <= 8: 51.4% n <= 16: 83.8% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.4% n <= 3: 2.7% n <= 4: 5.4% n <= 8: 6.8% n <= 16: 39.2% n <= 32: 55.4% n <= 64: 100.0% replaying troff average size 11.87 needle size 9.00 comparisons 11.87 digraphs 0.00 trigraphs 0.00 calls 5498 succeed 0.0% latencies 0.0 0.0 s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0% s2 aligned to 4 bytes 0.0% aligned to 8 bytes 0.0% aligned to 16 bytes 0.0% needle found in n bytes: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 81.5% n <= 32: 100.0% n <= 64: 100.0% needle size: n <= 0: 0.0% n <= 1: 0.0% n <= 2: 0.0% n <= 3: 0.0% n <= 4: 0.0% n <= 8: 0.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c index 74f3ee8..0412c1b 100644 --- a/benchtests/bench-strstr.c +++ b/benchtests/bench-strstr.c @@ -81,31 +81,15 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2, char *s1 = (char *) (buf1 + align1); char *s2 = (char *) (buf2 + align2); - static const char d[] = "1234567890abcdef"; -#define dl (sizeof (d) - 1) - char *ss2 = s2; - for (size_t l = len2; l > 0; l = l > dl ? l - dl : 0) + for (size_t l = 0; l < len2; l++) { - size_t t = l > dl ? dl : l; - ss2 = mempcpy (ss2, d, t); + s2[l] = (rand () % 128) + 1; } s2[len2] = '\0'; - if (fail) + for (size_t l = 0; l < len1; l++) { - char *ss1 = s1; - for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0) - { - size_t t = l > dl ? dl : l; - memcpy (ss1, d, t); - ++ss1[len2 > 7 ? 7 : len2 - 1]; - ss1 += t; - } - } - else - { - memset (s1, '0', len1); - memcpy (s1 + len1 - len2, s2, len2); + s1[l] = (rand () % 128) + 1; } s1[len1] = '\0';