X-Patchwork-Id: 21110
From: Wilco Dijkstra
To: Siddhesh Poyarekar, "libc-alpha@sourceware.org"
CC: nd
Subject: Re: [PATCH] Improve math benchmark infrastructure
Date: Mon, 19 Jun 2017 16:52:30 +0000
Siddhesh Poyarekar wrote:
>
> There is no easy way to actually replicate the cache effects of a real
> workload without replicating the program in some form, but this is a
> good start.  Long term we should consider doing things like invalidating
> cache at specific points in the program in an attempt to emulate the
> workload a little more closely.  This may be more relevant for string
> benchmarks than for math though.

Yes, it's hard to replicate the full cache effects in a micro benchmark. I'm
not sure flushing is the best approach. At least for the string functions, it
is possible to avoid reusing the same pointers again and again and instead do
something similar to memcpy-random: precompute a set of operations and then
execute them in one go. For math functions it's likely more about I-cache
pressure.

> Use the name convention workload-* to better reflect what the dataset
> is.  So for your wrf powf trace, call it workload-spec2006.wrf.  Also,
> please document this feature in benchtests/README.

Done, see below:

Improve support for math function benchmarking.  This patch adds a feature
that allows accurate benchmarking of traces extracted from real workloads.
This is done by iterating over all samples rather than repeating each sample
many times (which completely ignores branch prediction and cache effects).
A trace can be added to existing math function inputs via
"## name: workload-", followed by the trace.

OK for commit?

ChangeLog:
2017-06-19  Wilco Dijkstra

        * benchtests/README: Describe workload feature.
        * benchtests/bench-skeleton.c (main): Add support for benchmarking
        traces from workloads.

diff --git a/benchtests/README b/benchtests/README
index 2c5f38113593ea7da90895266c8fd523fa21c5a1..67333707d5bbc2c6cf5a4de5698c18dfdf086076 100644
--- a/benchtests/README
+++ b/benchtests/README
@@ -102,6 +102,12 @@ the same file by using the `name' directive that looks something like this:
 See the pow-inputs file for an example of what such a partitioned input file
 would look like.
 
+It is also possible to measure throughput of a (partial) trace extracted from
+a real workload.  In this case the whole trace is iterated over 'iter' times
+rather than repeating every input multiple times.  This can be done via:
+
+  ##name: workload-
+
 Benchmark Sets:
 ==============
 
diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index 09eb78df1bce2d9f5e410e3e82821eb9b271e70d..8c98ed673c055a5cf4d774604eb7bf0a383cecb2 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -68,34 +68,50 @@ main (int argc, char **argv)
       clock_gettime (CLOCK_MONOTONIC_RAW, &runtime);
       runtime.tv_sec += DURATION;
 
+      bool is_bench = strncmp (VARIANT (v), "workload-", 9) == 0;
       double d_total_i = 0;
       timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
       int64_t c = 0;
+      uint64_t cur;
       while (1)
         {
-          for (i = 0; i < NUM_SAMPLES (v); i++)
+          if (is_bench)
             {
-              uint64_t cur;
+              /* Benchmark a real trace of calls - all samples are iterated
+                 over once before repeating.  This models actual use more
+                 accurately than repeating the same sample many times.  */
               TIMING_NOW (start);
               for (k = 0; k < iters; k++)
-                BENCH_FUNC (v, i);
+                for (i = 0; i < NUM_SAMPLES (v); i++)
+                  BENCH_FUNC (v, i);
               TIMING_NOW (end);
-
               TIMING_DIFF (cur, start, end);
+              TIMING_ACCUM (total, cur);
+              d_total_i += iters * NUM_SAMPLES (v);
+            }
+          else
+            for (i = 0; i < NUM_SAMPLES (v); i++)
+              {
+                TIMING_NOW (start);
+                for (k = 0; k < iters; k++)
+                  BENCH_FUNC (v, i);
+                TIMING_NOW (end);
 
-              if (cur > max)
-                max = cur;
+                TIMING_DIFF (cur, start, end);
 
-              if (cur < min)
-                min = cur;
+                if (cur > max)
+                  max = cur;
 
-              TIMING_ACCUM (total, cur);
-              /* Accumulate timings for the value.  In the end we will divide
-                 by the total iterations.  */
-              RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
+                if (cur < min)
+                  min = cur;
 
-              d_total_i += iters;
-            }
+                TIMING_ACCUM (total, cur);
+                /* Accumulate timings for the value.  In the end we will divide
+                   by the total iterations.  */
+                RESULT_ACCUM (cur, v, i, c * iters, (c + 1) * iters);
+
+                d_total_i += iters;
+              }
           c++;
 
           struct timespec curtime;
@@ -117,11 +133,18 @@ main (int argc, char **argv)
 
       json_attr_double (&json_ctx, "duration", d_total_s);
       json_attr_double (&json_ctx, "iterations", d_total_i);
-      json_attr_double (&json_ctx, "max", max / d_iters);
-      json_attr_double (&json_ctx, "min", min / d_iters);
-      json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
+      if (is_bench)
+        {
+          json_attr_double (&json_ctx, "throughput", d_total_s / d_total_i);
+        }
+      else
+        {
+          json_attr_double (&json_ctx, "max", max / d_iters);
+          json_attr_double (&json_ctx, "min", min / d_iters);
+          json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
+        }
 
-      if (detailed)
+      if (detailed && !is_bench)
         {
           json_array_begin (&json_ctx, "timings");
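To make the difference between the two timing loops concrete outside of glibc, here is a minimal standalone sketch in C. It is not the benchtests code: `hypothetical_func`, `is_workload`, `bench_per_sample` and `bench_trace` are hypothetical names, and plain `clock_gettime (CLOCK_MONOTONIC, ...)` stands in for the TIMING_NOW/TIMING_DIFF/TIMING_ACCUM macros. The first function times each input on its own, repeated `iters` times back to back; the second walks the whole trace in order `iters` times and times only the total, which is why the workload mode reports a single per-call figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for BENCH_FUNC (v, i); the volatile sink keeps the
   compiler from discarding the calls.  */
static volatile double sink;

static double
hypothetical_func (double x)
{
  return x * x + 1.0;
}

/* Same prefix test as the patch: variants named "workload-*" are traces.  */
static bool
is_workload (const char *name)
{
  return strncmp (name, "workload-", 9) == 0;
}

/* Monotonic time in nanoseconds (assumption: stands in for TIMING_NOW).  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Default mode: time each sample separately, repeating it ITERS times, so
   the branch predictor and caches see the same input over and over.
   Records min/max/total per-sample time and returns the number of calls.  */
static uint64_t
bench_per_sample (const double *in, size_t n, uint64_t iters,
                  uint64_t *min_ns, uint64_t *max_ns, uint64_t *total_ns)
{
  assert (n > 0);
  *min_ns = UINT64_MAX;
  *max_ns = 0;
  *total_ns = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t start = now_ns ();
      for (uint64_t k = 0; k < iters; k++)
        sink = hypothetical_func (in[i]);
      uint64_t cur = now_ns () - start;
      if (cur > *max_ns)
        *max_ns = cur;
      if (cur < *min_ns)
        *min_ns = cur;
      *total_ns += cur;
    }
  return iters * n;
}

/* Workload mode: walk the whole trace in order, ITERS times, and time it as
   one block.  Only the total is recorded, so only an average per call can
   be derived from it.  */
static uint64_t
bench_trace (const double *in, size_t n, uint64_t iters, uint64_t *total_ns)
{
  assert (n > 0);
  uint64_t start = now_ns ();
  for (uint64_t k = 0; k < iters; k++)
    for (size_t i = 0; i < n; i++)
      sink = hypothetical_func (in[i]);
  *total_ns = now_ns () - start;
  return iters * n;
}
```

In both modes the average time per call is the total time divided by the returned call count; the patch reports that figure as "mean" in the default mode and as "throughput" for workload-* variants, where "min" and "max" are dropped because individual samples are never timed separately.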