From patchwork Thu Aug 17 13:08:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 22189
From: Wilco Dijkstra
To: Siddhesh Poyarekar, Alexander Monakov, Arjan van de Ven
CC: "libc-alpha@sourceware.org", nd
Subject: Re: [PATCH v3] Add math benchmark latency test
Date: Thu, 17 Aug 2017 13:08:46 +0000
References: <0e008f2e-f41a-1bb8-803c-2f798e2c3541@gotplt.org>,
 <28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org>
In-Reply-To: <28bd6bf5-6e1a-7b21-9a5c-c6b2c6be61e3@gotplt.org>

OK, here is version 3:

This patch further improves math function benchmarking by adding a latency
test in addition to throughput.  This enables more accurate comparisons of
the math functions.  The latency test works by creating a dependency on the
previous iteration: func_res = F (func_res * zero + input[i]).  The multiply
by zero avoids changing the input.

It reports reciprocal throughput and latency in nanoseconds (depending on
the timing header used) and max/min throughput in iterations per second:

   "workload-spec2006.wrf": {
    "reciprocal-throughput": 100,
    "latency": 200,
    "max-throughput": 1.0e+07,
    "min-throughput": 5.0e+06
   }

OK for commit?

ChangeLog:
2017-08-17  Wilco Dijkstra

	* benchtests/bench-skeleton.c (main): Add support for latency
	benchmarking.
	* benchtests/scripts/bench.py: Add support for latency benchmarking.

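
As a rough illustration of how the four reported values relate, here is a small Python sketch of the arithmetic the patch does in bench-skeleton.c.  The function name summarize and the input numbers are made up for the example, and it assumes the timing header counts in nanoseconds (the patch multiplies by 1000000000.0 on that basis):

```python
NS_PER_SECOND = 1000000000.0


def summarize(throughput_ns, latency_ns, iterations):
    """Derive the reported values from two accumulated timings.

    throughput_ns: total time of the loop of independent calls.
    latency_ns:    total time of the loop where each call depends on
                   the result of the previous one.
    iterations:    number of calls made in each loop (d_total_i).
    """
    return {
        "reciprocal-throughput": throughput_ns / iterations,
        "latency": latency_ns / iterations,
        # Best case: calls overlap freely in the pipeline.
        "max-throughput": iterations / throughput_ns * NS_PER_SECOND,
        # Worst case: every call waits for the previous result.
        "min-throughput": iterations / latency_ns * NS_PER_SECOND,
    }


# Hypothetical totals chosen to reproduce the sample output above.
result = summarize(throughput_ns=1000000.0, latency_ns=2000000.0,
                   iterations=10000)
for key, value in result.items():
    print(key, value)
```

With these inputs the sketch gives 100 ns and 200 ns per call, i.e. about 1.0e+07 and 5.0e+06 iterations per second, matching the sample JSON.
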
diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
index 3c6dad705594ac0a53edcb4e09686252c13127cf..955b2e1d2125775c6ffc10a004184a9564496e46 100644
--- a/benchtests/bench-skeleton.c
+++ b/benchtests/bench-skeleton.c
@@ -71,8 +71,10 @@ main (int argc, char **argv)
       bool is_bench = strncmp (VARIANT (v), "workload-", 9) == 0;
       double d_total_i = 0;
       timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
+      timing_t throughput = 0, latency = 0;
       int64_t c = 0;
       uint64_t cur;
+      BENCH_VARS;
       while (1)
         {
           if (is_bench)
@@ -86,7 +88,16 @@ main (int argc, char **argv)
                   BENCH_FUNC (v, i);
               TIMING_NOW (end);
               TIMING_DIFF (cur, start, end);
-              TIMING_ACCUM (total, cur);
+              TIMING_ACCUM (throughput, cur);
+
+              TIMING_NOW (start);
+              for (k = 0; k < iters; k++)
+                for (i = 0; i < NUM_SAMPLES (v); i++)
+                  BENCH_FUNC_LAT (v, i);
+              TIMING_NOW (end);
+              TIMING_DIFF (cur, start, end);
+              TIMING_ACCUM (latency, cur);
+
               d_total_i += iters * NUM_SAMPLES (v);
             }
           else
@@ -131,12 +142,20 @@ main (int argc, char **argv)
       /* Begin variant.  */
       json_attr_object_begin (&json_ctx, VARIANT (v));
 
-      json_attr_double (&json_ctx, "duration", d_total_s);
-      json_attr_double (&json_ctx, "iterations", d_total_i);
       if (is_bench)
-        json_attr_double (&json_ctx, "throughput", d_total_s / d_total_i);
+        {
+          json_attr_double (&json_ctx, "reciprocal-throughput",
+                            throughput / d_total_i);
+          json_attr_double (&json_ctx, "latency", latency / d_total_i);
+          json_attr_double (&json_ctx, "max-throughput",
+                            d_total_i / throughput * 1000000000.0);
+          json_attr_double (&json_ctx, "min-throughput",
+                            d_total_i / latency * 1000000000.0);
+        }
       else
         {
+          json_attr_double (&json_ctx, "duration", d_total_s);
+          json_attr_double (&json_ctx, "iterations", d_total_i);
           json_attr_double (&json_ctx, "max", max / d_iters);
           json_attr_double (&json_ctx, "min", min / d_iters);
           json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
diff --git a/benchtests/scripts/bench.py b/benchtests/scripts/bench.py
index 8c1c9eeb2bc67a16cb8a8e010fd2b8a2ef8ab6df..b7ccb7c8c2bf1822202a2377dfb0675516115cc5 100755
--- a/benchtests/scripts/bench.py
+++ b/benchtests/scripts/bench.py
@@ -45,7 +45,7 @@ DEFINES_TEMPLATE = '''
 # variant is represented by the _VARIANT structure.  The ARGS structure
 # represents a single set of arguments.
 STRUCT_TEMPLATE = '''
-#define CALL_BENCH_FUNC(v, i) %(func)s (%(func_args)s)
+#define CALL_BENCH_FUNC(v, i, x) %(func)s (x %(func_args)s)
 
 struct args
 {
@@ -84,7 +84,9 @@ EPILOGUE = '''
 #define RESULT(__v, __i) (variants[(__v)].in[(__i)].timing)
 #define RESULT_ACCUM(r, v, i, old, new) \\
         ((RESULT ((v), (i))) = (RESULT ((v), (i)) * (old) + (r)) / ((new) + 1))
-#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j);})
+#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, );})
+#define BENCH_FUNC_LAT(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, %(latarg)s);})
+#define BENCH_VARS %(defvar)s
 #define FUNCNAME "%(func)s"
 #include "bench-skeleton.c"'''
 
@@ -122,17 +124,22 @@ def gen_source(func, directives, all_vals):
     # If we have a return value from the function, make sure it is
     # assigned to prevent the compiler from optimizing out the
     # call.
+    getret = ''
+    latarg = ''
+    defvar = ''
+
     if directives['ret']:
         print('static %s volatile ret;' % directives['ret'])
-        getret = 'ret = '
-    else:
-        getret = ''
+        print('static %s zero __attribute__((used)) = 0;' % directives['ret'])
+        getret = 'ret = func_res = '
+        latarg = 'func_res * zero +'
+        defvar = '%s func_res = 0;' % directives['ret']
 
     # Test initialization.
     if directives['init']:
         print('#define BENCH_INIT %s' % directives['init'])
 
-    print(EPILOGUE % {'getret': getret, 'func': func})
+    print(EPILOGUE % {'getret': getret, 'func': func, 'latarg': latarg, 'defvar': defvar })
 
 
 def _print_arg_data(func, directives, all_vals):
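
To make the macro plumbing easier to follow, here is a standalone Python sketch that expands only the template lines touched by this patch.  It is not part of bench.py: the helper name expand, the function sin, the return type double and the argument expression variants[v].in[i].arg0 are assumptions picked for illustration.

```python
# Template lines copied from the patch; the rest of bench.py is omitted.
STRUCT_LINE = '#define CALL_BENCH_FUNC(v, i, x) %(func)s (x %(func_args)s)'
EPILOGUE_LINES = '''\
#define BENCH_FUNC(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, );})
#define BENCH_FUNC_LAT(i, j) ({%(getret)s CALL_BENCH_FUNC (i, j, %(latarg)s);})
#define BENCH_VARS %(defvar)s'''


def expand(func, ret, func_args):
    """Return the C macro text generated for one benchmarked function.

    Hypothetical helper mirroring the logic added to gen_source().
    """
    getret = latarg = defvar = ''
    if ret:
        # Keep the result live and feed it into the next call.
        getret = 'ret = func_res = '
        latarg = 'func_res * zero +'
        defvar = '%s func_res = 0;' % ret
    struct = STRUCT_LINE % {'func': func, 'func_args': func_args}
    epilogue = EPILOGUE_LINES % {'getret': getret, 'latarg': latarg,
                                 'defvar': defvar}
    return struct + '\n' + epilogue


# Assumed example: a double-returning function with one argument.
out = expand('sin', 'double', 'variants[v].in[i].arg0')
print(out)
```

In the latency variant the x parameter becomes "func_res * zero +", so the call turns into sin (func_res * zero + variants[v].in[i].arg0) and each iteration has to wait for the previous result.  In the throughput variant x is empty and the calls stay independent.
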