From patchwork Fri Dec 1 13:51:13 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 24674
From: Wilco Dijkstra
To: "libc-alpha@sourceware.org"
CC: nd
Subject: [PATCH] Add malloc micro benchmark
Date: Fri, 1 Dec 2017 13:51:13 +0000

Add a malloc micro benchmark to enable accurate testing of the
various paths in malloc and free.  The benchmark does a varying
number of allocations of a given block size, then frees them again.
It does so for several block sizes and numbers of allocated blocks.
Although the test is single-threaded, it also tests what happens
when you disable single-threaded fast paths (i.e. SINGLE_THREAD_P
is false).

OK for commit?

Typical output on an x64 box:
{
  "timing_type": "hp_timing",
  "functions": {
    "malloc": {
      "malloc_block_size_0016": {
        "st_num_allocs_0025_time": 53.5486,
        "st_num_allocs_0100_time": 57.2553,
        "st_num_allocs_0400_time": 57.3204,
        "st_num_allocs_1000_time": 57.2059,
        "mt_num_allocs_0025_time": 87.7903,
        "mt_num_allocs_0100_time": 100.772,
        "mt_num_allocs_0400_time": 103.827,
        "mt_num_allocs_1000_time": 104.812
      },
      "malloc_block_size_0256": {
        "st_num_allocs_0025_time": 78.3056,
        "st_num_allocs_0100_time": 85.6392,
        "st_num_allocs_0400_time": 91.5187,
        "st_num_allocs_1000_time": 163.458,
        "mt_num_allocs_0025_time": 115.925,
        "mt_num_allocs_0100_time": 140.735,
        "mt_num_allocs_0400_time": 152.044,
        "mt_num_allocs_1000_time": 225.118
      },
      "malloc_block_size_1024": {
        "st_num_allocs_0025_time": 113.705,
        "st_num_allocs_0100_time": 103.79,
        "st_num_allocs_0400_time": 479.029,
        "st_num_allocs_1000_time": 634.228,
        "mt_num_allocs_0025_time": 145.807,
        "mt_num_allocs_0100_time": 151.157,
        "mt_num_allocs_0400_time": 526.499,
        "mt_num_allocs_1000_time": 687.357
      },
      "malloc_block_size_4096": {
        "st_num_allocs_0025_time": 105.101,
        "st_num_allocs_0100_time": 1640.23,
        "st_num_allocs_0400_time": 2411.26,
        "st_num_allocs_1000_time": 2641.56,
        "mt_num_allocs_0025_time": 156.323,
        "mt_num_allocs_0100_time": 1702.94,
        "mt_num_allocs_0400_time": 2453,
        "mt_num_allocs_1000_time": 2676.75
      }
    }
  }
}

Note something very bad happens for the larger allocations: in the
single-threaded case there is a roughly 23x slowdown going from 25 to
400 allocations of 4KB blocks, and about 25x at 1000 allocations.

ChangeLog:
2017-12-01  Wilco Dijkstra

	* benchtests/Makefile: Add malloc-simple benchmark.
	* benchtests/bench-malloc-simple.c: New benchmark.

diff --git a/benchtests/Makefile b/benchtests/Makefile
index d8681fce8cf399bc655f3f6a7717897eb9c30619..a4b2573cfa706bd6369063a995d512e0947c7bd5 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -67,8 +67,10 @@ stdio-common-benchset := sprintf

 math-benchset := math-inlines

+malloc-benchset := malloc-simple
+
 benchset := $(string-benchset-all) $(stdlib-benchset) $(stdio-common-benchset) \
-	   $(math-benchset)
+	   $(math-benchset) $(malloc-benchset)

 CFLAGS-bench-ffs.c += -fno-builtin
 CFLAGS-bench-ffsll.c += -fno-builtin
@@ -86,6 +88,7 @@ $(addprefix $(objpfx)bench-,$(bench-math)): $(libm)
 $(addprefix $(objpfx)bench-,$(math-benchset)): $(libm)
 $(addprefix $(objpfx)bench-,$(bench-pthread)): $(shared-thread-library)
 $(objpfx)bench-malloc-thread: $(shared-thread-library)
+$(objpfx)bench-malloc-simple: $(shared-thread-library)

diff --git a/benchtests/bench-malloc-simple.c b/benchtests/bench-malloc-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..e786ddd9635f835b2f01b00a80f3cf0d2de82d48
--- /dev/null
+++ b/benchtests/bench-malloc-simple.c
@@ -0,0 +1,152 @@
+/* Benchmark malloc and free functions.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "bench-timing.h"
+#include "json-lib.h"
+
+#define NUM_ITERS 1000000
+#define NUM_ALLOCS 4
+#define NUM_SIZES 4
+#define MAX_ALLOCS 1000
+
+typedef struct
+{
+  size_t iters;
+  size_t size;
+  int n;
+  timing_t elapsed;
+} malloc_args;
+
+static void
+do_benchmark (malloc_args *args, int **arr)
+{
+  timing_t start, stop;
+  size_t iters = args->iters;
+  size_t size = args->size;
+  int n = args->n;
+
+  TIMING_NOW (start);
+
+  for (int j = 0; j < iters; j++)
+    {
+      for (int i = 0; i < n; i++)
+        arr[i] = malloc (size);
+
+      for (int i = 0; i < n; i++)
+        free (arr[i]);
+    }
+
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (args->elapsed, start, stop);
+}
+
+static malloc_args tests[2][NUM_SIZES][NUM_ALLOCS];
+static int allocs[NUM_ALLOCS] = { 25, 100, 400, MAX_ALLOCS };
+static size_t sizes[NUM_SIZES] = { 16, 256, 1024, 4096 };
+
+static void *
+dummy (void *p)
+{
+  return p;
+}
+
+int
+main (int argc, char **argv)
+{
+  size_t iters = NUM_ITERS;
+  int **arr = (int**) malloc (MAX_ALLOCS * sizeof (void*));
+  unsigned long res;
+
+  TIMING_INIT (res);
+  (void) res;
+
+  for (int t = 0; t < 2; t++)
+    for (int j = 0; j < NUM_SIZES; j++)
+      for (int i = 0; i < NUM_ALLOCS; i++)
+        {
+          tests[t][j][i].n = allocs[i];
+          tests[t][j][i].size = sizes[j];
+          tests[t][j][i].iters = iters / allocs[i];
+
+          /* Do a quick warmup run.  */
+          if (t == 0)
+            do_benchmark (&tests[0][j][i], arr);
+        }
+
+  /* Run benchmark single threaded.  */
+  for (int j = 0; j < NUM_SIZES; j++)
+    for (int i = 0; i < NUM_ALLOCS; i++)
+      do_benchmark (&tests[0][j][i], arr);
+
+  /* Create an empty thread so SINGLE_THREAD_P becomes false.  */
+  pthread_t t;
+  pthread_create (&t, NULL, dummy, NULL);
+  pthread_join (t, NULL);
+
+  /* Repeat benchmark with SINGLE_THREAD_P == false.  */
+  for (int j = 0; j < NUM_SIZES; j++)
+    for (int i = 0; i < NUM_ALLOCS; i++)
+      do_benchmark (&tests[1][j][i], arr);
+
+  free (arr);
+
+  json_ctx_t json_ctx;
+
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  json_attr_object_begin (&json_ctx, "functions");
+
+  json_attr_object_begin (&json_ctx, "malloc");
+
+  for (int j = 0; j < NUM_SIZES; j++)
+    {
+      char s[100];
+      double iters2 = iters;
+      sprintf (s, "malloc_block_size_%04zu", sizes[j]);
+      json_attr_object_begin (&json_ctx, s);
+
+      for (int i = 0; i < NUM_ALLOCS; i++)
+        {
+          sprintf (s, "st_num_allocs_%04d_time", allocs[i]);
+          json_attr_double (&json_ctx, s, tests[0][j][i].elapsed / iters2);
+        }
+
+      for (int i = 0; i < NUM_ALLOCS; i++)
+        {
+          sprintf (s, "mt_num_allocs_%04d_time", allocs[i]);
+          json_attr_double (&json_ctx, s, tests[1][j][i].elapsed / iters2);
+        }
+
+      json_attr_object_end (&json_ctx);
+    }
+
+  json_attr_object_end (&json_ctx);
+
+  json_attr_object_end (&json_ctx);
+
+  json_document_end (&json_ctx);
+  return 0;
+}