From: Anton Blanchard
To: DJ Delorie
Cc: carlos@redhat.com, sid@reserved-bit.com, libc-alpha@sourceware.org
Date: Thu, 14 Jul 2016 21:37:44 +1000
Subject: Re: Malloc improvements
Message-Id: <20160714213744.665694a1@kryten>

Hi DJ,

> I have a trace here that's 360 Gb, which is 24 Gb after conversion.
>
> Although I am switching to a binary raw file, with a separate utility
> for converting it to ASCII. The size and time considerations were
> significant.

Looks good. Is the plan to pass around the *.out files or the *.wl
files? I reran a trace of omnetpp from SPECint2006 with the new binary
format:

 23G  mtrace.out.77514
842M  mtrace.out.77514.xz
3.2G  test.wl
841M  test.wl.xz

The compressed sizes are almost identical (842M vs 841M), but
compressing the 23G *.out file took ages. Not surprisingly, the 3.2G
*.wl file compressed much faster.

> It would be interesting to rerun that with my new converter (in case
> the old one is overly pessimistic about synchronizing), but in
> general, every time a pointer passes "ownership" from one thread to
> another, the simulator puts in a set of calls to synchronize the two
> threads (the sync_w and sync_r commands in trace_run.c). If you can
> come up with a faster way of doing it, or a way to reduce the number
> of times it's needed, I'm all ears, but I'm not that worried about it
> - the purpose of the simulator is to capture the application's
> malloc/free pattern "good enough" to benchmark the glibc calls in a
> way that "represents" the application's needs. In the future, we'll
> be able to make performance changes to malloc's code with a good
> understanding of how it impacts a wide range of applications.

I was thinking about single-threaded traces: perhaps we could avoid
all the locking in that case. In my tests, replaying the omnetpp trace
without the locking is about 4x faster on POWER8.

As well as the locking, the memory initialisation loops were showing
up in profiles. Is there a reason for encoding the offset in
free_wipe()? If not, we can just use memset(), which is much faster.

Anton
---

When initialising memory, use memset() instead of an open-coded loop.
diff --git a/malloc/trace2wl.cc b/malloc/trace2wl.cc
index aa53fb3..f3d60b5 100644
--- a/malloc/trace2wl.cc
+++ b/malloc/trace2wl.cc
@@ -156,13 +184,11 @@ static void
 wmem (volatile void *ptr, int count)
 {
   char *p = (char *)ptr;
-  int i;
 
   if (!p)
     return;
 
-  for (i=0; i