From patchwork Mon Oct 19 14:47:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Liebler X-Patchwork-Id: 40753 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0F4513870883; Mon, 19 Oct 2020 14:47:44 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0F4513870883 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1603118864; bh=Zosyy/GbjyX6mUu9nXZnS0iNbrSn+Pk0A3mjx3Cs0Ng=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=gv+2HTdCk/ejfxl8wpNPDfMZ9t4P7EVaIIPoKsmMoZLNcgr0JJzVenjJZjlTPTJKL gIsE0Mu3mrVrW13YDtLnuSuucM3fLkDqSvuiOe6+avItv8p+1hDGpSCeG4uoNJaGNo oHaTeZE2sTq+HyHrFyuzGC1VXOhNWQrt73yinUUk= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id DBC743858020 for ; Mon, 19 Oct 2020 14:47:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org DBC743858020 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 09JEhnJD062646 for ; Mon, 19 Oct 2020 10:47:40 -0400 Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com with ESMTP id 349ctpg3uc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 19 Oct 2020 10:47:39 -0400 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 09JElVZK004339 for ; Mon, 19 Oct 2020 14:47:37 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma05fra.de.ibm.com with ESMTP id 347r8814k7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 19 Oct 2020 14:47:37 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 09JElYkl22217176 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 19 Oct 2020 14:47:34 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 511CCA4057; Mon, 19 Oct 2020 14:47:34 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2F219A4040; Mon, 19 Oct 2020 14:47:34 +0000 (GMT) Received: from t35lp56.lnxne.boe (unknown [9.152.108.100]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 19 Oct 2020 14:47:34 +0000 (GMT) To: libc-alpha@sourceware.org Subject: [PATCH v2] Loosen the limits of time/tst-cpuclock1. Date: Mon, 19 Oct 2020 16:47:15 +0200 Message-Id: <20201019144715.3236886-1-stli@linux.ibm.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687 definitions=2020-10-19_06:2020-10-16, 2020-10-19 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 impostorscore=0 priorityscore=1501 mlxlogscore=999 spamscore=0 clxscore=1015 adultscore=0 malwarescore=0 phishscore=0 suspectscore=1 lowpriorityscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2010190101 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Stefan Liebler via Libc-alpha From: Stefan Liebler Reply-To: Stefan Liebler Cc: Stefan Liebler Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails quite often on s390x/s390 with one/multiple of those: "before - after" / "nanosleep time" / "dead - after" ourside reasonable range. On a zVM/kvm guest the CPUs are shared between multiple guests. And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars. The defined CPUs for a lpar/zVM-system could also have lower weights compared to other lpars which let the steal time further grow. Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple times, e.g. with different GCCs, on various lpars or zVM guests at the same time. During this time, I've run the test for 13500 times and obvserved the following fails: ~600x "before - after" ~60x "nanosleep time" ~70x "dead - after" I've also observed a lot of "before - after" fails on a intel kvm-guest while building/testing glibc on it. The mentioned commit has tighten the limits of valid tv_nsec ranges: "before - after" (expected: 500000000): - 100000000 ... 600000000 + 450000000 ... 550000000 "nanosleep time" (expected: 100000000): - 100000000 ... 200000000 + 090000000 ... 120000000 "dead - after" (expected: 100000000): - ... 200000000 + 090000000 ... 120000000 The test itself forks a child process which chew_cpu (user- and kernel-space). The parent process sleeps with nanosleep(0.5s) and measures the child_clock time: diff = after - before With much workload on the machine, the child won't make much progess and it can fall much beyond the minimum limit. Thus this check is now removed! Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s): diff = afterns - after The test currently also allows 0.9 * 0.1s. As this would be an error, the hard limit of 1.0 * 0.1s is now used as minimum border! Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s. Therefore the upper limit is set to 2.0 which was also used before the mentioned commit. For "dead - after", the parent process kills the child process and waits long enough to let the child finish dying. Then it gets the time of the child: diff = dead - after Note that diff also contains the time for the previous clock_nanosleep. Thus you'll often see both fails at the same time. As we are here not interested in the timings but in checking the interface, this timing check is also removed. --- time/tst-cpuclock1.c | 67 ++++++++++---------------------------------- 1 file changed, 15 insertions(+), 52 deletions(-) diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c index 1ac611a92b..cc08150654 100644 --- a/time/tst-cpuclock1.c +++ b/time/tst-cpuclock1.c @@ -125,7 +125,7 @@ do_test (void) child, (unsigned long int) child_clock, (uintmax_t) res.tv_sec, (uintmax_t) res.tv_nsec); - struct timespec before, after; + struct timespec before; if (clock_gettime (child_clock, &before) < 0) { printf ("clock_gettime on live PID %d clock %lx => %s\n", @@ -137,38 +137,7 @@ do_test (void) printf ("live PID %d before sleep => %ju.%.9ju\n", child, (uintmax_t) before.tv_sec, (uintmax_t) before.tv_nsec); - struct timespec sleeptime = { .tv_nsec = 500000000 }; - if (nanosleep (&sleeptime, NULL) != 0) - { - perror ("nanosleep"); - result = 1; - goto done; - } - - if (clock_gettime (child_clock, &after) < 0) - { - printf ("clock_gettime on live PID %d clock %lx => %s\n", - child, (unsigned long int) child_clock, strerror (errno)); - result = 1; - goto done; - } - /* Should be close to 0.5. */ - printf ("live PID %d after sleep => %ju.%.9ju\n", - child, (uintmax_t) after.tv_sec, (uintmax_t) after.tv_nsec); - - /* The bound values are empirically defined by testing this code over high cpu - usage and different nice values. Of all the values we keep the 90th - percentile of values and use those values for our testing allowed range. */ - struct timespec diff = timespec_sub (support_timespec_normalize (after), - support_timespec_normalize (before)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.1)) - { - printf ("before - after %ju.%.9ju outside reasonable range\n", - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); - result = 1; - } - - sleeptime.tv_nsec = 100000000; + struct timespec sleeptime = { .tv_nsec = 100000000 }; e = clock_nanosleep (child_clock, 0, &sleeptime, NULL); if (e == EINVAL || e == ENOTSUP || e == ENOSYS) { @@ -191,13 +160,19 @@ do_test (void) } else { - /* The bound values are empirically defined by testing this code over - high cpu usage and different nice values. Of all the values we keep - the 90th percentile of values and use those values for our testing - allowed range. */ + printf ("live PID %d after sleep => %ju.%.9ju\n", + child, (uintmax_t) afterns.tv_sec, + (uintmax_t) afterns.tv_nsec); + + /* As the sleep is based on the child clock, the diff should never + be less than the specified sleeptime. Otherwise this is an error. + The upper bound is quite high in order to get no failure if running + with high cpu usage and/or on virtualized environments with shared + CPUs. */ + struct timespec diff; diff = timespec_sub (support_timespec_normalize (afterns), - support_timespec_normalize (after)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) + support_timespec_normalize (before)); + if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0)) { printf ("nanosleep time %ju.%.9ju outside reasonable range\n", (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); @@ -231,21 +206,9 @@ do_test (void) result = 1; goto done; } - /* Should be close to 0.6. */ + /* Should be close to 0.1. */ printf ("dead PID %d => %ju.%.9ju\n", child, (uintmax_t) dead.tv_sec, (uintmax_t) dead.tv_nsec); - /* The bound values are empirically defined by testing this code over high cpu - usage and different nice values. Of all the values we keep the 90th - percentile of values and use those values for our testing allowed range. */ - diff = timespec_sub (support_timespec_normalize (dead), - support_timespec_normalize (after)); - sleeptime.tv_nsec = 100000000; - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) - { - printf ("dead - after %ju.%.9ju outside reasonable range\n", - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); - result = 1; - } /* Now reap the child and verify that its clock is no longer valid. */ {