Patchwork [2/6] linux: Assume clock_getres CLOCK_{PROCESS,THREAD}_CPUTIME_ID

login
register
mail settings
Submitter Adhemerval Zanella Netto
Date Feb. 8, 2019, 6:39 p.m.
Message ID <20190208183910.30799-2-adhemerval.zanella@linaro.org>
Download mbox | patch
Permalink /patch/31363/
State New
Headers show

Comments

Adhemerval Zanella Netto - Feb. 8, 2019, 6:39 p.m.
This patch assumes that clock_getres syscall always support
CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, so there is no need
to fallback to hp-timing support for _SC_MONOTONIC_CLOCK.  This allows
simplify the sysconf support to always use the syscall.

The ia64 implementation is also simplified and consolidate in one file.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

	* sysdeps/unix/sysv/linux/ia64/has_cpuclock.c: Remove file.
	* sysdeps/unix/sysv/linux/ia64/sysconf.c (ia64_check_cpuclock): New
	function.
	* sysdeps/unix/sysv/linux/sysconf.c (has_cpuclock): Remove function.
	(check_clock_getres): New function.
	(__sysconf): Use check_clock_getres instead of has_cpuclock.
---
 sysdeps/unix/sysv/linux/ia64/has_cpuclock.c | 51 ---------------------
 sysdeps/unix/sysv/linux/ia64/sysconf.c      | 36 ++++++++++++---
 sysdeps/unix/sysv/linux/sysconf.c           | 46 ++++++-------------
 3 files changed, 44 insertions(+), 89 deletions(-)
 delete mode 100644 sysdeps/unix/sysv/linux/ia64/has_cpuclock.c
Florian Weimer - Feb. 9, 2019, 10:35 a.m.
* Adhemerval Zanella:

> This patch assumes that clock_getres syscall always support
> CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, so there is no need
> to fallback to hp-timing support for _SC_MONOTONIC_CLOCK.  This allows
> simplify the sysconf support to always use the syscall.
>
> The ia64 implementation is also simplified and consolidate in one
> file.

Is the ia64-specific implementation really needed?

My impression is that we plan to use CPU-internal timing (such as RDTSC
on x86) in the dynamic linker even if it will produce wrong results.  I
don't like this, but I don't feel strongly about this.

But this means that nothing actually needs the information whether
CPU-internal timing on ia64 is accurate.  If there is a kernel vDSO that
uses the CPU counters for CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW, then
it's the job of the kernel to ensure that only avoids the syscall if the
counters are accurate.

Thanks,
Florian
Adhemerval Zanella Netto - Feb. 11, 2019, 11:40 a.m.
On 09/02/2019 08:35, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> This patch assumes that clock_getres syscall always support
>> CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, so there is no need
>> to fallback to hp-timing support for _SC_MONOTONIC_CLOCK.  This allows
>> simplify the sysconf support to always use the syscall.
>>
>> The ia64 implementation is also simplified and consolidate in one
>> file.
> 
> Is the ia64-specific implementation really needed?

I am not sure, the information I am gathering indicates taht processors with 
ITC drift still requires user adjustments according to IA64 manual:

"The interval counter can be written, for initialization purposes, by
privileged code. The ITC is not architecturally guaranteed to be
synchronized with any other processor's interval time counter in a 
multi-processor system, nor is it synchronized with the wall clock.
*Software must calibrate interval timer ticks to wall clock time and
periodically adjust for drift." (3.3.4.2 page 3-15) [1].

Also Linux code has these two comments:

arch/ia64/kernel/time.c
283 #ifdef CONFIG_SMP
284                 /* On IA64 in an SMP configuration ITCs are never accurately synchronized.
285                  * Jitter compensation requires a cmpxchg which may limit
286                  * the scalability of the syscalls for retrieving time.
287                  * The ITC synchronization is usually successful to within a few
288                  * ITC ticks but this is not a sure thing. If you need to improve
289                  * timer performance in SMP situations then boot the kernel with the
290                  * "nojitter" option. However, doing so may result in time fluctuating (maybe
291                  * even going backward) if the ITC offsets between the individual CPUs
292                  * are too large.
293                  */
294                 if (!nojitter)
295                         itc_jitter_data.itc_jitter = 1;
296 #endif
297         } else
298                 /*
299                  * ITC is drifty and we have not synchronized the ITCs in smpboot.c.
300                  * ITC values may fluctuate significantly between processors.
301                  * Clock should not be used for hrtimers. Mark itc as only
302                  * useful for boot and testing.
303                  *
304                  * Note that jitter compensation is off! There is no point of
305                  * synchronizing ITCs since they may be large differentials
306                  * that change over time.
307                  *
308                  * The only way to fix this would be to repeatedly sync the
309                  * ITCs. Until that time we have to avoid ITC.
310                  */
311                 clocksource_itc.rating = 50;

So taking in consideration that kernel still thinks it is important to
explicit export '/proc/sal/itc_drift' and the fact that clock_gettime
ia syscall implementation (arch/ia64/kernel/fsys.S) uses a quite complex
code to adjust the timer code using itc jitter timer reference, I think
it is still relevant for some IA64 chips that does not have itc_drift
as '1' (ITC may drift).

The Itanium system I have access, Itanium 2 9020, does not have ITC drift
so I can't really tell how it plays on syscalls.  Also I am not sure how
common is systems with ITC that may drift.

> 
> My impression is that we plan to use CPU-internal timing (such as RDTSC
> on x86) in the dynamic linker even if it will produce wrong results.  I
> don't like this, but I don't feel strongly about this.

I think the word here is not 'wrong', but rather 'inaccurate'.  I still
think it provides valuable profiling data about loader's work (assuming
loader work does not overflow the counter itself, as we do for alpha).

> 
> But this means that nothing actually needs the information whether
> CPU-internal timing on ia64 is accurate.  If there is a kernel vDSO that
> uses the CPU counters for CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW, then
> it's the job of the kernel to ensure that only avoids the syscall if the
> counters are accurate.

nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
system with ITC that may drift is a best effort...

> 
> Thanks,
> Florian
> 

[1] http://refspecs.linux-foundation.org/IA64-softdevman-vol2.pdf
Florian Weimer - Feb. 11, 2019, 11:43 a.m.
* Adhemerval Zanella:

> nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
> manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
> system with ITC that may drift is a best effort...

Are you sure about that?  Wouldn't that be a bug in kernel
clock_gettime, and not something we should paper over in userspace?

Thanks,
Florian
Adhemerval Zanella Netto - Feb. 11, 2019, 12:10 p.m.
On 11/02/2019 09:43, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
>> manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
>> system with ITC that may drift is a best effort...
> 
> Are you sure about that?  Wouldn't that be a bug in kernel
> clock_gettime, and not something we should paper over in userspace?

Not sure, I was just my impression checking kernel ia64 code but I am not
an architecture expert. To be safe I kept the original code in patch
proposal.
Florian Weimer - Feb. 11, 2019, 12:54 p.m.
* Adhemerval Zanella:

> On 11/02/2019 09:43, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
>>> manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
>>> system with ITC that may drift is a best effort...
>> 
>> Are you sure about that?  Wouldn't that be a bug in kernel
>> clock_gettime, and not something we should paper over in userspace?
>
> Not sure, I was just my impression checking kernel ia64 code but I am
> not an architecture expert. To be safe I kept the original code in
> patch proposal.

I think it's not worth maintaing the ia64 divergence for this.  If it is
a problem with the kernel's CLOCK_MONOTONIC implementation, it has to be
fixed there.

Thanks,
Florian
Carlos O'Donell - Feb. 11, 2019, 1:22 p.m.
On 2/11/19 7:54 AM, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 11/02/2019 09:43, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
>>>> manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
>>>> system with ITC that may drift is a best effort...
>>>
>>> Are you sure about that?  Wouldn't that be a bug in kernel
>>> clock_gettime, and not something we should paper over in userspace?
>>
>> Not sure, I was just my impression checking kernel ia64 code but I am
>> not an architecture expert. To be safe I kept the original code in
>> patch proposal.
> 
> I think it's not worth maintaing the ia64 divergence for this.  If it is
> a problem with the kernel's CLOCK_MONOTONIC implementation, it has to be
> fixed there.

I tend to agree. Standard interfaces should behave in standard ways.

We have the same problem in HPPA where a similar clock (and my feeling was
always that IA64 derived a lot from HP PARISC 64-bit) that needs synchronization
at SMP rendevous at boot time, followed by bias and drift adjustment.

For HPPA we made sue that the everything was accurately compensated at the
kernel timer source level, and never in userspace. You *can* do it in userspace
and it's faster and lower latency, and all of those things, but it's more complex
and prone to breakage for odd reasons (compiler optimizations etc.).

For HPPA you can access the tick counter in userspace, raw, and it's
useful for timing small sequences of code, but if you move to a new CPU
mid-measurement, you'll get wrong values because the value and bias is per-cpu.
Adhemerval Zanella Netto - Feb. 11, 2019, 1:49 p.m.
On 11/02/2019 11:22, Carlos O'Donell wrote:
> On 2/11/19 7:54 AM, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> On 11/02/2019 09:43, Florian Weimer wrote:
>>>> * Adhemerval Zanella:
>>>>
>>>>> nscd uses _SC_MONOTONIC_CLOCK and if I understood correctly both IA64 
>>>>> manual and kernel code, the CLOCK_MONOTONIC implementation on SMP IA64
>>>>> system with ITC that may drift is a best effort...
>>>>
>>>> Are you sure about that?  Wouldn't that be a bug in kernel
>>>> clock_gettime, and not something we should paper over in userspace?
>>>
>>> Not sure, I was just my impression checking kernel ia64 code but I am
>>> not an architecture expert. To be safe I kept the original code in
>>> patch proposal.
>>
>> I think it's not worth maintaing the ia64 divergence for this.  If it is
>> a problem with the kernel's CLOCK_MONOTONIC implementation, it has to be
>> fixed there.
> 
> I tend to agree. Standard interfaces should behave in standard ways.
> 
> We have the same problem in HPPA where a similar clock (and my feeling was
> always that IA64 derived a lot from HP PARISC 64-bit) that needs synchronization
> at SMP rendevous at boot time, followed by bias and drift adjustment.
> 
> For HPPA we made sue that the everything was accurately compensated at the
> kernel timer source level, and never in userspace. You *can* do it in userspace
> and it's faster and lower latency, and all of those things, but it's more complex
> and prone to breakage for odd reasons (compiler optimizations etc.).
> 
> For HPPA you can access the tick counter in userspace, raw, and it's
> useful for timing small sequences of code, but if you move to a new CPU
> mid-measurement, you'll get wrong values because the value and bias is per-cpu.
> 

Alright, I will refactor it to remove ia64 idiosyncrasies.
Florian Weimer - Feb. 11, 2019, 2:04 p.m.
* Carlos O'Donell:

> For HPPA we made sue that the everything was accurately compensated at
> the kernel timer source level, and never in userspace. You *can* do it
> in userspace and it's faster and lower latency, and all of those
> things, but it's more complex and prone to breakage for odd reasons
> (compiler optimizations etc.).

And the kernel can do it in userspace via the vDSO, like on other
architectures.

Thanks,
Florian

Patch

diff --git a/sysdeps/unix/sysv/linux/ia64/has_cpuclock.c b/sysdeps/unix/sysv/linux/ia64/has_cpuclock.c
deleted file mode 100644
index b3afb37f8b..0000000000
--- a/sysdeps/unix/sysv/linux/ia64/has_cpuclock.c
+++ /dev/null
@@ -1,51 +0,0 @@ 
-/* Copyright (C) 2000-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <sys/stat.h>
-#include <sys/types.h>
-#include <not-cancel.h>
-
-static int itc_usable;
-
-static int
-has_cpuclock (void)
-{
-  if (__builtin_expect (itc_usable == 0, 0))
-    {
-      int newval = 1;
-      int fd = __open_nocancel ("/proc/sal/itc_drift", O_RDONLY);
-      if (__builtin_expect (fd != -1, 1))
-	{
-	  char buf[16];
-	  /* We expect the file to contain a single digit followed by
-	     a newline.  If the format changes we better not rely on
-	     the file content.  */
-	  if (__read_nocancel (fd, buf, sizeof buf) != 2
-	      || buf[0] != '0' || buf[1] != '\n')
-	    newval = -1;
-
-	  __close_nocancel_nostatus (fd);
-	}
-
-      itc_usable = newval;
-    }
-
-  return itc_usable;
-}
diff --git a/sysdeps/unix/sysv/linux/ia64/sysconf.c b/sysdeps/unix/sysv/linux/ia64/sysconf.c
index ef75322f1f..48bb008f50 100644
--- a/sysdeps/unix/sysv/linux/ia64/sysconf.c
+++ b/sysdeps/unix/sysv/linux/ia64/sysconf.c
@@ -16,15 +16,39 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <assert.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <unistd.h>
-
-
-#include "has_cpuclock.c"
-#define HAS_CPUCLOCK(name) (has_cpuclock () ? _POSIX_VERSION : -1)
-
+#include <not-cancel.h>
+
+static bool itc_usable;
+
+static bool
+ia64_check_cpuclock (void)
+{
+  if (__glibc_unlikely (itc_usable == 0))
+    {
+      int newval = true;
+      int fd = __open_nocancel ("/proc/sal/itc_drift", O_RDONLY);
+      if (__glibc_likely (fd != -1))
+	{
+	  char buf[16];
+	  /* We expect the file to contain a single digit followed by
+	     a newline.  If the format changes we better not rely on
+	     the file content.  */
+	  if (__read_nocancel (fd, buf, sizeof buf) != 2
+	      || buf[0] != '0' || buf[1] != '\n')
+	    newval = false;
+
+	  __close_nocancel_nostatus (fd);
+	}
+
+      itc_usable = newval;
+    }
+
+  return itc_usable;
+}
+#define HAS_CPUCLOCK(name) (ia64_check_cpuclock () ? _POSIX_VERSION : -1)
 
 /* Now the generic Linux version.  */
 #include <sysdeps/unix/sysv/linux/sysconf.c>
diff --git a/sysdeps/unix/sysv/linux/sysconf.c b/sysdeps/unix/sysv/linux/sysconf.c
index 4b297ba35f..b870a21810 100644
--- a/sysdeps/unix/sysv/linux/sysconf.c
+++ b/sysdeps/unix/sysv/linux/sysconf.c
@@ -35,31 +35,19 @@ 
 static long int posix_sysconf (int name);
 
 
-#ifndef HAS_CPUCLOCK
 static long int
-has_cpuclock (int name)
+check_clock_getres (clockid_t clk_id)
 {
-# if defined __NR_clock_getres || HP_TIMING_AVAIL
-  /* If we have HP_TIMING, we will fall back on that if the system
-     call does not work, so we support it either way.  */
-#  if !HP_TIMING_AVAIL
-  /* Check using the clock_getres system call.  */
   struct timespec ts;
   INTERNAL_SYSCALL_DECL (err);
-  int r = INTERNAL_SYSCALL (clock_getres, err, 2,
-			    (name == _SC_CPUTIME
-			     ? CLOCK_PROCESS_CPUTIME_ID
-			     : CLOCK_THREAD_CPUTIME_ID),
-			    &ts);
+  int r = INTERNAL_SYSCALL_CALL (clock_getres, err, clk_id, &ts);
   if (INTERNAL_SYSCALL_ERROR_P (r, err))
     return -1;
-#  endif
   return _POSIX_VERSION;
-# else
-  return -1;
-# endif
 }
-# define HAS_CPUCLOCK(name) has_cpuclock (name)
+
+#ifndef HAS_CPUCLOCK
+# define HAS_CPUCLOCK(name) check_clock_getres (name)
 #endif
 
 
@@ -71,29 +59,22 @@  __sysconf (int name)
 
   switch (name)
     {
-      struct rlimit rlimit;
-#ifdef __NR_clock_getres
     case _SC_MONOTONIC_CLOCK:
-      /* Check using the clock_getres system call.  */
-      {
-	struct timespec ts;
-	INTERNAL_SYSCALL_DECL (err);
-	int r;
-	r = INTERNAL_SYSCALL (clock_getres, err, 2, CLOCK_MONOTONIC, &ts);
-	return INTERNAL_SYSCALL_ERROR_P (r, err) ? -1 : _POSIX_VERSION;
-      }
-#endif
+      return check_clock_getres (CLOCK_MONOTONIC);
 
     case _SC_CPUTIME:
+      return HAS_CPUCLOCK (CLOCK_PROCESS_CPUTIME_ID);
     case _SC_THREAD_CPUTIME:
-      return HAS_CPUCLOCK (name);
+      return HAS_CPUCLOCK (CLOCK_THREAD_CPUTIME_ID);
 
-    case _SC_ARG_MAX:
+    case _SC_ARG_MAX: {
+      struct rlimit rlimit;
       /* Use getrlimit to get the stack limit.  */
       if (__getrlimit (RLIMIT_STACK, &rlimit) == 0)
 	return MAX (legacy_ARG_MAX, rlimit.rlim_cur / 4);
 
       return legacy_ARG_MAX;
+    } break;
 
     case _SC_NGROUPS_MAX:
       /* Try to read the information from the /proc/sys/kernel/ngroups_max
@@ -101,13 +82,14 @@  __sysconf (int name)
       procfname = "/proc/sys/kernel/ngroups_max";
       break;
 
-    case _SC_SIGQUEUE_MAX:
+    case _SC_SIGQUEUE_MAX: {
+      struct rlimit rlimit;
       if (__getrlimit (RLIMIT_SIGPENDING, &rlimit) == 0)
 	return rlimit.rlim_cur;
 
       /* The /proc/sys/kernel/rtsig-max file contains the answer.  */
       procfname = "/proc/sys/kernel/rtsig-max";
-      break;
+    } break;
 
     default:
       break;