getsysstats: use sysinfo() instead of parsing /proc/meminfo

Message ID 1439471796-17003-1-git-send-email-rv@rasmusvillemoes.dk
State Superseded
Headers

Commit Message

Rasmus Villemoes Aug. 13, 2015, 1:16 p.m. UTC
  Profiling git's test suite, Linus noted [1] that a disproportionately
large amount of time was spent reading /proc/meminfo. This is done by
the glibc functions get_phys_pages and get_avphys_pages, but they only
need the MemTotal and MemFree fields, respectively. That same
information can be obtained with a single syscall, sysinfo, instead of
six: open, fstat, mmap, read, close, munmap. While sysinfo also
provides more than necessary, it does a lot less work than what the
kernel needs to do to provide the entire /proc/meminfo. Both strace -T
and in-app microbenchmarks shows that the sysinfo() approach is
roughly an order of magnitude faster.

sysinfo() is much older than what glibc currently requires, so I don't
think there's any reason to keep the old parsing code. Moreover, this
makes get_[av]phys_pages work even in the absence of /proc.

Linus noted that something as simple as 'bash -c "echo"' would trigger
the reading of /proc/meminfo, but gdb says that many more applications
than just bash are affected:

Starting program: /bin/bash "-c" "echo"

Breakpoint 1, __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
283     ../sysdeps/unix/sysv/linux/getsysstats.c: No such file or directory.
(gdb) bt
#0  __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
#1  0x00007ffff76d21c5 in posix_sysconf (name=<optimized out>) at ../sysdeps/posix/sysconf.c:634
#2  linux_sysconf (name=<optimized out>) at ../sysdeps/unix/sysv/linux/x86_64/../sysconf.c:136
#3  *__GI___sysconf (name=85) at ../sysdeps/unix/sysv/linux/x86_64/sysconf.c:37
#4  0x00007ffff765b02a in *(int0_t, long double) (b=<optimized out>, n=76, s=18446744073708915495, cmp=0x472e30, arg=0x0) at msort.c:188
#5  0x000000000042210e in ?? ()
#6  0x000000000042065d in main ()

So it seems that any application that uses qsort on a moderately sized
array will incur this cost (once), which is obviously proportionately
more expensive for lots of short-lived processes (such as the git test
suite).

[1] http://thread.gmane.org/gmane.linux.kernel/2019285

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 sysdeps/unix/sysv/linux/getsysstats.c | 88 ++++++++++++-----------------------
 1 file changed, 30 insertions(+), 58 deletions(-)
  

Comments

Ondrej Bilka Aug. 13, 2015, 2:35 p.m. UTC | #1
On Thu, Aug 13, 2015 at 03:16:36PM +0200, Rasmus Villemoes wrote:
> Profiling git's test suite, Linus noted [1] that a disproportionately
> large amount of time was spent reading /proc/meminfo. This is done by
> the glibc functions get_phys_pages and get_avphys_pages, but they only
> need the MemTotal and MemFree fields, respectively. That same
> information can be obtained with a single syscall, sysinfo, instead of
> six: open, fstat, mmap, read, close, munmap. While sysinfo also
> provides more than necessary, it does a lot less work than what the
> kernel needs to do to provide the entire /proc/meminfo. Both strace -T
> and in-app microbenchmarks shows that the sysinfo() approach is
> roughly an order of magnitude faster.
> 
> sysinfo() is much older than what glibc currently requires, so I don't
> think there's any reason to keep the old parsing code. Moreover, this
> makes get_[av]phys_pages work even in the absence of /proc.
> 
> Linus noted that something as simple as 'bash -c "echo"' would trigger
> the reading of /proc/meminfo, but gdb says that many more applications
> than just bash are affected:
> 
> Starting program: /bin/bash "-c" "echo"
> 
> Breakpoint 1, __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
> 283     ../sysdeps/unix/sysv/linux/getsysstats.c: No such file or directory.
> (gdb) bt
> #0  __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
> #1  0x00007ffff76d21c5 in posix_sysconf (name=<optimized out>) at ../sysdeps/posix/sysconf.c:634
> #2  linux_sysconf (name=<optimized out>) at ../sysdeps/unix/sysv/linux/x86_64/../sysconf.c:136
> #3  *__GI___sysconf (name=85) at ../sysdeps/unix/sysv/linux/x86_64/sysconf.c:37
> #4  0x00007ffff765b02a in *(int0_t, long double) (b=<optimized out>, n=76, s=18446744073708915495, cmp=0x472e30, arg=0x0) at msort.c:188
> #5  0x000000000042210e in ?? ()
> #6  0x000000000042065d in main ()
> 
> So it seems that any application that uses qsort on a moderately sized
> array will incur this cost (once), which is obviously proportionately
> more expensive for lots of short-lived processes (such as the git test
> suite).
> 
> [1] http://thread.gmane.org/gmane.linux.kernel/2019285
>

Looks almost ok.

There are two details, first is that we memunit needs to be power of
two. Second comment is couldn't simply you use uint64 for computation.
As product is physical memory it should fit into 64 bits.
  
Joseph Myers Aug. 13, 2015, 2:53 p.m. UTC | #2
On Thu, 13 Aug 2015, Rasmus Villemoes wrote:

> +sysinfo_mempages(unsigned long int num, unsigned int mem_unit)

Missing space before '('.  Appears several times in this patch.

How was this patch tested?  I'd have expected linknamespace test failures 
from the calls to sysinfo (that is, I'd expect you to need to make sysinfo 
into a weak alias for __sysinfo and call __sysinfo here to avoid such 
namespace issues).
  
Rasmus Villemoes Aug. 14, 2015, 12:04 p.m. UTC | #3
On Thu, Aug 13 2015, Joseph Myers <joseph@codesourcery.com> wrote:

> On Thu, 13 Aug 2015, Rasmus Villemoes wrote:
>
>> +sysinfo_mempages(unsigned long int num, unsigned int mem_unit)
>
> Missing space before '('.  Appears several times in this patch.
>
> How was this patch tested?

Only compile-tested, admittedly.

> I'd have expected linknamespace test failures from the calls to
> sysinfo (that is, I'd expect you to need to make sysinfo into a weak
> alias for __sysinfo and call __sysinfo here to avoid such namespace
> issues).

You're right. But I'm afraid I'll need more specific advice to fix
that. Doing

-sysinfo                EXTRA   sysinfo         i:p     sysinfo
+sysinfo                EXTRA   sysinfo         i:p     __sysinfo       sysinfo

to syscalls.list wasn't enough. Where should I put a prototype for
__sysinfo? I assume sysdeps/unix/sysv/linux/sys/sysinfo.h is the wrong
place, since that's the installed header.

Rasmus
  
Joseph Myers Aug. 14, 2015, 12:09 p.m. UTC | #4
On Fri, 14 Aug 2015, Rasmus Villemoes wrote:

> to syscalls.list wasn't enough. Where should I put a prototype for
> __sysinfo? I assume sysdeps/unix/sysv/linux/sys/sysinfo.h is the wrong
> place, since that's the installed header.

Add a header sysdeps/unix/sysv/linux/include/sys/sysinfo.h - include/ 
headers are used for internal wrappers for installed headers.  It can be 
declared as "extern __typeof (sysinfo) __sysinfo __THROW;" to avoid 
duplicating the full prototype.
  

Patch

diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
index 410f1a9..c2410cc 100644
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -278,81 +278,53 @@  __get_nprocs_conf (void)
 }
 weak_alias (__get_nprocs_conf, get_nprocs_conf)
 
-/* General function to get information about memory status from proc
-   filesystem.  */
+
+/* Compute (num*mem_unit)/pagesize, but avoid overflowing long int.
+   In practice, mem_unit is never bigger than the page size, so after
+   the first loop it is 1.  [In the kernel, it is initialized to
+   PAGE_SIZE in mm/page_alloc.c:si_meminfo(), and then in
+   kernel.sys.c:do_sysinfo() it is set to 1 if unsigned long can
+   represent all the sizes measured in bytes].  */
 static long int
-internal_function
-phys_pages_info (const char *format)
+sysinfo_mempages(unsigned long int num, unsigned int mem_unit)
 {
-  char buffer[8192];
-  long int result = -1;
+  unsigned long int ps = __getpagesize ();
 
-  /* If we haven't found an appropriate entry return 1.  */
-  FILE *fp = fopen ("/proc/meminfo", "rce");
-  if (fp != NULL)
+  while (mem_unit > 1 && ps > 1)
     {
-      /* No threads use this stream.  */
-      __fsetlocking (fp, FSETLOCKING_BYCALLER);
-
-      result = 0;
-      /* Read all lines and count the lines starting with the
-	 string "processor".  We don't have to fear extremely long
-	 lines since the kernel will not generate them.  8192
-	 bytes are really enough.  */
-      while (__fgets_unlocked (buffer, sizeof buffer, fp) != NULL)
-	if (sscanf (buffer, format, &result) == 1)
-	  {
-	    result /= (__getpagesize () / 1024);
-	    break;
-	  }
-
-      fclose (fp);
+      mem_unit >>= 1;
+      ps >>= 1;
     }
-
-  if (result == -1)
-    /* We cannot get the needed value: signal an error.  */
-    __set_errno (ENOSYS);
-
-  return result;
+  num *= mem_unit;
+  while (ps > 1)
+    {
+      ps >>= 1;
+      num >>= 1;
+    }
+  return num;
 }
 
-
-/* Return the number of pages of physical memory in the system.  There
-   is currently (as of version 2.0.21) no system call to determine the
-   number.  It is planned for the 2.1.x series to add this, though.
-
-   One possibility to implement it for systems using Linux 2.0 is to
-   examine the pseudo file /proc/cpuinfo.  Here we have one entry for
-   each processor.
-
-   But not all systems have support for the /proc filesystem.  If it
-   is not available we return -1 as an error signal.  */
+/* Return the number of pages of total/available physical memory in
+   the system.  This used to be done by parsing /proc/meminfo, but
+   that's unnecessarily expensive (and /proc is not always available).
+   The sysinfo syscall provides the same information, and has been
+   available at least since kernel 2.3.48.  */
 long int
 __get_phys_pages (void)
 {
-  /* XXX Here will come a test for the new system call.  */
+  struct sysinfo info;
 
-  return phys_pages_info ("MemTotal: %ld kB");
+  sysinfo(&info);
+  return sysinfo_mempages(info.totalram, info.mem_unit);
 }
 weak_alias (__get_phys_pages, get_phys_pages)
 
-
-/* Return the number of available pages of physical memory in the
-   system.  There is currently (as of version 2.0.21) no system call
-   to determine the number.  It is planned for the 2.1.x series to add
-   this, though.
-
-   One possibility to implement it for systems using Linux 2.0 is to
-   examine the pseudo file /proc/cpuinfo.  Here we have one entry for
-   each processor.
-
-   But not all systems have support for the /proc filesystem.  If it
-   is not available we return -1 as an error signal.  */
 long int
 __get_avphys_pages (void)
 {
-  /* XXX Here will come a test for the new system call.  */
+  struct sysinfo info;
 
-  return phys_pages_info ("MemFree: %ld kB");
+  sysinfo(&info);
+  return sysinfo_mempages(info.freeram, info.mem_unit);
 }
 weak_alias (__get_avphys_pages, get_avphys_pages)