[v2] linux/getsysstats.c: use sysinfo() instead of parsing /proc/meminfo

Message ID 1439891734-1387-1-git-send-email-rv@rasmusvillemoes.dk
State Committed
Headers

Commit Message

Rasmus Villemoes Aug. 18, 2015, 9:55 a.m. UTC
  Profiling git's test suite, Linus noted [1] that a disproportionately
large amount of time was spent reading /proc/meminfo. This is done by
the glibc functions get_phys_pages and get_avphys_pages, but they only
need the MemTotal and MemFree fields, respectively. That same
information can be obtained with a single syscall, sysinfo, instead of
six: open, fstat, mmap, read, close, munmap. While sysinfo also
provides more than necessary, it does a lot less work than what the
kernel needs to do to provide the entire /proc/meminfo. Both strace -T
and in-app microbenchmarks shows that the sysinfo() approach is
roughly an order of magnitude faster.

sysinfo() is much older than what glibc currently requires, so I don't
think there's any reason to keep the old parsing code. Moreover, this
makes get_[av]phys_pages work even in the absence of /proc.

Linus noted that something as simple as 'bash -c "echo"' would trigger
the reading of /proc/meminfo, but gdb says that many more applications
than just bash are affected:

Starting program: /bin/bash "-c" "echo"

Breakpoint 1, __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
283     ../sysdeps/unix/sysv/linux/getsysstats.c: No such file or directory.
(gdb) bt
#0  __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
#1  0x00007ffff76d21c5 in posix_sysconf (name=<optimized out>) at ../sysdeps/posix/sysconf.c:634
#2  linux_sysconf (name=<optimized out>) at ../sysdeps/unix/sysv/linux/x86_64/../sysconf.c:136
#3  *__GI___sysconf (name=85) at ../sysdeps/unix/sysv/linux/x86_64/sysconf.c:37
#4  0x00007ffff765b02a in *(int0_t, long double) (b=<optimized out>, n=76, s=18446744073708915495, cmp=0x472e30, arg=0x0) at msort.c:188
#5  0x000000000042210e in ?? ()
#6  0x000000000042065d in main ()

So it seems that any application that uses qsort on a moderately sized
array will incur this cost (once), which is obviously proportionately
more expensive for lots of short-lived processes (such as the git test
suite).

[1] http://thread.gmane.org/gmane.linux.kernel/2019285

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>

	* sysdeps/unix/sysv/linux/getsysstats.c (__get_phys_pages):
	Use sysinfo system call instead of parsing /proc/meminfo.
	* sysdeps/unix/sysv/linux/getsysstats.c (__get_avphys_pages):
	Likewise.

---
v2: fix linknamespace violations and a few style issues.

 sysdeps/unix/sysv/linux/getsysstats.c         | 88 +++++++++------------------
 sysdeps/unix/sysv/linux/include/sys/sysinfo.h | 26 ++++++++
 sysdeps/unix/sysv/linux/syscalls.list         |  2 +-
 3 files changed, 57 insertions(+), 59 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/include/sys/sysinfo.h
  

Comments

Mike Frysinger Aug. 18, 2015, 2:12 p.m. UTC | #1
looks fine to me.  will wait a bit to see if anyone else has an opinion.
-mike
  
Rasmus Villemoes Sept. 8, 2015, 7:02 a.m. UTC | #2
On Tue, Aug 18 2015, Mike Frysinger <vapier@gentoo.org> wrote:

> looks fine to me.  will wait a bit to see if anyone else has an
> opinion.

It doesn't seem that way. Could this get applied?

Rasmus
  
Mike Frysinger Sept. 13, 2015, 1:51 a.m. UTC | #3
On 08 Sep 2015 09:02, Rasmus Villemoes wrote:
> On Tue, Aug 18 2015, Mike Frysinger wrote:
> > looks fine to me.  will wait a bit to see if anyone else has an
> > opinion.
> 
> It doesn't seem that way. Could this get applied?

done now.  thanks!
-mike
  

Patch

diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
index 410f1a9..5bb5543 100644
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -278,81 +278,53 @@  __get_nprocs_conf (void)
 }
 weak_alias (__get_nprocs_conf, get_nprocs_conf)
 
-/* General function to get information about memory status from proc
-   filesystem.  */
+
+/* Compute (num*mem_unit)/pagesize, but avoid overflowing long int.
+   In practice, mem_unit is never bigger than the page size, so after
+   the first loop it is 1.  [In the kernel, it is initialized to
+   PAGE_SIZE in mm/page_alloc.c:si_meminfo(), and then in
+   kernel.sys.c:do_sysinfo() it is set to 1 if unsigned long can
+   represent all the sizes measured in bytes].  */
 static long int
-internal_function
-phys_pages_info (const char *format)
+sysinfo_mempages (unsigned long int num, unsigned int mem_unit)
 {
-  char buffer[8192];
-  long int result = -1;
+  unsigned long int ps = __getpagesize ();
 
-  /* If we haven't found an appropriate entry return 1.  */
-  FILE *fp = fopen ("/proc/meminfo", "rce");
-  if (fp != NULL)
+  while (mem_unit > 1 && ps > 1)
     {
-      /* No threads use this stream.  */
-      __fsetlocking (fp, FSETLOCKING_BYCALLER);
-
-      result = 0;
-      /* Read all lines and count the lines starting with the
-	 string "processor".  We don't have to fear extremely long
-	 lines since the kernel will not generate them.  8192
-	 bytes are really enough.  */
-      while (__fgets_unlocked (buffer, sizeof buffer, fp) != NULL)
-	if (sscanf (buffer, format, &result) == 1)
-	  {
-	    result /= (__getpagesize () / 1024);
-	    break;
-	  }
-
-      fclose (fp);
+      mem_unit >>= 1;
+      ps >>= 1;
     }
-
-  if (result == -1)
-    /* We cannot get the needed value: signal an error.  */
-    __set_errno (ENOSYS);
-
-  return result;
+  num *= mem_unit;
+  while (ps > 1)
+    {
+      ps >>= 1;
+      num >>= 1;
+    }
+  return num;
 }
 
-
-/* Return the number of pages of physical memory in the system.  There
-   is currently (as of version 2.0.21) no system call to determine the
-   number.  It is planned for the 2.1.x series to add this, though.
-
-   One possibility to implement it for systems using Linux 2.0 is to
-   examine the pseudo file /proc/cpuinfo.  Here we have one entry for
-   each processor.
-
-   But not all systems have support for the /proc filesystem.  If it
-   is not available we return -1 as an error signal.  */
+/* Return the number of pages of total/available physical memory in
+   the system.  This used to be done by parsing /proc/meminfo, but
+   that's unnecessarily expensive (and /proc is not always available).
+   The sysinfo syscall provides the same information, and has been
+   available at least since kernel 2.3.48.  */
 long int
 __get_phys_pages (void)
 {
-  /* XXX Here will come a test for the new system call.  */
+  struct sysinfo info;
 
-  return phys_pages_info ("MemTotal: %ld kB");
+  __sysinfo (&info);
+  return sysinfo_mempages (info.totalram, info.mem_unit);
 }
 weak_alias (__get_phys_pages, get_phys_pages)
 
-
-/* Return the number of available pages of physical memory in the
-   system.  There is currently (as of version 2.0.21) no system call
-   to determine the number.  It is planned for the 2.1.x series to add
-   this, though.
-
-   One possibility to implement it for systems using Linux 2.0 is to
-   examine the pseudo file /proc/cpuinfo.  Here we have one entry for
-   each processor.
-
-   But not all systems have support for the /proc filesystem.  If it
-   is not available we return -1 as an error signal.  */
 long int
 __get_avphys_pages (void)
 {
-  /* XXX Here will come a test for the new system call.  */
+  struct sysinfo info;
 
-  return phys_pages_info ("MemFree: %ld kB");
+  __sysinfo (&info);
+  return sysinfo_mempages (info.freeram, info.mem_unit);
 }
 weak_alias (__get_avphys_pages, get_avphys_pages)
diff --git a/sysdeps/unix/sysv/linux/include/sys/sysinfo.h b/sysdeps/unix/sysv/linux/include/sys/sysinfo.h
new file mode 100644
index 0000000..45b7126
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/include/sys/sysinfo.h
@@ -0,0 +1,26 @@ 
+/* Internal declarations for sys/sysinfo.h.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _INCLUDE_SYS_SYSINFO_H
+#define _INCLUDE_SYS_SYSINFO_H	1
+
+#include_next <sys/sysinfo.h>
+
+extern __typeof (sysinfo) __sysinfo __THROW;
+
+#endif /* sys/sysinfo.h */
diff --git a/sysdeps/unix/sysv/linux/syscalls.list b/sysdeps/unix/sysv/linux/syscalls.list
index ed5d38f..62bb3cc 100644
--- a/sysdeps/unix/sysv/linux/syscalls.list
+++ b/sysdeps/unix/sysv/linux/syscalls.list
@@ -73,7 +73,7 @@  setpgid		-	setpgid		i:ii	__setpgid	setpgid
 sigaltstack	-	sigaltstack	i:PP	__sigaltstack	sigaltstack
 splice		EXTRA	splice		Ci:iPiPii	splice
 stime		-	stime		i:p	stime
-sysinfo		EXTRA	sysinfo		i:p	sysinfo
+sysinfo		EXTRA	sysinfo		i:p	__sysinfo	sysinfo
 swapon		-	swapon		i:si	__swapon	swapon
 swapoff		-	swapoff		i:s	__swapoff	swapoff
 tee		EXTRA	tee		Ci:iiii	tee