[2/2] build-many-glibcs: Impose a memory limit on build processes.

Message ID 20170709141348.16337-2-zackw@panix.com
State Superseded

Commit Message

Zack Weinberg July 9, 2017, 2:13 p.m. UTC
  There are sometimes bugs in the compiler
(e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78460) that cause
it to consume all available memory.  To limit the impact of this on
automated test robots, impose memory limits on all subprocesses in
build-many-glibcs.py.  When the bug hits, the compiler will still run
out of memory and crash, but that should not affect any other
simultaneous task.

The limit can be configured with the --memory-limit command line switch.
The default is 1.5 gigabytes or physical RAM divided by the number of
jobs to run in parallel, whichever is larger.  (Empirically, 1.5 gigs
per process is enough for everything but the files affected by GCC
bug 78640, but 1 gig per process is insufficient for some of the math
tests and also for the "genautomata" step when building compilers for
powerpc64.)
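The mechanism relies on the fact that POSIX resource limits set with setrlimit() are inherited across fork/exec, so a cap applied once in the driver constrains every build subprocess. A minimal sketch of the default computation and the soft-limit cap (function names and the 8 GiB example figure are illustrative, not from the patch):

```python
import resource

GIB = 1024 ** 3

def default_limit(physical_ram, jobs):
    """The patch's default: physical RAM divided by jobs, floored at 1.5 GiB."""
    return int(max(physical_ram / jobs, 1.5 * GIB))

def cap_soft_limit(limit_bytes):
    """Lower the soft RLIMIT_DATA; subprocesses inherit it across fork/exec."""
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may never exceed the hard limit.
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_DATA, (limit_bytes, hard))
    return limit_bytes
```

On the 8-CPU / 8 GiB machine described later in the thread, `default_limit(8 * GIB, 8)` hits the 1.5 GiB floor, since 8 GiB / 8 jobs = 1 GiB.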

Rather than continue to lengthen the argument list of the Context
constructor, it now takes the entire 'opts' object as its sole argument.

	* scripts/build-many-glibcs.py (total_ram): New function.
	(Context.set_memory_limits): New function.
	(Context.run_builds): Call set_memory_limits immediately before
	do_build.
	(get_parser): Add --memory-limit command-line switch.
	(Context.__init__): Take 'opts' object as sole argument.
	Add 'memory_limit' attribute to self.  Make topdir absolute here.
	(main): Update to match.
---
 scripts/build-many-glibcs.py | 105 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 87 insertions(+), 18 deletions(-)
  

Comments

Joseph Myers July 10, 2017, 10:48 a.m. UTC | #1
On Sun, 9 Jul 2017, Zack Weinberg wrote:

> The limit can be configured with the --memory-limit command line switch.
> The default is 1.5 gigabytes or physical RAM divided by the number of
> jobs to run in parallel, whichever is larger.  (Empirically, 1.5 gigs
> per process is enough for everything but the files affected by GCC
> bug 78640, but 1 gig per process is insufficient for some of the math
> tests and also for the "genautomata" step when building compilers for
> powerpc64.)

I think the default should allow more room than just being slightly above 
what's enough right now in a particular build (does --enable-checking, 
i.e. GCC mainline, use more memory? I don't know offhand) - say 3 GB.  
(The default may be different from the threshold for the warning for 
values expected to be too small.)

> +    def set_memory_limits(self):
> +        """Impose a memory-consumption limit on this process, and therefore
> +           all of the subprocesses it creates.  The limit can be set
> +           on the command line; the default is either physical RAM
> +           divided by the number of jobs to be run in parallel, or 1.5
> +           gigabytes, whichever is larger.  (1GB is too small for
> +           genautomata on MIPS and for the compilation of several
> +           large math test cases.)

Is it both powerpc64 and MIPS that need more than 1 GB, then?
  
Zack Weinberg July 10, 2017, 12:23 p.m. UTC | #2
On Mon, Jul 10, 2017 at 6:48 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Sun, 9 Jul 2017, Zack Weinberg wrote:
>
>> The limit can be configured with the --memory-limit command line switch.
>> The default is 1.5 gigabytes or physical RAM divided by the number of
>> jobs to run in parallel, whichever is larger.  (Empirically, 1.5 gigs
>> per process is enough for everything but the files affected by GCC
>> bug 78640, but 1 gig per process is insufficient for some of the math
>> tests and also for the "genautomata" step when building compilers for
>> powerpc64.)
>
> I think the default should allow more room than just being slightly above
> what's enough right now in a particular build (does --enable-checking,
> i.e. GCC mainline, use more memory? I don't know offhand) - say 3 GB.
> (The default may be different from the threshold for the warning for
> values expected to be too small.)

So I actually wanted to make it _lower_ -- my usual build machine has
8 CPUs and 8G of RAM, so anything higher than 1G/concurrent process
risks going into swap.  But then genautomata. :-(

I think what I'll do -- after the powerpc situation is resolved -- is
put together some instrumentation to find out how much RAM is used
both on average and at worst; that will let us make more informed
decisions about the defaults.  I'll try GCC mainline as well.
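One readily available way to gather such numbers (a sketch, not necessarily the instrumentation Zack had in mind) is getrusage() with RUSAGE_CHILDREN, which reports the peak resident-set size over all reaped child processes; note it gives only the maximum across children, not per-process figures:

```python
import resource
import subprocess
import sys

# Stand-in for one build step: a child that allocates ~10 MiB.
subprocess.run([sys.executable, '-c', 'x = bytearray(10 * 1024 * 1024)'],
               check=True)

# ru_maxrss is the peak RSS of all waited-for children
# (reported in KiB on Linux, in bytes on macOS).
peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print('peak child RSS: %.1f MiB' % (peak_kib / 1024))
```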

>> +           gigabytes, whichever is larger.  (1GB is too small for
>> +           genautomata on MIPS and for the compilation of several
>> +           large math test cases.)
>
> Is it both powerpc64 and MIPS that need more than 1 GB, then?

No, it's just powerpc64, that's a mistake in the comment.

zw
  
Joseph Myers July 10, 2017, 1:12 p.m. UTC | #3
On Mon, 10 Jul 2017, Zack Weinberg wrote:

> So I actually wanted to make it _lower_ -- my usual build machine has
> 8 CPUs and 8G of RAM, so anything higher than 1G/concurrent process
> risks going into swap.  But then genautomata. :-(
> 
> I think what I'll do -- after the powerpc situation is resolved -- is
> put together some instrumentation to find out how much RAM is used
> both on average and at worst; that will let us make more informed
> decisions about the defaults.  I'll try GCC mainline as well.

Since there are six SH multilibs, 6 * limit + (CPUs - 6) * average is 
probably the peak amount that gets used in practice.  (My bots use a limit 
of 16 GB - set before starting build-many-glibcs.py - on systems with 32 
hardware threads and 128 GB RAM.)
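That back-of-the-envelope estimate can be written out as follows (the 1 GB average is a made-up placeholder, since the thread does not give a measured figure):

```python
def peak_memory_estimate(limit_gb, cpus, average_gb, heavy_jobs=6):
    """6 * limit + (CPUs - 6) * average: the six SH multilibs are assumed
       to hit the limit while the remaining jobs use an average amount."""
    return heavy_jobs * limit_gb + (cpus - heavy_jobs) * average_gb

# Joseph's bots: 16 GB limit, 32 hardware threads; assume a 1 GB average.
print(peak_memory_estimate(16, 32, 1))  # -> 122, just under the 128 GB of RAM
```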

Also note incidentally: if anyone wanted to use build-many-glibcs.py to 
run build tools that had been built with Address Sanitizer, they'd need to 
disable the limits (Address Sanitizer uses a very large amount of address 
space, I think 16 TB).
  

Patch

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 7b267d14d9..a4d72f29e4 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -86,33 +86,56 @@  except:
 
     subprocess.run = _run
 
+def total_ram():
+    """Retrieve the total amount of physical RAM available on this computer."""
+
+    # This can't be done cross-platform using the Python standard library.
+    # If the add-on 'psutil' module is available, use it.
+    try:
+        import psutil
+        # Despite the name, virtual_memory() counts only physical RAM, not swap.
+        return psutil.virtual_memory().total
+
+    except ImportError:
+        pass
+
+    # This works on Linux, but (reportedly) not on all other Unixes.
+    try:
+        return os.sysconf('SC_PAGESIZE') * os.sysconf('SC_PHYS_PAGES')
+
+    except (AttributeError, ValueError, OSError):
+        pass
+
+    # We don't know.  Return a very large number.
+    return sys.maxsize
 
 class Context(object):
     """The global state associated with builds in a given directory."""
 
-    def __init__(self, topdir, parallelism, keep, replace_sources, strip,
-                 action):
+    def __init__(self, opts):
         """Initialize the context."""
-        self.topdir = topdir
-        self.parallelism = parallelism
-        self.keep = keep
-        self.replace_sources = replace_sources
-        self.strip = strip
-        self.srcdir = os.path.join(topdir, 'src')
+        self.topdir = os.path.abspath(opts.topdir)
+        self.parallelism = opts.parallelism
+        self.memory_limit = opts.memory_limit
+        self.keep = opts.keep
+        self.replace_sources = opts.replace_sources
+        self.strip = opts.strip
+        self.srcdir = os.path.join(self.topdir, 'src')
         self.versions_json = os.path.join(self.srcdir, 'versions.json')
-        self.build_state_json = os.path.join(topdir, 'build-state.json')
-        self.bot_config_json = os.path.join(topdir, 'bot-config.json')
-        self.installdir = os.path.join(topdir, 'install')
+        self.build_state_json = os.path.join(self.topdir, 'build-state.json')
+        self.bot_config_json = os.path.join(self.topdir, 'bot-config.json')
+        self.installdir = os.path.join(self.topdir, 'install')
         self.host_libraries_installdir = os.path.join(self.installdir,
                                                       'host-libraries')
-        self.builddir = os.path.join(topdir, 'build')
-        self.logsdir = os.path.join(topdir, 'logs')
-        self.logsdir_old = os.path.join(topdir, 'logs-old')
+        self.builddir = os.path.join(self.topdir, 'build')
+        self.logsdir = os.path.join(self.topdir, 'logs')
+        self.logsdir_old = os.path.join(self.topdir, 'logs-old')
         self.makefile = os.path.join(self.builddir, 'Makefile')
         self.wrapper = os.path.join(self.builddir, 'wrapper')
         self.save_logs = os.path.join(self.builddir, 'save-logs')
         self.script_text = self.get_script_text()
-        if action != 'checkout':
+        if opts.action != 'checkout':
             self.build_triplet = self.get_build_triplet()
             self.glibc_version = self.get_glibc_version()
         self.configs = {}
@@ -134,6 +157,50 @@  class Context(object):
         sys.stdout.flush()
         os.execv(sys.executable, [sys.executable] + sys.argv)
 
+    def set_memory_limits(self):
+        """Impose a memory-consumption limit on this process, and therefore
+           all of the subprocesses it creates.  The limit can be set
+           on the command line; the default is either physical RAM
+           divided by the number of jobs to be run in parallel, or 1.5
+           gigabytes, whichever is larger.  (1GB is too small for
+           genautomata on MIPS and for the compilation of several
+           large math test cases.)
+        """
+        if self.memory_limit == 0:
+            return
+        try:
+            import resource
+        except ImportError as e:
+            print('warning: cannot set memory limit:', e)
+            return
+
+        if self.memory_limit is None:
+            physical_ram = total_ram()
+            memory_limit = int(max(physical_ram / self.parallelism,
+                                   1.5 * 1024 * 1024 * 1024))
+        else:
+            if self.memory_limit < 1.5:
+                print('warning: memory limit %.5g GB known to be too small'
+                      % self.memory_limit)
+            memory_limit = int(self.memory_limit * 1024 * 1024 * 1024)
+
+        set_a_limit = False
+        for mem_rsrc_name in ['RLIMIT_DATA', 'RLIMIT_STACK', 'RLIMIT_RSS',
+                              'RLIMIT_VMEM', 'RLIMIT_AS']:
+            mem_rsrc = getattr(resource, mem_rsrc_name, None)
+            if mem_rsrc is not None:
+                soft, hard = resource.getrlimit(mem_rsrc)
+                if hard == resource.RLIM_INFINITY or hard > memory_limit:
+                    hard = memory_limit
+                if soft == resource.RLIM_INFINITY or soft > hard:
+                    soft = hard
+                resource.setrlimit(mem_rsrc, (soft, hard))
+                set_a_limit = True
+
+        if set_a_limit:
+            print("Per-process memory limit set to %.5g GB." %
+                  (memory_limit / (1024 * 1024 * 1024)))
+
     def get_build_triplet(self):
         """Determine the build triplet with config.guess."""
         config_guess = os.path.join(self.component_srcdir('gcc'),
@@ -465,6 +532,7 @@  class Context(object):
             old_versions = self.build_state['compilers']['build-versions']
             self.build_glibcs(configs)
         self.write_files()
+        self.set_memory_limits()
         self.do_build()
         if configs:
             # Partial build, do not update stored state.
@@ -1589,6 +1657,9 @@  def get_parser():
     parser.add_argument('-j', dest='parallelism',
                         help='Run this number of jobs in parallel',
                         type=int, default=os.cpu_count())
+    parser.add_argument('--memory-limit',
+                        help='Per-process memory limit in gigabytes (0 for unlimited)',
+                        type=float, default=None)
     parser.add_argument('--keep', dest='keep',
                         help='Whether to keep all build directories, '
                         'none or only those from failed builds',
@@ -1614,9 +1685,7 @@  def main(argv):
     """The main entry point."""
     parser = get_parser()
     opts = parser.parse_args(argv)
-    topdir = os.path.abspath(opts.topdir)
-    ctx = Context(topdir, opts.parallelism, opts.keep, opts.replace_sources,
-                  opts.strip, opts.action)
+    ctx = Context(opts)
     ctx.run_builds(opts.action, opts.configs)