[1/2,RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Message ID fd8d80af-c01d-0f2d-8082-ecc8c5537600@mentor.com
State Superseded
Headers

Commit Message

Chung-Lin Tang July 25, 2019, 6:46 p.m. UTC
  On 2019/7/23 9:21 PM, Florian Weimer wrote:
> Is => intended to cover the case of run-time dependencies added late due
> to lazy binding?
> 
> Currently, those late dependencies have two effects, I think: They keep
> around the referenced libraries longer than before (so that dlclose
> would not remove an object which is still in used solely due to lazy
> binding).  And the ELF destructors are reordered to reflect these added
> run-time dependencies.

Yes, you can test that. The effect of => is to create a caller/callee relation
between objects: 'x=>y' creates fn_x() and fn_y() in those two DSOs,
and fn_x() has a call to fn_y().

Though that's the only immediate effect that => has. To construct a test of
run-time added dependencies related to dlopen/etc. you also need to add those
operations inside the '{}' construct.

All the created DSOs have a constructor/destructor that outputs their single
character name. The generated main() program prints '[]' brackets after
dlopen/dlclose calls to separate out the following constructor/destructor output.
So taken whole, the entire output string should capture all constructor/destructor
activity and ordering behavior.

> Can your test framework test both cases?  What's your position on the
> second effect?  I think it sometimes results in destructors running not
> in the opposite order of constructors, due to the new topological sort.
> (This also happens with the current implementation.)

What I did in the ld.so code patch was add a second pass of sorting that ignores
runtime deps, prioritizing link dependencies; this appears to also be what prior
discussion pointed towards, see more details in that 2nd email with the actual
code patch.

I have attached an updated patch here; fixed some bugs in the script related to
the '@' operator for the main program construct.

Thanks,
Chung-Lin
  

Comments

Florian Weimer July 29, 2019, 9:48 a.m. UTC | #1
* Chung-Lin Tang:

> On 2019/7/23 9:21 PM, Florian Weimer wrote:
>> Is => intended to cover the case of run-time dependencies added late due
>> to lazy binding?
>>
>> Currently, those late dependencies have two effects, I think: They keep
>> around the referenced libraries longer than before (so that dlclose
>> would not remove an object which is still in used solely due to lazy
>> binding).  And the ELF destructors are reordered to reflect these added
>> run-time dependencies.
>
> Yes, you can test that. The effect of => is to create a caller/callee
> relation between objects: 'x=>y' creates fn_x() and fn_y() in those
> two DSOs, and fn_x() has a call to fn_y().
>
> Though that's the only immediate effect that => has. To construct a
> test of run-time added dependencies related to dlopen/etc. you also
> need to add those operations inside the '{}' construct.
>
> All the created DSOs have a constructor/destructor that outputs their
> single character name. The generated main() program prints '[]'
> brackets after dlopen/dlclose calls to separate out the following
> constructor/destructor output.  So taken whole, the entire output
> string should capture all constructor/destructor activity and ordering
> behavior.

I see, thanks.

>> Can your test framework test both cases?  What's your position on the
>> second effect?  I think it sometimes results in destructors running not
>> in the opposite order of constructors, due to the new topological sort.
>> (This also happens with the current implementation.)
>
> What I did in the ld.so code patch was add a second pass of sorting
> that ignores runtime deps, prioritizing link dependencies; this
> appears to also be what prior discussion pointed towards, see more
> details in that 2nd email with the actual code patch.

I wonder if it makes sense to disentangle this (desirable) functional
change from the rest (which sould be purely an optimization).

Is it even necessary to re-sort on dlclose?  Is the original dependency
order available somewhere?  Then we could make it explicit that the
destructor order is the reverse of the constructor order (for the
objects unloaded).  Or is there a corner case which causes an expected
divergence?

Thanks,
Florian
  
Chung-Lin Tang Aug. 5, 2019, 10:39 a.m. UTC | #2
On 2019/7/29 5:48 PM, Florian Weimer wrote:
> * Chung-Lin Tang:

>>> Can your test framework test both cases?  What's your position on the
>>> second effect?  I think it sometimes results in destructors running not
>>> in the opposite order of constructors, due to the new topological sort.
>>> (This also happens with the current implementation.)
>>
>> What I did in the ld.so code patch was add a second pass of sorting
>> that ignores runtime deps, prioritizing link dependencies; this
>> appears to also be what prior discussion pointed towards, see more
>> details in that 2nd email with the actual code patch.
> 
> I wonder if it makes sense to disentangle this (desirable) functional
> change from the rest (which sould be purely an optimization).

By "functional change" here, are you referring to the testing framework,
or the described ld.so destructor behavior I described above?

> Is it even necessary to re-sort on dlclose?  Is the original dependency
> order available somewhere?  Then we could make it explicit that the
> destructor order is the reverse of the constructor order (for the
> objects unloaded).  Or is there a corner case which causes an expected
> divergence?

Dynamic loaded objects could add more relocation dependencies, and thus augment
the dependency relations (by adding more constraints), so a final sort should
still be required.

Thanks,
Chung-Lin
  
Florian Weimer Aug. 5, 2019, 10:45 a.m. UTC | #3
* Chung-Lin Tang:

> On 2019/7/29 5:48 PM, Florian Weimer wrote:
>> * Chung-Lin Tang:
>
>>>> Can your test framework test both cases?  What's your position on the
>>>> second effect?  I think it sometimes results in destructors running not
>>>> in the opposite order of constructors, due to the new topological sort.
>>>> (This also happens with the current implementation.)
>>>
>>> What I did in the ld.so code patch was add a second pass of sorting
>>> that ignores runtime deps, prioritizing link dependencies; this
>>> appears to also be what prior discussion pointed towards, see more
>>> details in that 2nd email with the actual code patch.
>>
>> I wonder if it makes sense to disentangle this (desirable) functional
>> change from the rest (which sould be purely an optimization).
>
> By "functional change" here, are you referring to the testing framework,
> or the described ld.so destructor behavior I described above?

The destructor behavior.

>> Is it even necessary to re-sort on dlclose?  Is the original dependency
>> order available somewhere?  Then we could make it explicit that the
>> destructor order is the reverse of the constructor order (for the
>> objects unloaded).  Or is there a corner case which causes an expected
>> divergence?
>
> Dynamic loaded objects could add more relocation dependencies, and
> thus augment the dependency relations (by adding more constraints), so
> a final sort should still be required.

Yes, these dynamically added relocation dependencies could mean that
fewer objects than had been loaded by the dlopen can be freed with
dlclose.  But if we disregard those relocation dependencies for
destructor order sorting, wouldn't be the sorted result equivalent to
the constructor order?

Thanks,
Florian
  
Chung-Lin Tang Aug. 10, 2019, 11:49 a.m. UTC | #4
On 2019/8/5 6:45 PM, Florian Weimer wrote:
>>>> What I did in the ld.so code patch was add a second pass of sorting
>>>> that ignores runtime deps, prioritizing link dependencies; this
>>>> appears to also be what prior discussion pointed towards, see more
>>>> details in that 2nd email with the actual code patch.
>>> I wonder if it makes sense to disentangle this (desirable) functional
>>> change from the rest (which sould be purely an optimization).
>> By "functional change" here, are you referring to the testing framework,
>> or the described ld.so destructor behavior I described above?
> The destructor behavior.

Well, I'm definitely not suggesting adding the two-pass sorting described
above with the current sorting algorithm (even if it should be relatively
straightforward to do so)

The entire #17645 issue is due to the current algorithm becoming prohibitively
slow in certain pathological cases. Trying to fix the destructor behavior that
way without replacing the current sorting algorithm will greatly exacerbate
the performance problem.

>>> Is it even necessary to re-sort on dlclose?  Is the original dependency
>>> order available somewhere?  Then we could make it explicit that the
>>> destructor order is the reverse of the constructor order (for the
>>> objects unloaded).  Or is there a corner case which causes an expected
>>> divergence?
>> Dynamic loaded objects could add more relocation dependencies, and
>> thus augment the dependency relations (by adding more constraints), so
>> a final sort should still be required.
> Yes, these dynamically added relocation dependencies could mean that
> fewer objects than had been loaded by the dlopen can be freed with
> dlclose.  But if we disregard those relocation dependencies for
> destructor order sorting, wouldn't be the sorted result equivalent to
> the constructor order?

Relocation dependencies are not completely disregarded during destruction,
just that they're prioritized lower than static link dependencies (when
dependence cycles cause ambiguity in determining a single ordering), hence
the two passes of sorting. Besides that, dlopen'ed but not dlclose'd objects
also need to be processed along, so any existing already-computed ordering
is probably not enough in the general case.

Chung-Lin
  

Patch

diff --git a/elf/Makefile b/elf/Makefile
index a3eefd1..1c4e941 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -383,6 +383,48 @@  tests-special += $(objpfx)order-cmp.out $(objpfx)tst-array1-cmp.out \
 		 $(objpfx)tst-unused-dep-cmp.out
 endif
 
+# DSO sorting tests:
+# The dso-ordering-test.py script generates testcase source files in $(objpfx),
+# and outputs Makefile fragments for use here. However because normal output
+# from $(shell ..) has newlines changed into spaces, we have to save it to a
+# temporary file and then include it. We wrap this entire testcase construction
+# into a function here to make things more convenient.
+define test_dso_ordering
+$(shell $(PYTHON) $(..)scripts/dso-ordering-test.py \
+	$(2) $(1) $(objpfx) > $(objpfx)$(1).tmp-makefile)
+$(shell echo $(3) > $(objpfx)$(1).exp)
+include $(objpfx)$(1).tmp-makefile
+endef
+
+# Individual DSO sorting tests. The test description and expected output for
+# each test is specified directly here. See the source of dso-ordering-test.py
+# for documentation on this.
+# Note that we need to create the $(objpfx) directory here immediately to hold
+# the generated source files and Makefile fragments.
+$(shell mkdir -p $(objpfx))
+$(eval $(call test_dso_ordering,tst-dso-ordering1,'a->b->c','cba{}abc'))
+$(eval $(call test_dso_ordering,tst-dso-ordering2,\
+	'a->b->[cd]->e','edcba{}abcde'))
+$(eval $(call test_dso_ordering,tst-dso-ordering3,\
+	'a->[bc]->[def]->[gh]->i','ihgfedcba{}abcdefghi'))
+$(eval $(call test_dso_ordering,tst-dso-ordering4,\
+	'a->b->[de];a->c->d->e','edcba{}abcde'))
+$(eval $(call test_dso_ordering,tst-dso-ordering5,\
+	'a->[bc]->d;b->c','dcba{}abcd'))
+$(eval $(call test_dso_ordering,tst-dso-ordering6,\
+	'a->[bcde]->f','fedcba{}abcdef'))
+$(eval $(call test_dso_ordering,tst-dso-ordering7,\
+	'a->[bc];b->[cde];e->f','fedcba{}abcdef'))
+$(eval $(call test_dso_ordering,tst-dso-ordering8,\
+	'a->b->c=>a;{}->[ba]','cba{}abc'))
+$(eval $(call test_dso_ordering,tst-dso-ordering9,\
+	'a->b->c->d->e;{}!->[abcde]','edcba{}abcde'))
+
+# From BZ #15311
+$(eval $(call test_dso_ordering,tst-bz15311,\
+'{+a;+e;+f;+g;+d;%d;-d;-g;-f;-e;-a};a->b->c->d;d=>[ba];c=>a;b=>e=>a;c=>f=>b;d=>g=>c',\
+'{+a[dcba];+e[e];+f[f];+g[g];+d[];<d<b<e<a>>><a><g<c<a><f<b<e<a>>>>>>>;-d[];-g[];-f[];-e[];-a[gfabcde];}'))
+
 check-abi: $(objpfx)check-abi-ld.out
 tests-special += $(objpfx)check-abi-ld.out
 update-abi: update-abi-ld
diff --git a/scripts/dso-ordering-test.py b/scripts/dso-ordering-test.py
new file mode 100644
index 0000000..a494ba2
--- /dev/null
+++ b/scripts/dso-ordering-test.py
@@ -0,0 +1,556 @@ 
+#!/usr/bin/python3
+# Generate testcase files and Makefile fragments for DSO sorting test
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+"""Generate testcase files and Makefile fragments for DSO sorting test
+
+This script takes a semicolon-separated description string, and generates
+a testcase, including main program and associated modules, and Makefile
+fragments for including into elf/Makefile.
+
+This is intended to speed up complex dynamic linker testcase construction,
+therefore features are largely mechanical in nature; inconsistencies or
+errors may occur if input case was itself erroronous or have
+unforeseen interactions.
+
+On the description language used, as an example description string:
+
+  a->b!->[cdef];c=>g=>h;{+c;%c;-c}->a
+
+Each single alphabet character represents a shared object module (currently
+[a-zA-Z0-9] are allowed, case-sensitive)
+All such shared objects have a constructor/destructor generated for them
+that emits its single character name by putchar().
+
+The -> operator specifies a link time dependency, these can be chained for
+convenience (e.g. a->b->c->d).
+
+The => operator creates a call-reference, e.g. for a=>b, an fn_a() function
+is created inside module 'a', which calls fn_b() in module 'b'.
+These module functions emit '<name>' output in nested form,
+e.g. a=>b emits '<a<b>>'
+
+Square brackets [] in the description specifies multiple objects;
+e.g. a->[bcd]->e is equivalent to a->b->e;a->c->e;a->d->e
+
+A {} construct specifies the main test program, and its link dependencies
+are also specified using ->. Inside {}, a few ;-seperated constructs are
+allowed:
+         +a   Loads module a using dlopen(RTLD_LAZY|RTLD_GLOBAL)
+         :a   Loads module a using dlopen(RTLD_LAZY)
+         %a   Use dlsym() to load and call fn_a()
+         @a   Calls fn_a() directly.
+         -a   Unloads module a using dlclose()
+
+The generated main program outputs '{' '}' with all output from above
+constructs in between. The other output before/after {} are the ordered
+constructor/destructor output.
+
+If no {} construct is present, a default empty main program is linked
+against all objects which have no dependency linked to it. e.g. for
+'[ab]->c;d->e', the default main program is equivalent to '{}->[abd]'
+
+The '!' operator after object names turns on permutation of its
+dependencies, e.g. while a->[bcd] only generates one set of objects,
+with 'a.so' built with a link line of "b.so c.so d.so", for a!->[bcd]
+permutations of a's dependencies creates multiple testcases with
+different link line orders: "b.so c.so d.so", "c.so b.so d.so",
+"b.so d.so c.so", etc. Note that for a <test-name> specified on
+the script command-line, multiple <test-name_1>, <test-name_2>, etc.
+tests will be generated (e.g. for a!->[bc]!->[de], eight tests with
+different link orders for a, b, and c will be generated)
+
+"""
+
+import re
+import os
+import subprocess
+import argparse
+from collections import OrderedDict
+import itertools
+
+# BUILD_GCC is only used under the --build option,
+# which builds the generated testcase, including DSOs using BUILD_GCC.
+# Mainly for testing purposes, especially debugging of this script,
+# and can be changed here to another toolchain path if needed.
+build_gcc = "gcc"
+
+parser = argparse.ArgumentParser()
+parser.add_argument("description",
+                    help="Description string of DSO dependency test to be "
+                    "generated (see script source for documentation of "
+                    "description language)")
+parser.add_argument("test_name", help="Identifier for testcase being "
+                    "generated")
+parser.add_argument("objpfx",
+                    help="Path to place generated files, defaults to "
+                    "current directory if none specified",
+                    nargs="?", default="./")
+parser.add_argument("--build", help="After C testcase generated, build it "
+                    "using gcc (for manual testing purposes)",
+                    action="store_true")
+parser.add_argument("--debug-output", help="Prints some internal data "
+                    "structures; used for debugging of this script",
+                    action="store_true")
+cmdlineargs = parser.parse_args()
+base_test_name = cmdlineargs.test_name
+test_name = cmdlineargs.test_name
+objpfx = cmdlineargs.objpfx
+
+obj_deps = OrderedDict()
+obj_callrefs = OrderedDict()
+
+all_objs = []
+curr_objs = []
+
+obj_dep_permutations = OrderedDict()
+
+# Add 'object -> [object, object, ...]' relations to CURR_MAP
+def add_deps (src_objs, dst_objs, curr_map):
+    for src in src_objs:
+        for dst in dst_objs:
+            if not src in curr_map:
+                curr_map[src] = []
+            if not dst in curr_map[src]:
+                curr_map[src].append (dst)
+
+# For inside the {} construct
+main_program = []
+main_program_needs_ldl = False
+main_program_default_deps = True
+def process_main_program (mainprog_str):
+    global main_program
+    global main_program_needs_ldl
+    global main_program_default_deps
+    if mainprog_str:
+        main_program = mainprog_str.split (';')
+    for s in main_program:
+        m = re.match (r"^([+\-%:@])([0-9a-zA-Z]+)$", s)
+        if not m: print ("'%s'" % (s))
+        assert (m)
+        # Determined the main program needs libdl
+        main_program_needs_ldl = True
+        if len(m.group(2)) > 1:
+            print ("Error: only single character object names allowed, "
+                   + "'%s' is invalid" % (m.group(1)))
+            exit -1
+        obj = m.group(2)
+        if not obj in all_objs:
+            all_objs.append (obj)
+        if m.group(1) == '%' or m.group(1) == '@':
+            add_deps (['#'], [obj], obj_callrefs)
+    # We have a main program specified, turn this off
+    main_program_default_deps = False
+
+# Lexer for tokens
+tokenspec = [ ("OBJ",      r"([0-9a-zA-Z]+)"),
+              ("DEP",      r"->"),
+              ("CALLREF",  r"=>"),
+              ("OBJSET",   r"\[([0-9a-zA-Z]+)\]"),
+              ("PROG",     r"{([0-9a-zA-Z;+:\-%@]*)}"),
+              ("PERMUTE",  r"!"),
+              ("SEMICOL",  r";"),
+              ("ERROR",    r".") ]
+tok_re = '|'.join('(?P<%s>%s)' % pair for pair in tokenspec)
+
+# State used when parsing dependencies
+in_dep = False
+in_callref = False
+def clear_dep_state ():
+    global in_dep, in_callref
+    in_dep = in_callref = False
+
+# Main parser
+for m in re.finditer(tok_re, cmdlineargs.description):
+    kind = m.lastgroup
+    value = m.group ()
+    if kind == "OBJ":
+        if len (value) > 1:
+            print ("Error: only single character object names allowed, "
+                   + "'%s' is invalid" % (value))
+            exit (-1)
+        if in_dep:
+            add_deps (curr_objs, [value], obj_deps)
+        elif in_callref:
+            add_deps (curr_objs, [value], obj_callrefs)
+        clear_dep_state ()
+        curr_objs = [value]
+        if not value in all_objs:
+            all_objs.append (value)
+
+    elif kind == "OBJSET":
+        objset = value[1:len(value)-1]
+        if in_dep:
+            add_deps (curr_objs, list (objset), obj_deps)
+        elif in_callref:
+            add_deps (curr_objs, list (objset), obj_callrefs)
+        clear_dep_state ()
+        curr_objs = list (objset)
+        for o in list (objset):
+            if not o in all_objs:
+                all_objs.append (o)
+
+    elif kind == "PERMUTE":
+        if in_dep or in_callref:
+            print ("Error: syntax error, permute operation invalid here")
+            exit -1
+        if not curr_objs:
+            print ("Error: syntax error, no objects to permute here")
+            exit -1
+        for obj in curr_objs:
+            if not obj in obj_dep_permutations:
+                # Signal this object has permutated dependencies
+                obj_dep_permutations[obj] = []
+
+    elif kind == "PROG":
+        if main_program:
+            print ("Error: cannot have more than one main program")
+            exit (-1)
+        if in_dep:
+            print ("Error: objects cannot have dependency on main program")
+            exit (-1)
+        if in_callref:
+            add_deps (curr_objs, ["#"], obj_callrefs)
+        process_main_program (value[1:len(value)-1])
+        clear_dep_state ()
+        curr_objs = ["#"]
+
+    elif kind == "DEP":
+        if in_dep or in_callref:
+            print ("Error: syntax error, multiple contiguous ->,=> operations")
+            exit -1
+        in_dep = True
+
+    elif kind == "CALLREF":
+        if in_dep or in_callref:
+            print ("Error: syntax error, multiple contiguous ->,=> operations")
+            exit -1
+        in_callref = True
+        
+    elif kind == "SEMICOL":
+        curr_objs = []
+        clear_dep_state ()
+
+    else:
+        print ("Error: unknown token '%s'" % (value))
+        exit (-1)
+
+def find_objs_not_depended_on ():
+    global all_objs, obj_deps
+    objs_not_depended_on = []
+    for obj in all_objs:
+        skip = False
+        for r in obj_deps.items():
+            if obj in r[1]:
+                skip = True
+                break
+        if not skip:
+            objs_not_depended_on.append (obj)
+    return objs_not_depended_on
+        
+# If no main program was specified in dependency description, make a
+# default main program with deps pointing to all DSOs which are not
+# depended by another DSO.
+if main_program_default_deps:
+    main_deps = find_objs_not_depended_on ()
+    # main_deps = []
+    # for o in all_objs:
+    #     skip = False
+    #     for r in obj_deps.items():
+    #         if o in r[1]:
+    #             skip = True
+    #             break
+    #     if skip:
+    #         continue
+    #     main_deps.append (o)
+    add_deps (["#"], main_deps, obj_deps)    
+        
+# Debug output
+if cmdlineargs.debug_output:
+    print ("All objects: %s" % (all_objs))
+    print ("--- Static link dependencies ---")
+    for r in obj_deps.items():
+        print ("%s -> %s" % (r[0], r[1]))
+    print ("--- Objects whose dependencies are to be permutated ---")
+    for r in obj_dep_permutations.items():
+        print ("%s" % (r[0]))
+    #print (obj_dep_permutations)
+    print ("--- Call reference dependencies ---")
+    for r in obj_callrefs.items():
+        print ("%s => %s" % (r[0], r[1]))
+    print ("--- main program ---")
+    print (main_program)
+
+# Main testcase processing routine, does Makefile fragment generation,
+# testcase source generation, and if --build specified builds testcase.
+def process_testcase (test_name):
+    global objpfx, all_objs, obj_deps, obj_callrefs
+    global base_test_name, main_program, main_program_needs_ldl
+
+    # Print out needed Makefile fragments for use in glibc/elf/Makefile.
+    #if makefile:
+    print ("ifeq (yes,$(build-shared))")
+    t = ""
+    for o in all_objs:
+        t += " " + test_name + "-" + o
+    print ("modules-names +=%s" % (t))
+    print ("tests += %s" % (test_name))
+
+    # Print direct link dependencies for each DSO
+    for obj in all_objs:
+        if obj in obj_deps:
+            dso = test_name + "-" + obj + ".so"
+            depstr = ""
+            for dep in obj_deps[obj]:
+                depstr += " $(objpfx)" + test_name + "-" + dep + ".so"
+            print ("$(objpfx)%s:%s" % (dso, depstr))
+
+    # Print LDFLAGS-* and *-no-z-defs
+    for o in all_objs:
+        dso = test_name + "-" + o + ".so"
+        print ("LDFLAGS-%s = $(no-as-needed)" % (dso))
+        if o in obj_callrefs:
+            print ("%s-no-z-defs = yes" % (dso))
+
+    # Print dependencies for main test program
+    depstr = ""
+    if '#' in obj_deps:
+        for o in obj_deps['#']:
+            depstr += " $(objpfx)" + test_name + "-" + o + ".so"
+    if main_program_needs_ldl:
+        depstr += " $(libdl)"
+    print ("$(objpfx)%s:%s" % (test_name, depstr))
+    print ("LDFLAGS-%s = $(no-as-needed)" % (test_name))
+
+    not_depended_objs = find_objs_not_depended_on ()
+    if not_depended_objs:
+        depstr = ""
+        for dep in not_depended_objs:
+            depstr += " $(objpfx)" + test_name + "-" + dep + ".so"
+        print ("$(objpfx)%s.out:%s" % (test_name, depstr))
+    
+    # Note this is compared with the "base" <test_name>.exp, not
+    # <test_name>_<N> with permutation index
+    print ("$(objpfx)%s-cmp.out: $(objpfx)%s.exp $(objpfx)%s.out"
+           % (test_name, base_test_name, test_name))
+    print ("\tdiff -wu $^ > $@; $(evaluate-test)")
+    print ("endif")
+    print ("ifeq ($(run-built-tests),yes)")
+    print ("tests-special += $(objpfx)%s-cmp.out" % (test_name))
+    print ("endif")
+
+    # Generate C files according to dependency and calling relations from
+    # description string.
+    for obj in all_objs:
+        src_name = test_name + "-" + obj + ".c"
+        f = open (objpfx + src_name, "w")
+        if obj in obj_callrefs:
+            called_objs = obj_callrefs[obj]
+            for callee in called_objs:
+                f.write ("extern void fn_%s (void);\n" % (callee))
+        f.write ("extern int putchar(int);\n")
+        f.write ("static void __attribute__((constructor)) " +
+                 "init(void){putchar('%s');}\n" % (obj))
+        f.write ("static void __attribute__((destructor)) " +
+                 "fini(void){putchar('%s');}\n" % (obj))
+        if obj in obj_callrefs:
+            called_objs = obj_callrefs[obj]
+            f.write ("void fn_%s (void) {\n" % (obj))
+            f.write ("  putchar ('<');\n");
+            f.write ("  putchar ('%s');\n" % (obj));
+            for callee in called_objs:
+                f.write ("  fn_%s ();\n" % (callee))
+            f.write ("  putchar ('>');\n");
+            f.write ("}\n")
+        else:
+            for callref in obj_callrefs.items():
+                if obj in callref[1]:
+                    f.write ("void fn_%s (void) {\n" % (obj))
+                    f.write ("  putchar ('<');\n");
+                    f.write ("  putchar ('%s');\n" % (obj));
+                    f.write ("  putchar ('>');\n");
+                    f.write ("}\n")
+                    break
+        f.close ()
+
+    # Open C file for writing
+    f = open (objpfx + test_name + ".c", "w")
+
+    # if there are some operations in main(), it means we need -ldl
+    if main_program_needs_ldl:
+        f.write ("#include <dlfcn.h>\n")
+    f.write ("#include <stdio.h>\n")
+    f.write ("#include <stdlib.h>\n")
+    for s in main_program:
+        if s[0] == '@':
+            f.write ("extern void fn_%s (void);\n" % (s[1]));
+    f.write ("int main (void) {\n")
+    f.write ("  putchar('{');\n")
+
+    # Helper routine for sanity check code
+    def put_fail_check (fail_cond, action_desc):
+        f.write ('  if (%s) { printf ("\\n%s failed: %%s\\n", '
+                 'dlerror ()); exit (1);}\n' % (fail_cond, action_desc))
+    i = 0
+    while i < len(main_program):
+        s = main_program[i]
+        obj = s[len(s)-1]
+        dso = test_name + "-" + obj
+        if s[0] == '+' or s[0] == ':':
+            if s[0] == '+':
+                dlopen_flags = "RTLD_LAZY|RTLD_GLOBAL"
+                f.write ("  putchar('+');\n");
+            else:
+                dlopen_flags = "RTLD_LAZY"
+                f.write ("  putchar(':');\n");
+            f.write ("  putchar('%s');\n" % (obj));
+            f.write ("  putchar('[');\n");
+            f.write ('  void *%s = dlopen ("%s.so", %s);\n'
+                     % (obj, dso, dlopen_flags))
+            put_fail_check ("!%s" % (obj),
+                            "%s.so dlopen" % (dso))
+            f.write ("  putchar(']');\n");
+        elif s[0] == '-':
+            f.write ("  putchar('-');\n");
+            f.write ("  putchar('%s');\n" % (obj));
+            f.write ("  putchar('[');\n");
+            put_fail_check ("dlclose (%s) != 0" % (obj),
+                            "%s.so dlclose" % (dso))
+            f.write ("  putchar(']');\n");
+        elif s[0] == '%':
+            f.write ('  void (*fn_%s)(void) = dlsym (%s, "fn_%s");\n'
+                     % (obj, obj, obj))
+            put_fail_check ("!fn_%s" % (obj),
+                            "dlsym(fn_%s) from %s.so" % (obj, dso))
+            f.write ("  fn_%s ();\n" % (obj))
+        elif s[0] == '@':
+            f.write ("  fn_%s ();\n" % (obj))
+        f.write ("  putchar(';');\n");
+        i += 1
+    f.write ("  putchar('}');\n")
+    f.write ("  return 0;\n")
+    f.write ("}\n")
+    f.close ()
+
+    # Helper routine to run a shell command, for running GCC below
+    def run_cmd (args):
+        if cmdlineargs.debug_output:
+            print (str.join (' ', args))
+        p = subprocess.Popen (args)
+        p.wait ()
+        if p.returncode != 0:
+            print ("Error running: %s" % (str.join (' ', args)))
+            exit -1
+
+    # Depth-first traversal, executing FN(OBJ) in post-order
+    obj_visited = {}
+    def dfs (obj, fn):
+        if obj in obj_visited:
+            return
+        obj_visited[obj] = True
+        if obj in obj_deps:
+            for dep in obj_deps[obj]:
+                dfs (dep, fn)
+        fn (obj)
+
+    # Function to create <test_name>-<obj>.so
+    def build_dso (obj):
+        obj_name = test_name + "-" + obj + ".os"
+        dso_name = test_name + "-" + obj + ".so"
+        deps = []
+        if obj in obj_deps:
+            deps = obj_deps[obj]
+        dso_deps = map (lambda d: objpfx + test_name + "-" + d + ".so", deps)
+        cmd = ([build_gcc, "-shared", "-o", objpfx + dso_name,
+                objpfx + obj_name, "-Wl,--no-as-needed"] + list(dso_deps))
+        run_cmd (cmd)
+
+    # --build option processing: build generated sources using 'build_gcc'
+    if cmdlineargs.build:
+        # Compile individual .os files
+        for obj in all_objs:
+            src_name = test_name + "-" + obj + ".c"
+            obj_name = test_name + "-" + obj + ".os"
+            run_cmd ([build_gcc, "-c", "-fPIC", objpfx + src_name,
+                      "-o", objpfx + obj_name])
+
+        # Build all DSOs, this needs to be in topological dependency order,
+        # or link will fail
+        for obj in all_objs:
+            dfs (obj, build_dso)
+
+        # Build main program
+        deps = []
+        if '#' in obj_deps:
+            deps = obj_deps['#']
+        main_deps = map (lambda d: objpfx + test_name + "-" + d + ".so", deps)
+        cmd = ([build_gcc, "-Wl,--no-as-needed", "-o", objpfx + test_name,
+                objpfx + test_name + ".c", "-L%s" % (os.getcwd ()),
+                "-Wl,-rpath-link=%s" % (os.getcwd ())]
+               + list (main_deps))
+        if main_program_needs_ldl:
+            cmd += ["-ldl"]
+        run_cmd (cmd)
+
+# Check if we need to enumerate permutations of dependencies
+need_permutation_processing = False       
+if obj_dep_permutations:
+    # Adjust obj_dep_permutations into map of object -> dependency permutations
+    for r in obj_dep_permutations.items():
+        obj = r[0]
+        if obj in obj_deps and len(obj_deps[obj]) > 1:
+            deps = obj_deps[obj]
+            obj_dep_permutations[obj] = list (itertools.permutations (deps))
+            need_permutation_processing = True
+
+test_subindex = 1
+curr_perms = []
+def enum_permutations (perm_list):
+    global test_name, obj_deps, test_subindex, curr_perms
+    if len(perm_list) >= 1:
+        curr = perm_list[0]
+        obj = curr[0]
+        perms = curr[1]
+        if not perms:
+            # This may be an empty list if no multiple dependencies to permute
+            # were found, skip to next in this case
+            enum_permutations (perm_list[1:])
+        else:
+            for deps in perms:
+                obj_deps[obj] = deps
+                permstr = "" if obj == "#" else obj + "_"
+                permstr += str.join ('', deps)
+                curr_perms.append (permstr) 
+                enum_permutations (perm_list[1:])
+                curr_perms = curr_perms[0:len(curr_perms)-1]
+    else:
+        # obj_deps is now instantiated with one dependency order permutation
+        # (across all objects that have multiple permutations)
+        # Now process a testcase
+        #if not os.path.exists (objpfx + base_test_name+ "-permutations/"):
+        #    os.mkdir (objpfx + base_test_name+ "-permutations/")
+        process_testcase (base_test_name + "_" + str (test_subindex)
+                          + "-" + str.join ('-', curr_perms))
+        test_subindex += 1
+
+if need_permutation_processing:
+    enum_permutations (list (obj_dep_permutations.items()))
+else:
+    # We have no permutations to enumerate, just process testcase normally
+    process_testcase (test_name)
+