[RFC] introduce dl_iterate_phdr_parallel

Message ID 20160725142326.GM1018@scylladb.com
State New, archived

Commit Message

Gleb Natapov July 25, 2016, 2:23 p.m. UTC
  Problem: exception handling is not scalable. The attached program mt.cc
demonstrates this easily. It runs in around 1 second with 1 thread, but
with only 4 threads it takes 4.5 seconds to complete. The reason is the
locks that are taken on the unwind path.

There are two locks right now.

The first is in libgcc's _Unwind_Find_registered_FDE, and it is eliminated by
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01629.html (for dynamic
executables only, but this is the common case).

The second one is dl_load_write_lock in __dl_iterate_phdr. It serves a dual
purpose: it stops the list of loaded objects from being modified while
iterating over it, and it makes sure that no more than one callback
runs in parallel. This means that even if we find a clever way
to make __dl_iterate_phdr iterate over the object list locklessly, the
lock will still have to be taken around the call to the callback to preserve
the second guarantee.

This patch proposes to introduce another API, dl_iterate_phdr_parallel,
which does exactly the same thing as dl_iterate_phdr but may run more than
one provided callback in parallel. To make it more scalable, it
breaks the single dl_load_write_lock into an array of locks. A reader takes only
one lock, but a writer has to take all of them before modifying
the list.

The attached gcc.patch makes gcc use the proposed dl_iterate_phdr_parallel
interface, and with this modification (and the patch linked above) the same
mt.cc with 4 threads runs in around 1 second.


--
			Gleb.
#include <iostream>
#include <thread>
#include <vector>
#include <assert.h>
#include <pthread.h>
#include <sched.h>

constexpr unsigned N = 3;

void func(int c) {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(c, &cpuset);
  int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
  assert(!rc);

  for (auto i = 0; i < 1000000; i++) {
    try {
      throw 42;
    } catch (...) {
    }
  }
}

int main() {
  std::vector<std::thread> threads;
  threads.reserve(N);

  for (unsigned i = 0; i < N; i++)
    threads.emplace_back(std::thread( [i] { func(i); } ));

  func(N);

  for (unsigned i = 0; i < N; i++)
    threads[i].join();

  return 0;
}
commit fc040d3c4cd2b4b69cd07187090f2a250712c47a
Author: Gleb Natapov <gleb@scylladb.com>
Date:   Mon Jul 25 16:28:53 2016 +0300

    use dl_iterate_phdr_parallel

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index f7a1c3f..d3be0ad 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -120,7 +120,7 @@ struct unw_eh_frame_hdr
 
 #define FRAME_HDR_CACHE_SIZE 8
 
-static struct frame_hdr_cache_element
+static __thread struct frame_hdr_cache_element
 {
   _Unwind_Ptr pc_low;
   _Unwind_Ptr pc_high;
@@ -130,7 +130,7 @@ static struct frame_hdr_cache_element
   struct frame_hdr_cache_element *link;
 } frame_hdr_cache[FRAME_HDR_CACHE_SIZE];
 
-static struct frame_hdr_cache_element *frame_hdr_cache_head;
+static __thread struct frame_hdr_cache_element *frame_hdr_cache_head;
 
 /* Like base_of_encoded_value, but take the base from a struct
    unw_eh_callback_data instead of an _Unwind_Context.  */
@@ -195,7 +195,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, size_t size, void *ptr)
 
   if (data->check_cache && size >= sizeof (struct ext_dl_phdr_info))
     {
-      static unsigned long long adds = -1ULL, subs;
+      static __thread unsigned long long adds = -1ULL, subs;
       struct ext_dl_phdr_info *einfo = (struct ext_dl_phdr_info *) info;
 
       /* We use a least recently used cache replacement policy.  Also,
@@ -445,6 +445,10 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, size_t size, void *ptr)
   return 1;
 }
 
+extern int dl_iterate_phdr_parallel (int (*__callback) (struct dl_phdr_info *,
+                                                        size_t, void *),
+                                     void *__data);
+
 const fde *
 _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 {
@@ -462,7 +466,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   data.ret = NULL;
   data.check_cache = 1;
 
-  if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0)
+  if (dl_iterate_phdr_parallel (_Unwind_IteratePhdrCallback, &data) < 0)
     return NULL;
 
   if (data.ret)
  

Comments

Adhemerval Zanella Netto July 28, 2016, 8:47 p.m. UTC | #1
On 25/07/2016 11:23, Gleb Natapov wrote:
> Problem: exception handling is not scalable. The attached program mt.cc
> demonstrates this easily. It runs in around 1 second with 1 thread, but
> with only 4 threads it takes 4.5 seconds to complete. The reason is the
> locks that are taken on the unwind path.
> 
> There are two locks right now.
> 
> The first is in libgcc's _Unwind_Find_registered_FDE, and it is eliminated by
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01629.html (for dynamic
> executables only, but this is the common case).
> 
> The second one is dl_load_write_lock in __dl_iterate_phdr. It serves a dual
> purpose: it stops the list of loaded objects from being modified while
> iterating over it, and it makes sure that no more than one callback
> runs in parallel. This means that even if we find a clever way
> to make __dl_iterate_phdr iterate over the object list locklessly, the
> lock will still have to be taken around the call to the callback to preserve
> the second guarantee.
> 
> This patch proposes to introduce another API, dl_iterate_phdr_parallel,
> which does exactly the same thing as dl_iterate_phdr but may run more than
> one provided callback in parallel. To make it more scalable, it
> breaks the single dl_load_write_lock into an array of locks. A reader takes only
> one lock, but a writer has to take all of them before modifying
> the list.

Wouldn't a read-write lock solve the performance problem in dl_iterate_phdr?
It should scale better than the current code, since dl_iterate_phdr should act
just as a reader on the link_map structures, and multiple readers should only
incur a lll_lock/lll_unlock on rwlock->__data.__lock.  I am assuming that
we set it to prefer readers and use a read lock in dl_iterate_phdr.

I see this as slightly better than trading memory for limited scalability
(the _DL_LOCKS value), and it also avoids adding another exported API.
  
Carlos O'Donell Aug. 1, 2016, 8:26 p.m. UTC | #2
On 07/25/2016 10:23 AM, Gleb Natapov wrote:
> Problem: exception handling is not scalable. The attached program mt.cc
> demonstrates this easily. It runs in around 1 second with 1 thread, but
> with only 4 threads it takes 4.5 seconds to complete. The reason is the
> locks that are taken on the unwind path.
> 
> There are two locks right now.
> 
> The first is in libgcc's _Unwind_Find_registered_FDE, and it is eliminated by
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01629.html (for dynamic
> executables only, but this is the common case).
> 
> The second one is dl_load_write_lock in __dl_iterate_phdr. It serves a dual
> purpose: it stops the list of loaded objects from being modified while
> iterating over it, and it makes sure that no more than one callback
> runs in parallel. This means that even if we find a clever way
> to make __dl_iterate_phdr iterate over the object list locklessly, the
> lock will still have to be taken around the call to the callback to preserve
> the second guarantee.

This is an X Y problem.

I don't think the solution is to add a new dl_iterate_phdr_parallel interface,
but instead to rethink why C++ exceptions need to dl_iterate_phdr at all.

Exactly what information does C++ need and why?
 
We have been discussing this internally at Red Hat, but I'd like to get your
external opinion on the situation.

> This patch proposes to introduce another API, dl_iterate_phdr_parallel,
> which does exactly the same thing as dl_iterate_phdr but may run more than
> one provided callback in parallel. To make it more scalable, it
> breaks the single dl_load_write_lock into an array of locks. A reader takes only
> one lock, but a writer has to take all of them before modifying
> the list.

Do you have copyright assignment for glibc?

https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment

https://sourceware.org/glibc/wiki/Contribution%20checklist
  
Gleb Natapov Aug. 1, 2016, 8:52 p.m. UTC | #3
On Mon, Aug 01, 2016 at 04:26:20PM -0400, Carlos O'Donell wrote:
> On 07/25/2016 10:23 AM, Gleb Natapov wrote:
> > Problem: exception handling is not scalable. The attached program mt.cc
> > demonstrates this easily. It runs in around 1 second with 1 thread, but
> > with only 4 threads it takes 4.5 seconds to complete. The reason is the
> > locks that are taken on the unwind path.
> > 
> > There are two locks right now.
> > 
> > The first is in libgcc's _Unwind_Find_registered_FDE, and it is eliminated by
> > https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01629.html (for dynamic
> > executables only, but this is the common case).
> > 
> > The second one is dl_load_write_lock in __dl_iterate_phdr. It serves a dual
> > purpose: it stops the list of loaded objects from being modified while
> > iterating over it, and it makes sure that no more than one callback
> > runs in parallel. This means that even if we find a clever way
> > to make __dl_iterate_phdr iterate over the object list locklessly, the
> > lock will still have to be taken around the call to the callback to preserve
> > the second guarantee.
> 
> This is an X Y problem.
> 
> I don't think the solution is to add a new dl_iterate_phdr_parallel interface,
> but instead to rethink why C++ exceptions need to dl_iterate_phdr at all.
> 
> Exactly what information does C++ need and why?
>  
> We have been discussing this internally at Red Hat, but I'd like to get your
> external opinion on the situation.
> 
It is getting a bit late here; let me get back to you about it
tomorrow.

> > This patch proposes to introduce another API, dl_iterate_phdr_parallel,
> > which does exactly the same thing as dl_iterate_phdr but may run more than
> > one provided callback in parallel. To make it more scalable, it
> > breaks the single dl_load_write_lock into an array of locks. A reader takes only
> > one lock, but a writer has to take all of them before modifying
> > the list.
> 
> Do you have copyright assignment for glibc?
As far as I know yes. I'll check this tomorrow. If this is the same one
as for gcc then definitely yes (as a company, ScyllaDB, not as an individual
contributor).

> 
> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
> 
> https://sourceware.org/glibc/wiki/Contribution%20checklist
> 
> -- 
> Cheers,
> Carlos.

--
			Gleb.
  
Carlos O'Donell Aug. 1, 2016, 8:58 p.m. UTC | #4
On 08/01/2016 04:52 PM, Gleb Natapov wrote:
>> Do you have copyright assignment for glibc?
> As far as I know yes. I'll check this tomorrow. If this is the same one
> as for gcc then definitely yes (as a company, ScyllaDB, not as an individual
> contributor).

Scylla DB has assignments in place for gcc only.

They do not cover glibc.

If you are going to broadly develop with GNU tools, I usually suggest
future assignments for: gcc, binutils, glibc, and gdb.
  
Gleb Natapov Aug. 2, 2016, 6:46 a.m. UTC | #5
On Mon, Aug 01, 2016 at 04:58:30PM -0400, Carlos O'Donell wrote:
> On 08/01/2016 04:52 PM, Gleb Natapov wrote:
> >> Do you have copyright assignment for glibc?
> > As far as I know yes. I'll check this tomorrow. If this is the same one
> > as for gcc then definitely yes (as a company, ScyllaDB, not as an individual
> > contributor).
> 
> Scylla DB has assignments in place for gcc only.
> 
> They do not cover glibc.
> 
> If you are going to broadly develop with GNU tools, I usually suggest
> future assignments for: gcc, binutils, glibc, and gdb.
> 
We will take care of it ASAP.

--
			Gleb.
  
Gleb Natapov Aug. 2, 2016, 9:56 a.m. UTC | #6
On Mon, Aug 01, 2016 at 04:26:20PM -0400, Carlos O'Donell wrote:
> On 07/25/2016 10:23 AM, Gleb Natapov wrote:
> > Problem: exception handling is not scalable. The attached program mt.cc
> > demonstrates this easily. It runs in around 1 second with 1 thread, but
> > with only 4 threads it takes 4.5 seconds to complete. The reason is the
> > locks that are taken on the unwind path.
> > 
> > There are two locks right now.
> > 
> > The first is in libgcc's _Unwind_Find_registered_FDE, and it is eliminated by
> > https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01629.html (for dynamic
> > executables only, but this is the common case).
> > 
> > The second one is dl_load_write_lock in __dl_iterate_phdr. It serves a dual
> > purpose: it stops the list of loaded objects from being modified while
> > iterating over it, and it makes sure that no more than one callback
> > runs in parallel. This means that even if we find a clever way
> > to make __dl_iterate_phdr iterate over the object list locklessly, the
> > lock will still have to be taken around the call to the callback to preserve
> > the second guarantee.
> 
> This is an X Y problem.
> 
> I don't think the solution is to add a new dl_iterate_phdr_parallel interface,
> but instead to rethink why C++ exceptions need to dl_iterate_phdr at all.
> 
> Exactly what information does C++ need and why?
>  
It needs the FDEs for all functions on the current stack, to see which one
can handle the current exception. The dl_iterate_phdr callback takes a pc
as a parameter, goes over the PT_LOAD segments, and checks whether the given
address falls into one of them; it also finds the eh_frame segment in the
same loop. If the pc belongs to the current object, it iterates over all of
its FDEs, finds the one responsible for the pc, and returns it.

> We have been discussing this internally at Red Hat, but I'd like to get your
> external opinion on the situation.
> 
We are developing the Seastar framework [1], and the unscalability of C++
exception handling hurts us a lot.

Seastar is an asynchronous framework in which application code never sleeps
outside of the event loop.  In the framework we go to great lengths
to keep cpus as isolated from one another as possible and to minimize locked
memory access; we even reimplemented std::string and std::shared_ptr to
eliminate locking operations. We also make an effort to enter the kernel as
little as possible (and if we do, we call functions that do not sleep). As
a result, if the userspace networking stack is used, you never see any kernel
functions in a perf profile.

All this breaks down when errors start happening. An isolated exception here
and there is not really noticeable, but sometimes many errors happen
simultaneously and on several cpus (the network disconnects and all requests
waiting on it throw exceptions, or the disk is overloaded and requests start
to time out). When such an exception storm happens, the system stops
functioning, since the locking puts threads to sleep outside of the event
loop and prevents them from doing any work until the exceptions are handled.

The problem is so serious that we are considering shipping a patched libgcc
and glibc with our application, but this should only be a temporary solution
until these patches (or some variation of them) are upstreamed.

[1] http://www.seastar-project.org/

--
			Gleb.
  

Patch

diff --git a/elf/Versions b/elf/Versions
index 23deda9..56b7510 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -11,6 +11,9 @@  libc {
   GLIBC_2.2.4 {
     dl_iterate_phdr;
   }
+  GLIBC_2.23 {
+    dl_iterate_phdr_parallel;
+  }
 %ifdef EXPORT_UNWIND_FIND_FDE
   # Needed for SHLIB_COMPAT calls using this version.
   GLIBC_2.2.5 {
diff --git a/elf/dl-close.c b/elf/dl-close.c
index 687d7de..9b8e9ac 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -535,7 +535,8 @@  _dl_close_worker (struct link_map *map, bool force)
   tls_free_start = tls_free_end = NO_TLS_OFFSET;
 
   /* We modify the list of loaded objects.  */
-  __rtld_lock_lock_recursive (GL(dl_load_write_lock));
+  for (int i = 0; i < _DL_LOCKS; i++)
+    __rtld_lock_lock_recursive (GL(dl_load_write_lock)[i]);
 
   /* Check each element of the search list to see if all references to
      it are gone.  */
@@ -748,7 +749,8 @@  _dl_close_worker (struct link_map *map, bool force)
 	}
     }
 
-  __rtld_lock_unlock_recursive (GL(dl_load_write_lock));
+  for (int i = 0; i < _DL_LOCKS; i++)
+    __rtld_lock_unlock_recursive (GL(dl_load_write_lock)[i]);
 
   /* If we removed any object which uses TLS bump the generation counter.  */
   if (any_tls)
diff --git a/elf/dl-iteratephdr.c b/elf/dl-iteratephdr.c
index 4db6938..69e4a79 100644
--- a/elf/dl-iteratephdr.c
+++ b/elf/dl-iteratephdr.c
@@ -22,24 +22,47 @@ 
 #include <stddef.h>
 #include <libc-lock.h>
 
+static
+uintptr_t crank(uintptr_t word)
+{ 
+  word ^= (word << 13); word ^= (word >> 17); word ^= (word << 5); 
+  word = 69069*word + 12345;
+  return word;
+}
+
+static
+uintptr_t hash(uintptr_t val)
+{
+  char* x = (char*)&val;
+  uintptr_t r = 0;
+  r = x[0];
+  for (unsigned i = 1; i < sizeof(val); ++i)
+    {
+      r = crank(r) + x[i];
+      r = crank(r);
+    }
+  return r & (_DL_LOCKS - 1);
+}
+
+
 static void
-cancel_handler (void *arg __attribute__((unused)))
+cancel_handler (void *arg)
 {
-  __rtld_lock_unlock_recursive (GL(dl_load_write_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_write_lock)[(uintptr_t)arg]);
 }
 
-hidden_proto (__dl_iterate_phdr)
+static
 int
-__dl_iterate_phdr (int (*callback) (struct dl_phdr_info *info,
-				    size_t size, void *data), void *data)
+__dl_iterate_phdr_internal (int (*callback) (struct dl_phdr_info *info,
+				    size_t size, void *data), void *data, unsigned idx)
 {
   struct link_map *l;
   struct dl_phdr_info info;
   int ret = 0;
 
   /* Make sure nobody modifies the list of loaded objects.  */
-  __rtld_lock_lock_recursive (GL(dl_load_write_lock));
-  __libc_cleanup_push (cancel_handler, NULL);
+  __rtld_lock_lock_recursive (GL(dl_load_write_lock)[idx]);
+  __libc_cleanup_push (cancel_handler, (void*)(uintptr_t)idx);
 
   /* We have to determine the namespace of the caller since this determines
      which namespace is reported.  */
@@ -80,10 +103,26 @@  __dl_iterate_phdr (int (*callback) (struct dl_phdr_info *info,
 
   /* Release the lock.  */
   __libc_cleanup_pop (0);
-  __rtld_lock_unlock_recursive (GL(dl_load_write_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_write_lock)[idx]);
 
   return ret;
 }
+
+hidden_proto (__dl_iterate_phdr)
+int
+__dl_iterate_phdr (int (*callback) (struct dl_phdr_info *info,
+				    size_t size, void *data), void *data) {
+  return __dl_iterate_phdr_internal(callback, data, 0);
+}
 hidden_def (__dl_iterate_phdr)
 
+hidden_proto (__dl_iterate_phdr_parallel)
+int
+__dl_iterate_phdr_parallel (int (*callback) (struct dl_phdr_info *info,
+				    size_t size, void *data), void *data) {
+  return __dl_iterate_phdr_internal(callback, data, hash((unsigned long)THREAD_SELF));
+}
+hidden_def (__dl_iterate_phdr_parallel)
+
 weak_alias (__dl_iterate_phdr, dl_iterate_phdr);
+weak_alias (__dl_iterate_phdr_parallel, dl_iterate_phdr_parallel);
diff --git a/elf/dl-object.c b/elf/dl-object.c
index 362992b..a8df503 100644
--- a/elf/dl-object.c
+++ b/elf/dl-object.c
@@ -31,7 +31,8 @@  internal_function
 _dl_add_to_namespace_list (struct link_map *new, Lmid_t nsid)
 {
   /* We modify the list of loaded objects.  */
-  __rtld_lock_lock_recursive (GL(dl_load_write_lock));
+  for (int i = 0; i < _DL_LOCKS; i++)
+    __rtld_lock_lock_recursive (GL(dl_load_write_lock)[i]);
 
   if (GL(dl_ns)[nsid]._ns_loaded != NULL)
     {
@@ -48,7 +49,8 @@  _dl_add_to_namespace_list (struct link_map *new, Lmid_t nsid)
   new->l_serial = GL(dl_load_adds);
   ++GL(dl_load_adds);
 
-  __rtld_lock_unlock_recursive (GL(dl_load_write_lock));
+  for (int i = 0; i < _DL_LOCKS; i++)
+    __rtld_lock_unlock_recursive (GL(dl_load_write_lock)[i]);
 }
 
 
diff --git a/elf/dl-support.c b/elf/dl-support.c
index c30194c..0fc12e4 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -213,7 +213,7 @@  __rtld_lock_define_initialized_recursive (, _dl_load_lock)
 /* This lock is used to keep __dl_iterate_phdr from inspecting the
    list of loaded objects while an object is added to or removed from
    that list.  */
-__rtld_lock_define_initialized_recursive (, _dl_load_write_lock)
+__rtld_lock_recursive_t _dl_load_write_lock[_DL_LOCKS] = { _RTLD_LOCK_RECURSIVE_INITIALIZER, };
 
 
 #ifdef HAVE_AUX_VECTOR
diff --git a/elf/link.h b/elf/link.h
index f448141..56fd4b5 100644
--- a/elf/link.h
+++ b/elf/link.h
@@ -168,6 +168,9 @@  extern int dl_iterate_phdr (int (*__callback) (struct dl_phdr_info *,
 					       size_t, void *),
 			    void *__data);
 
+extern int dl_iterate_phdr_parallel (int (*__callback) (struct dl_phdr_info *,
+					       size_t, void *),
+			    void *__data);
 
 /* Prototypes for the ld.so auditing interfaces.  These are not
    defined anywhere in ld.so but instead have to be provided by the
diff --git a/elf/rtld.c b/elf/rtld.c
index 647661c..9c65b03 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -128,7 +128,7 @@  struct rtld_global _rtld_global =
     ._dl_stack_flags = DEFAULT_STACK_PERMS,
 #ifdef _LIBC_REENTRANT
     ._dl_load_lock = _RTLD_LOCK_RECURSIVE_INITIALIZER,
-    ._dl_load_write_lock = _RTLD_LOCK_RECURSIVE_INITIALIZER,
+    ._dl_load_write_lock = { _RTLD_LOCK_RECURSIVE_INITIALIZER, },
 #endif
     ._dl_nns = 1,
     ._dl_ns =
diff --git a/include/link.h b/include/link.h
index 32a7392..9582ed9 100644
--- a/include/link.h
+++ b/include/link.h
@@ -336,6 +336,10 @@  extern int __dl_iterate_phdr (int (*callback) (struct dl_phdr_info *info,
 					       size_t size, void *data),
 			      void *data);
 
+extern int __dl_iterate_phdr_parallel (int (*callback) (struct dl_phdr_info *info,
+					       size_t size, void *data),
+			      void *data);
+
 /* We use this macro to refer to ELF macros independent of the native
    wordsize.  `ELFW(R_TYPE)' is used in place of `ELF32_R_TYPE' or
    `ELF64_R_TYPE'.  */
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index f68fdf4..0d2b2a1 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -268,6 +268,8 @@  typedef void (*receiver_fct) (int, const char *, const char *);
    user interface to run-time dynamic linking.  */
 
 
+#define _DL_LOCKS 64 
+
 #ifndef SHARED
 # define EXTERN extern
 # define GL(name) _##name
@@ -334,7 +336,7 @@  struct rtld_global
   /* This lock is used to keep __dl_iterate_phdr from inspecting the
      list of loaded objects while an object is added to or removed
      from that list.  */
-  __rtld_lock_define_recursive (EXTERN, _dl_load_write_lock)
+  __rtld_lock_define_recursive (EXTERN, _dl_load_write_lock[_DL_LOCKS])
 
   /* Incremented whenever something may have been added to dl_loaded.  */
   EXTERN unsigned long long _dl_load_adds;
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index c1054ce..ea9d2e9 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -82,6 +82,7 @@  GLIBC_2.17 clock_settime F
 GLIBC_2.17 secure_getenv F
 GLIBC_2.18 GLIBC_2.18 A
 GLIBC_2.18 __cxa_thread_atexit_impl F
+GLIBC_2.23 dl_iterate_phdr_parallel F
 GLIBC_2.2.5 GLIBC_2.2.5 A
 GLIBC_2.2.5 _Exit F
 GLIBC_2.2.5 _IO_2_1_stderr_ D 0xe0