pthread stack cache and valgrind helgrind false positive race error on tls variables

Message ID 1405890490.27752.49.camel@soleil
State Not applicable

Commit Message

Philippe Waroquiers July 20, 2014, 9:08 p.m. UTC
  On Sat, 2014-07-19 at 17:02 -0700, Roland McGrath wrote:
> Separate from the implementation details of the workaround you propose, I
> think we would like to understand more clearly what is going on to confuse
> helgrind.
I have attached the (under review) valgrind patch.
Whatever better solution we might have in a future glibc (see the
discussion below), we need a solution for old/current glibc.
Note that this patch applies to valgrind SVN (it also contains the
test program that shows the TLS false positive).


> Though it's all unclear to me so far, I tend to suspect both
> that this workaround is not what we'd consider the ideal fix and that the
> underlying problem is likely to have other symptoms (for other parts of the
> code, other usage patterns, and/or other code outside of libc that rolls
> its own synchronization primitives).
Yes, helgrind has several tricks (kludges?) to avoid raising false
positives in various "low level" code:
  * ld.so is not instrumented, as instrumenting it causes a huge
    number of false positives that must be (expensively) suppressed.
  * the default valgrind error suppression file contains very general
    suppression entries, suppressing all races that have their top frame
    in libc or libpthread.
    Search for 'not very clever' or 'ugly' in valgrind default.supp.
    Among others, default.supp mentions that helgrind does not
    understand stdio locking.
    The 'not clever/ugly' is mostly because these suppressions can
    cause false negatives.
  * several more precise suppression entries exist for various false
    positives in glibc and elsewhere.


The above techniques do not work for the false positives caused
by the stack cache/TLS internal structures, as (some of) the races
appear in user code. Also, pthread condition variables cannot be
properly handled by helgrind.

For sure, it would be nice to have a general solution for all of this
(or at least a solution that decreases the number of kludges needed
in helgrind).

An approach could be to add glibc-specific helgrind client requests to
describe the low level synchronisation primitives
(e.g. describe the 'create lock/lock/unlock' operations).
This has some drawbacks:
* it introduces a few instructions in the 'hot path' of the low level
  glibc primitives.
  (This might imply having library versions compiled with and without
   these requests, which makes things more cumbersome for the end user,
   who then has to point at the right library when using helgrind.)
* Such annotations are valgrind specific. They will probably also work
  for the other valgrind race detector (drd). But there are other
  race detection tools which might have similar difficulties.
* It is not certain that all helgrind kludges can be eliminated by
  using these requests.

Someone on the valgrind dev mailing list suggested using specific debug
information to mark synchronisation primitives.
An interesting idea, but it is not clear if/how that can work
(and that is for sure a very long term solution).

In summary, having better support for race detection tools in glibc
is attractive, but looks non-trivial.

So, several things can be done:
* short term: the valgrind hack patch to disable the stack cache,
  to support old and current glibc
* when glibc provides a tuning infrastructure, replace the hack
  with a cleaner way to disable the cache
* longer term: see how glibc could "describe" its behaviour
  to race detection tools
* maybe in the somewhat shorter term, solve the problem of pthread
  condition variables, for which there is currently no known
  hack/kludge (see helgrind user manual section 7.5 for a description
  of the difficulty)

Thanks

Philippe
  

Comments

Siddhesh Poyarekar Aug. 13, 2014, 11:10 a.m. UTC | #1
On Sun, Jul 20, 2014 at 11:08:10PM +0200, Philippe Waroquiers wrote:
> * when glibc provides a tuning infrastructure, replace the hack
>   by a cleaner way to disable the cache

Ack, I hope to work on this Real Soon now.  IIRC we had explicitly
mentioned stack_cache as one of the first things to make configurable
via tunables when we discussed it at the Cauldron last year.

Siddhesh
  

Patch

Index: coregrind/m_clientstate.c
===================================================================
--- coregrind/m_clientstate.c	(revision 14176)
+++ coregrind/m_clientstate.c	(working copy)
@@ -107,6 +107,8 @@ 
    VG_(get_StackTrace) in m_stacktrace.c for further info. */
 Addr VG_(client__dl_sysinfo_int80) = 0;
 
+/* Address of the (internal) glibc nptl pthread stack cache size. */
+SizeT* VG_(client__stack_cache_actsize) = 0;
 
 /*--------------------------------------------------------------------*/
 /*--- end                                                          ---*/
Index: coregrind/pub_core_clientstate.h
===================================================================
--- coregrind/pub_core_clientstate.h	(revision 14176)
+++ coregrind/pub_core_clientstate.h	(working copy)
@@ -93,6 +93,24 @@ 
 extern Addr VG_(client__dl_sysinfo_int80);
 
 
+/* glibc pthread systems only, when --sim-hints contains
+   deactivate-pthread-stack-cache-via-hack.
+   Used for a (hacky) way to disable the cache of stacks as implemented
+   in nptl glibc.
+   Based on internal knowledge of the glibc pthread code:
+   a huge value in stack_cache_actsize (bigger than the constant
+   stack_cache_maxsize) makes glibc believe the cache is full,
+   so stacks are always released when a pthread terminates.
+   See glibc nptl/allocatestack.c.
+   Several ugly aspects of this hack:
+   * it hardcodes the private glibc variable name "stack_cache_actsize"
+   * it is based on knowledge of the code of the functions
+     queue_stack and __free_stacks
+   * the static symbol "stack_cache_actsize" must be in the symbol table.
+   It would be much cleaner to have a documented and supported
+   way to disable the pthread stack cache. */
+extern SizeT* VG_(client__stack_cache_actsize);
+
 #endif   // __PUB_CORE_CLIENTSTATE_H
 
 /*--------------------------------------------------------------------*/
Index: coregrind/m_options.c
===================================================================
--- coregrind/m_options.c	(revision 14176)
+++ coregrind/m_options.c	(working copy)
@@ -120,6 +120,7 @@ 
 const HChar* VG_(clo_req_tsyms)[VG_CLO_MAX_REQ_TSYMS];
 HChar* VG_(clo_require_text_symbol) = NULL;
 Bool   VG_(clo_run_libc_freeres) = True;
+Bool   VG_(sim_hint_deactivate_pthread_stack_cache_via_hack) = False;
 Bool   VG_(clo_track_fds)      = False;
 Bool   VG_(clo_show_below_main)= False;
 Bool   VG_(clo_show_emwarns)   = False;
Index: coregrind/pub_core_options.h
===================================================================
--- coregrind/pub_core_options.h	(revision 14176)
+++ coregrind/pub_core_options.h	(working copy)
@@ -279,6 +279,15 @@ 
    cannot be overridden from the command line. */
 extern Bool  VG_(clo_run_libc_freeres);
 
+/* Should we disable glibc pthread stack cache (via a hack) ?
+   Currently, we can only disable it via a hack. Maybe one day,
+   glibc will provide a clean way to disable it.
+   => using a Bool for now, but might become an 'enum' then.
+   The boolean is set to True if deactivate-pthread-stack-cache-via-hack is
+   given in --sim-hints. */
+extern Bool  VG_(sim_hint_deactivate_pthread_stack_cache_via_hack);
+
+
 /* Should we show VEX emulation warnings?  Default: NO */
 extern Bool VG_(clo_show_emwarns);
 
Index: coregrind/m_redir.c
===================================================================
--- coregrind/m_redir.c	(revision 14176)
+++ coregrind/m_redir.c	(working copy)
@@ -403,6 +403,10 @@ 
    Bool         check_ppcTOCs = False;
    Bool         isText;
    const HChar* newdi_soname;
+   Bool         dehacktivate_pthread_stack_cache_var_search = False;
+   const HChar* const pthread_soname = "libpthread.so.0";
+   const HChar* const pthread_stack_cache_actsize_varname
+      = "stack_cache_actsize";
 
 #  if defined(VG_PLAT_USES_PPCTOC)
    check_ppcTOCs = True;
@@ -497,6 +501,10 @@ 
 
    specList = NULL; /* the spec list we're building up */
 
+   dehacktivate_pthread_stack_cache_var_search = 
+      VG_(sim_hint_deactivate_pthread_stack_cache_via_hack)
+      && 0 == VG_(strcmp)(newdi_soname, pthread_soname);
+
    nsyms = VG_(DebugInfo_syms_howmany)( newdi );
    for (i = 0; i < nsyms; i++) {
       VG_(DebugInfo_syms_getidx)( newdi, i, &sym_addr, &sym_toc,
@@ -513,8 +521,22 @@ 
                                      demangled_fnpatt, N_DEMANGLED,
                                      &isWrap, &becTag, &becPrio );
          /* ignore data symbols */
-         if (!isText)
+         if (!isText) {
+            /* But search for dehacktivate stack cache var if needed. */
+            if (dehacktivate_pthread_stack_cache_var_search
+                && 0 == VG_(strcmp)(*names,
+                                    pthread_stack_cache_actsize_varname)) {
+               if ( VG_(clo_verbosity) > 1 ) {
+                  VG_(message)( Vg_DebugMsg,
+                                "deactivate pthread_stack_cache via hack:"
+                                " found symbol %s at addr %p\n",
+                                *names, (void*) sym_addr); 
+               }
+               VG_(client__stack_cache_actsize) = (SizeT*) sym_addr;
+               dehacktivate_pthread_stack_cache_var_search = False;
+            }
             continue;
+         }
          if (!ok) {
             /* It's not a full-scale redirect, but perhaps it is a load-notify
                fn?  Let the load-notify department see it. */
@@ -589,6 +611,13 @@ 
       }
       free_symname_array(names_init, &twoslots[0]);
    }
+   if (dehacktivate_pthread_stack_cache_var_search) {
+      VG_(message)(Vg_DebugMsg,
+                   "WARNING: could not find symbol for var %s in %s\n",
+                   pthread_stack_cache_actsize_varname, pthread_soname);
+      VG_(message)(Vg_DebugMsg,
+                   "=> pthread stack cache cannot be disabled!\n");
+   }
 
    if (check_ppcTOCs) {
       for (i = 0; i < nsyms; i++) {
Index: coregrind/m_main.c
===================================================================
--- coregrind/m_main.c	(revision 14176)
+++ coregrind/m_main.c	(working copy)
@@ -175,7 +175,8 @@ 
 "    --vgdb-prefix=<prefix>    prefix for vgdb FIFOs [%s]\n"
 "    --run-libc-freeres=no|yes free up glibc memory at exit on Linux? [yes]\n"
 "    --sim-hints=hint1,hint2,...  known hints:\n"
-"                                 lax-ioctls, enable-outer, fuse-compatible [none]\n"
+"         deactivate-pthread-stack-cache-via-hack, lax-ioctls, enable-outer,\n"
+"         no-inner-prefix, fuse-compatible [none]\n"
 "    --fair-sched=no|yes|try   schedule threads fairly on multicore systems [no]\n"
 "    --kernel-variant=variant1,variant2,...  known variants: bproc [none]\n"
 "                              handle non-standard kernel variants\n"
@@ -376,7 +377,12 @@ 
       // Set up VG_(clo_sim_hints). This is needed a.o. for an inner
       // running in an outer, to have "no-inner-prefix" enabled
       // as early as possible.
-      else if VG_STR_CLO (str, "--sim-hints",     VG_(clo_sim_hints)) {}
+      else if VG_STR_CLO (str, "--sim-hints",     VG_(clo_sim_hints)) {
+         /* Set initial value if hint selected. */
+         VG_(sim_hint_deactivate_pthread_stack_cache_via_hack) = 
+            0 != VG_(strstr)(VG_(clo_sim_hints), 
+                             "deactivate-pthread-stack-cache-via-hack");
+      }
    }
 }
 
@@ -563,6 +569,7 @@ 
       else if VG_BOOL_CLO(arg, "--show-emwarns",   VG_(clo_show_emwarns)) {}
 
       else if VG_BOOL_CLO(arg, "--run-libc-freeres", VG_(clo_run_libc_freeres)) {}
+      else if VG_XACT_CLO(arg, "--vgdb=yes",       VG_(clo_vgdb), Vg_VgdbYes) {}
       else if VG_BOOL_CLO(arg, "--show-below-main",  VG_(clo_show_below_main)) {}
       else if VG_BOOL_CLO(arg, "--time-stamp",       VG_(clo_time_stamp)) {}
       else if VG_BOOL_CLO(arg, "--track-fds",        VG_(clo_track_fds)) {}
Index: coregrind/m_scheduler/scheduler.c
===================================================================
--- coregrind/m_scheduler/scheduler.c	(revision 14176)
+++ coregrind/m_scheduler/scheduler.c	(working copy)
@@ -61,14 +61,15 @@ 
 #include "pub_core_basics.h"
 #include "pub_core_debuglog.h"
 #include "pub_core_vki.h"
-#include "pub_core_vkiscnums.h"    // __NR_sched_yield
-#include "pub_core_libcsetjmp.h"   // to keep _threadstate.h happy
+#include "pub_core_vkiscnums.h"  // __NR_sched_yield
+#include "pub_core_libcsetjmp.h" // to keep _threadstate.h happy
 #include "pub_core_threadstate.h"
+#include "pub_core_clientstate.h"
 #include "pub_core_aspacemgr.h"
-#include "pub_core_clreq.h"         // for VG_USERREQ__*
+#include "pub_core_clreq.h"      // for VG_USERREQ__*
 #include "pub_core_dispatch.h"
-#include "pub_core_errormgr.h"      // For VG_(get_n_errs_found)()
-#include "pub_core_gdbserver.h"     // for VG_(gdbserver) and VG_(gdbserver_activity)
+#include "pub_core_errormgr.h"   // For VG_(get_n_errs_found)()
+#include "pub_core_gdbserver.h"  // for VG_(gdbserver)/VG_(gdbserver_activity)
 #include "pub_core_libcbase.h"
 #include "pub_core_libcassert.h"
 #include "pub_core_libcprint.h"
@@ -1579,6 +1580,29 @@ 
    if (VG_(clo_trace_sched))
       print_sched_event(tid, "exiting VG_(scheduler)");
 
+   if (VG_(sim_hint_deactivate_pthread_stack_cache_via_hack)) {
+      if (VG_(client__stack_cache_actsize)) {
+         if (*VG_(client__stack_cache_actsize) == 0) {
+            /* We activate the stack cache disabling just before the first
+               thread exits. At this point, we are sure that the pthread
+               library has finished loading and that the variable has
+               been initialised by the pthread library. */
+            VG_(debugLog)(1,"sched",
+                          "activating pthread stack cache size disabling"
+                          " via hack\n");
+            *VG_(client__stack_cache_actsize) = 1000 * 1000 * 1000;
+            /* Set a value big enough to be above the hardcoded maximum stack cache
+               size in glibc, small enough to allow a pthread stack size to be added
+               without risk of overflow. */
+         }
+      } else {
+          VG_(debugLog)(0,"sched",
+                        "WARNING: pthread stack cache cannot be disabled!\n");
+          VG_(sim_hint_deactivate_pthread_stack_cache_via_hack) = False;
+          // Assign False to avoid having a msg for each thread termination.
+      }
+   }
+
    vg_assert(VG_(is_exiting)(tid));
 
    return tst->exitreason;
Index: docs/xml/manual-core.xml
===================================================================
--- docs/xml/manual-core.xml	(revision 14176)
+++ docs/xml/manual-core.xml	(working copy)
@@ -1958,6 +1958,39 @@ 
       are enabled.  Use with caution!  Currently known hints are:</para>
       <itemizedlist>
         <listitem>
+          <para><option>deactivate-pthread-stack-cache-via-hack: </option>
+            This hint is only relevant when running Valgrind on Linux.</para>
+
+          <para>The GNU glibc pthread library
+            (<function>libpthread.so</function>), which is used by all
+            pthread programs, maintains a cache of pthread stacks.
+            When a pthread terminates, the memory used for the pthread
+            stack and some thread local storage related data structures
+            is not always immediately released.  This memory is kept in
+            a cache (up to a certain size), and is re-used if a new
+            thread is started.</para>
+
+          <para>This cache causes the helgrind tool to report some
+            false positive race condition errors on this cached
+            memory, as helgrind does not understand the internal
+            glibc cache synchronisation primitives.  So, when using
+            helgrind, disabling the pthread stack cache helps to avoid
+            false positive race conditions, in particular when using
+            thread local storage variables (e.g. variables using the
+            <function>__thread</function> qualifier).</para>
+
+          <para>Disabling the cache might also help applications that
+            are running out of memory under Valgrind.</para>
+
+          <para>Note: Valgrind disables the cache using a hack based on
+            some internal knowledge of the glibc stack cache
+            implementation and by examining the debug information of
+            the pthread library. This technique is thus somewhat fragile
+            and might not work depending on the glibc version. This hack
+            has been tested with glibc versions 2.11 and 2.16.</para>
+        </listitem>
+
+        <listitem>
           <para><option>lax-ioctls: </option> Be very lax about ioctl
           handling; the only assumption is that the size is
           correct. Doesn't require the full buffer to be initialized
Index: helgrind/docs/hg-manual.xml
===================================================================
--- helgrind/docs/hg-manual.xml	(revision 14176)
+++ helgrind/docs/hg-manual.xml	(working copy)
@@ -972,6 +972,16 @@ 
   </listitem>
 
   <listitem>
+    <para>If your application uses thread local variables,
+    helgrind might report false positive race conditions on these
+    variables, even though they are very probably race free.  On Linux,
+    you can use
+    <option>--sim-hints=deactivate-pthread-stack-cache-via-hack</option>
+    to avoid such false positive error messages
+    </para>
+  </listitem>
+
+  <listitem>
     <para>Round up all finished threads using
     <function>pthread_join</function>.  Avoid
     detaching threads: don't create threads in the detached state, and
Index: helgrind/tests/tls_threads.stdout.exp
===================================================================
Index: helgrind/tests/tls_threads.c
===================================================================
--- helgrind/tests/tls_threads.c	(revision 0)
+++ helgrind/tests/tls_threads.c	(revision 0)
@@ -0,0 +1,109 @@ 
+#include <config.h>
+#include <pthread.h>
+#include <stdio.h>
+
+#ifdef HAVE_TLS
+
+static int only_touch_stackvar;
+
+/* We should have no error on local and global
+   as these are both thread local variables. */
+static __thread int local;
+__thread int global;
+
+/* We will wrongly share this variable indirectly through a pointer.
+   We should have an error for this. */
+static __thread int badly_shared_local;
+
+/* ptr_to_badly_shared_local allows multiple threads to see
+   the same thread local storage. This is however really bad sharing,
+   as it can cause a SEGV or whatever: when a thread disappears,
+   the badly_shared_local var pointed to can also disappear.
+   By default, the regtest does not test this really bad sharing. */
+pthread_mutex_t protect_ptr_to_badly_shared_local = PTHREAD_MUTEX_INITIALIZER;
+int *ptr_to_badly_shared_local;
+
+static void local_false_positive(void)
+{
+   local = local + 1; // no error is expected
+}
+
+static void global_false_positive(void)
+{
+   global = global + 1; // no error is expected
+}
+
+static void badly_shared_local_error_expected(void)
+{
+   *ptr_to_badly_shared_local = *ptr_to_badly_shared_local + 1; // an error is expected
+   // This can cause a SIGSEGV.
+}
+
+static void *level2(void *p)
+{
+   int stackvar = 0;
+
+   stackvar = stackvar + only_touch_stackvar;
+   
+   local_false_positive();
+   global_false_positive();
+   if (only_touch_stackvar != 0) {
+      badly_shared_local_error_expected();
+   }
+   return 0;
+}
+
+#define NLEVEL2 10
+static void *level1(void *p)
+{
+   pthread_t threads[NLEVEL2];
+   int curthread = 0;
+   int i;
+
+   pthread_mutex_lock(&protect_ptr_to_badly_shared_local);
+   if (ptr_to_badly_shared_local == NULL)
+      ptr_to_badly_shared_local = &badly_shared_local;
+   pthread_mutex_unlock(&protect_ptr_to_badly_shared_local);
+
+   for(i = 0; i < NLEVEL2; i++) {
+      pthread_create(&threads[curthread++], NULL, level2, NULL);
+   }
+
+   for(i = 0; i < curthread; i++)
+      pthread_join(threads[i], NULL);
+
+   return 0;
+}
+
+#define NLEVEL1 3
+int main(int argc, char*argv[])
+{
+   pthread_t threads[NLEVEL1];
+   int curthread = 0;
+   int i;
+
+   only_touch_stackvar = argc > 1;
+
+   for(i = 0; i < NLEVEL1; i++) {
+      pthread_create(&threads[curthread++], NULL, level1, NULL);
+   }
+
+   fprintf(stderr, "starting join in main\n");
+   fflush(stderr);
+   for(i = 0; i < curthread; i++)
+      pthread_join(threads[i], NULL);
+   fprintf(stderr, "finished join in main\n");
+   fflush(stderr);
+   return 0;
+}
+#else
+int main()
+{
+   fprintf(stderr, "starting join in main\n");
+   fflush(stderr);
+   /* do nothing */
+   fprintf(stderr, "finished join in main\n");
+   fflush(stderr);
+   return 0;
+}
+#endif
Index: helgrind/tests/Makefile.am
===================================================================
--- helgrind/tests/Makefile.am	(revision 14176)
+++ helgrind/tests/Makefile.am	(working copy)
@@ -102,7 +102,9 @@ 
 		tc23_bogus_condwait.stderr.exp \
 		tc23_bogus_condwait.stderr.exp-mips32 \
 	tc24_nonzero_sem.vgtest tc24_nonzero_sem.stdout.exp \
-		tc24_nonzero_sem.stderr.exp
+		tc24_nonzero_sem.stderr.exp \
+	tls_threads.vgtest tls_threads.stdout.exp \
+		tls_threads.stderr.exp
 
 # XXX: tc18_semabuse uses operations that are unsupported on Darwin.  It
 # should be conditionally compiled like tc20_verifywrap is.
@@ -144,7 +146,8 @@ 
 	tc19_shadowmem \
 	tc21_pthonce \
 	tc23_bogus_condwait \
-	tc24_nonzero_sem
+	tc24_nonzero_sem \
+	tls_threads
 
 # DDD: it seg faults, and then the Valgrind exit path hangs
 # JRS 29 July 09: it craps out in the stack unwinder, in
Index: helgrind/tests/tls_threads.stderr.exp
===================================================================
--- helgrind/tests/tls_threads.stderr.exp	(revision 0)
+++ helgrind/tests/tls_threads.stderr.exp	(revision 0)
@@ -0,0 +1,2 @@ 
+starting join in main
+finished join in main
Index: helgrind/tests/tls_threads.vgtest
===================================================================
--- helgrind/tests/tls_threads.vgtest	(revision 0)
+++ helgrind/tests/tls_threads.vgtest	(revision 0)
@@ -0,0 +1,2 @@ 
+prog: tls_threads
+vgopts: -q --sim-hints=deactivate-pthread-stack-cache-via-hack
Index: none/tests/cmdline2.stdout.exp
===================================================================
--- none/tests/cmdline2.stdout.exp	(revision 14176)
+++ none/tests/cmdline2.stdout.exp	(working copy)
@@ -88,7 +88,8 @@ 
     --vgdb-prefix=<prefix>    prefix for vgdb FIFOs [/tmp/vgdb-pipe]
     --run-libc-freeres=no|yes free up glibc memory at exit on Linux? [yes]
     --sim-hints=hint1,hint2,...  known hints:
-                                 lax-ioctls, enable-outer, fuse-compatible [none]
+         deactivate-pthread-stack-cache-via-hack, lax-ioctls, enable-outer,
+         no-inner-prefix, fuse-compatible [none]
     --fair-sched=no|yes|try   schedule threads fairly on multicore systems [no]
     --kernel-variant=variant1,variant2,...  known variants: bproc [none]
                               handle non-standard kernel variants
Index: none/tests/cmdline1.stdout.exp
===================================================================
--- none/tests/cmdline1.stdout.exp	(revision 14176)
+++ none/tests/cmdline1.stdout.exp	(working copy)
@@ -88,7 +88,8 @@ 
     --vgdb-prefix=<prefix>    prefix for vgdb FIFOs [/tmp/vgdb-pipe]
     --run-libc-freeres=no|yes free up glibc memory at exit on Linux? [yes]
     --sim-hints=hint1,hint2,...  known hints:
-                                 lax-ioctls, enable-outer, fuse-compatible [none]
+         deactivate-pthread-stack-cache-via-hack, lax-ioctls, enable-outer,
+         no-inner-prefix, fuse-compatible [none]
     --fair-sched=no|yes|try   schedule threads fairly on multicore systems [no]
     --kernel-variant=variant1,variant2,...  known variants: bproc [none]
                               handle non-standard kernel variants