[v2] Single threaded stdio optimization

Message ID 594AA0A4.7010600@arm.com
State New, archived
Headers

Commit Message

Szabolcs Nagy June 21, 2017, 4:36 p.m. UTC
  Locking overhead can be significant in some stdio operations
that are common in single threaded applications.  I'd like
to address it in this release as it causes performance problems
to aarch64 users.  I prefer this high-level approach to be
reviewed, if that does not work then the aarch64 specific
low-level approach will be taken.

This patch adds the _IO_FLAGS2_NEED_LOCK flag to indicate if
an _IO_FILE object needs to be locked and some of the stdio
functions just jump to their _unlocked variant when not.  The
flag is set on all _IO_FILE objects when the first thread is
created.  A new libc abi symbol, _IO_enable_locks, does this.
(The abilist files will have to be updated in a separate patch).

The optimization can be applied to more stdio functions,
currently it is only applied to single flag check or single
non-wide-char standard operations.  The flag should probably
be never set for files with _IO_USER_LOCK, but that's just a
further optimization, not a correctness requirement.

The optimization is not valid if user code may run during
an stdio operation (interposed malloc, printf hooks, etc)
because the user code may create threads.  To handle those cases
the flag has to be set when callbacks are registered etc which
is more involved and those cases are less important to optimize
hence _IO_flockfile is not modified.

2017-06-21  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	* libio/libio.h (_IO_FLAGS2_NEED_LOCK, _IO_need_lock): Define.
	* libio/libioP.h (_IO_enable_locks): Declare.
	* libio/Versions (_IO_enable_locks): New symbol.
	* libio/genops.c (_IO_enable_locks): Define.
	(_IO_old_init): Initialize flags2.
	* libio/feof.c.c (_IO_feof): Avoid locking when not needed.
	* libio/ferror.c (_IO_ferror): Likewise.
	* libio/fputc.c (fputc): Likewise.
	* libio/putc.c (_IO_putc): Likewise.
	* libio/getc.c (_IO_getc): Likewise.
	* libio/getchar.c (getchar): Likewise.
	* libio/ioungetc.c (_IO_ungetc): Likewise.
	* nptl/pthread_create.c (__pthread_create_2_1): Enable stdio locks.
	* libio/iofopncook.c (_IO_fopencookie): Enable locking for the file.
	* sysdeps/pthread/flockfile.c (__flockfile): Likewise.
  

Comments

Joseph Myers June 21, 2017, 4:52 p.m. UTC | #1
On Wed, 21 Jun 2017, Szabolcs Nagy wrote:

> created.  A new libc abi symbol, _IO_enable_locks, does this.
> (The abilist files will have to be updated in a separate patch).

I don't see any declarations or uses of this symbol in public headers, so 
why does it need to be at a public symbol version instead of 
GLIBC_PRIVATE?
  
Szabolcs Nagy June 21, 2017, 4:53 p.m. UTC | #2
On 21/06/17 17:52, Joseph Myers wrote:
> On Wed, 21 Jun 2017, Szabolcs Nagy wrote:
> 
>> created.  A new libc abi symbol, _IO_enable_locks, does this.
>> (The abilist files will have to be updated in a separate patch).
> 
> I don't see any declarations or uses of this symbol in public headers, so 
> why does it need to be at a public symbol version instead of 
> GLIBC_PRIVATE?
> 

oh i didn't know GLIBC_PRIVATE is the way to do libc internal symbols.

thanks.
  
Florian Weimer July 7, 2017, 12:38 p.m. UTC | #3
This patch breaks the attached test case.  Not all stream objects are
linked into the global list, so the locking upgrade doesn't happen for
some of them.

The global list management is quite expensive because the list is
single-linked, so we can't just add all stream objects when not yet
running multi-threaded.

I fear that this patch may have to be backed out again, until we can
address these issues.

Thanks,
Florian
  
Szabolcs Nagy July 7, 2017, 1:38 p.m. UTC | #4
On 07/07/17 13:38, Florian Weimer wrote:
> This patch breaks the attached test case.  Not all stream objects are
> linked into the global list, so the locking upgrade doesn't happen for
> some of them.
> 

i thought all files need to be flushed on exit
or on fflush(0), if glibc does not do that that's
observably non-conforming.

> The global list management is quite expensive because the list is
> single-linked, so we can't just add all stream objects when not yet
> running multi-threaded.
> 
> I fear that this patch may have to be backed out again, until we can
> address these issues.
> 

i can back it out, or try to create all the
problematic files with the optimization-disabling
flag set (in case that's simple.. will look into
it in a minute).

> Thanks,
> Florian
> 
> 
> memstream-thread.c
> 
> 
> #include <err.h>
> #include <errno.h>
> #include <pthread.h>
> #include <stdio.h>
> 
> enum
>   {
>     thread_count = 2,
>     byte_count = 1000000,
>   };
> 
> struct closure
> {
>   FILE *fp;
>   char b;
> };
> 
> static void *
> thread_func (void *closure)
> {
>   struct closure *args = closure;
> 
>   for (int i = 0; i < byte_count; ++i)
>     fputc (args->b, args->fp);
> 
>   return NULL;
> }
> 
> int
> main (void)
> {
>   char *buffer = NULL;
>   size_t buffer_length = 0;
>   FILE *fp = open_memstream (&buffer, &buffer_length);
>   if (fp == NULL)
>     err (1, "open_memstream");
> 
>   pthread_t threads[thread_count];
>   struct closure args[thread_count];
> 
>   for (int i = 0; i < thread_count; ++i)
>     {
>       args[i].fp = fp;
>       args[i].b = 'A' + i;
>       int ret = pthread_create (threads + i, NULL, thread_func, args + i);
>       if (ret != 0)
>         {
>           errno = ret;
>           err (1, "pthread_create");
>         }
>     }
> 
>   for (int i = 0; i < thread_count; ++i)
>     {
>       int ret = pthread_join (threads[i], NULL);
>       if (ret != 0)
>         {
>           errno = ret;
>           err (1, "pthread_join");
>         }
>     }
> 
>   fclose (fp);
> 
>   if (buffer_length != thread_count * byte_count)
>     errx (1, "unexpected number of written bytes: %zu (should be %d)",
>           buffer_length, thread_count * byte_count);
> 
>   size_t counts[thread_count] = { 0, };
>   for (size_t i = 0; i < buffer_length; ++i)
>     {
>       if (buffer[i] < 'A' || buffer[i] >= 'A' + thread_count)
>         errx (1, "written byte at %zu out of range: %d", i, buffer[i] & 0xFF);
>       ++counts[buffer[i] - 'A'];
>     }
>   for (int i = 0; i < thread_count; ++i)
>     if (counts[i] != byte_count)
>       errx (1, "incorrect write count for thread %d: %zu (should be %d)",
>             i, counts[i], byte_count);
> 
>   return 0;
> }
>
  
Szabolcs Nagy July 7, 2017, 3:02 p.m. UTC | #5
On 07/07/17 14:38, Szabolcs Nagy wrote:
> On 07/07/17 13:38, Florian Weimer wrote:
>> This patch breaks the attached test case.  Not all stream objects are
>> linked into the global list, so the locking upgrade doesn't happen for
>> some of them.
>>
> 
> i thought all files need to be flushed on exit
> or on fflush(0), if glibc does not do that that's
> observably non-conforming.
> 
>> The global list management is quite expensive because the list is
>> single-linked, so we can't just add all stream objects when not yet
>> running multi-threaded.
>>
>> I fear that this patch may have to be backed out again, until we can
>> address these issues.
>>
> 
> i can back it out, or try to create all the
> problematic files with the optimization-disabling
> flag set (in case that's simple.. will look into
> it in a minute).
> 

it seems both changing the optimization flag or adding
these streams to the list are easy.

i think glibc should just fix the open_memstream bug,
https://sourceware.org/bugzilla/show_bug.cgi?id=21735
i'll work on a patch.

(i don't expect a large number of open/close memstream
operations, so it should not have a huge impact)
  
Szabolcs Nagy July 7, 2017, 5:30 p.m. UTC | #6
On 07/07/17 16:02, Szabolcs Nagy wrote:
> On 07/07/17 14:38, Szabolcs Nagy wrote:
>> On 07/07/17 13:38, Florian Weimer wrote:
>>> This patch breaks the attached test case.  Not all stream objects are
>>> linked into the global list, so the locking upgrade doesn't happen for
>>> some of them.
>>>
>>
>> i thought all files need to be flushed on exit
>> or on fflush(0), if glibc does not do that that's
>> observably non-conforming.
>>
>>> The global list management is quite expensive because the list is
>>> single-linked, so we can't just add all stream objects when not yet
>>> running multi-threaded.
>>>
>>> I fear that this patch may have to be backed out again, until we can
>>> address these issues.
>>>
>>
>> i can back it out, or try to create all the
>> problematic files with the optimization-disabling
>> flag set (in case that's simple.. will look into
>> it in a minute).
>>
> 
> it seems both changing the optimization flag or adding
> these streams to the list are easy.
> 
> i think glibc should just fix the open_memstream bug,
> https://sourceware.org/bugzilla/show_bug.cgi?id=21735
> i'll work on a patch.
> 
> (i don't expect a large number of open/close memstream
> operations, so it should not have a huge impact)
> 

sorry, i cannot work on this until monday,
i hope it's ok to leave multithread open_memstream
broken for a few days.
  

Patch

diff --git a/libio/Versions b/libio/Versions
index 2a1d6e6c85..300f539243 100644
--- a/libio/Versions
+++ b/libio/Versions
@@ -152,6 +152,9 @@  libc {
     # f*
     fmemopen;
   }
+  GLIBC_2.26 {
+    _IO_enable_locks;
+  }
   GLIBC_PRIVATE {
     # Used by NPTL and librt
     __libc_fatal;
diff --git a/libio/feof.c b/libio/feof.c
index 9712a81e78..8890a5f51f 100644
--- a/libio/feof.c
+++ b/libio/feof.c
@@ -32,6 +32,8 @@  _IO_feof (_IO_FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
+  if (!_IO_need_lock (fp))
+    return _IO_feof_unlocked (fp);
   _IO_flockfile (fp);
   result = _IO_feof_unlocked (fp);
   _IO_funlockfile (fp);
diff --git a/libio/ferror.c b/libio/ferror.c
index 01e3bd8e2b..d10fcd9fff 100644
--- a/libio/ferror.c
+++ b/libio/ferror.c
@@ -32,6 +32,8 @@  _IO_ferror (_IO_FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
+  if (!_IO_need_lock (fp))
+    return _IO_ferror_unlocked (fp);
   _IO_flockfile (fp);
   result = _IO_ferror_unlocked (fp);
   _IO_funlockfile (fp);
diff --git a/libio/fputc.c b/libio/fputc.c
index a7cd682fe2..b72305c06f 100644
--- a/libio/fputc.c
+++ b/libio/fputc.c
@@ -32,6 +32,8 @@  fputc (int c, _IO_FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
+  if (!_IO_need_lock (fp))
+    return _IO_putc_unlocked (c, fp);
   _IO_acquire_lock (fp);
   result = _IO_putc_unlocked (c, fp);
   _IO_release_lock (fp);
diff --git a/libio/genops.c b/libio/genops.c
index a466cfa337..132f1f1a7d 100644
--- a/libio/genops.c
+++ b/libio/genops.c
@@ -570,11 +570,28 @@  _IO_init (_IO_FILE *fp, int flags)
   _IO_init_internal (fp, flags);
 }
 
+static int stdio_needs_locking;
+
+void
+_IO_enable_locks (void)
+{
+  _IO_ITER i;
+
+  if (stdio_needs_locking)
+    return;
+  stdio_needs_locking = 1;
+  for (i = _IO_iter_begin (); i != _IO_iter_end (); i = _IO_iter_next (i))
+    _IO_iter_file (i)->_flags2 |= _IO_FLAGS2_NEED_LOCK;
+}
+libc_hidden_def (_IO_enable_locks)
+
 void
 _IO_old_init (_IO_FILE *fp, int flags)
 {
   fp->_flags = _IO_MAGIC|flags;
   fp->_flags2 = 0;
+  if (stdio_needs_locking)
+    fp->_flags2 |= _IO_FLAGS2_NEED_LOCK;
   fp->_IO_buf_base = NULL;
   fp->_IO_buf_end = NULL;
   fp->_IO_read_base = NULL;
diff --git a/libio/getc.c b/libio/getc.c
index b58fd62308..fd66ef93cf 100644
--- a/libio/getc.c
+++ b/libio/getc.c
@@ -34,6 +34,8 @@  _IO_getc (FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
+  if (!_IO_need_lock (fp))
+    return _IO_getc_unlocked (fp);
   _IO_acquire_lock (fp);
   result = _IO_getc_unlocked (fp);
   _IO_release_lock (fp);
diff --git a/libio/getchar.c b/libio/getchar.c
index 5b41595d17..d79932114e 100644
--- a/libio/getchar.c
+++ b/libio/getchar.c
@@ -33,6 +33,8 @@  int
 getchar (void)
 {
   int result;
+  if (!_IO_need_lock (_IO_stdin))
+    return _IO_getc_unlocked (_IO_stdin);
   _IO_acquire_lock (_IO_stdin);
   result = _IO_getc_unlocked (_IO_stdin);
   _IO_release_lock (_IO_stdin);
diff --git a/libio/iofopncook.c b/libio/iofopncook.c
index a08dfdaa42..982f464a68 100644
--- a/libio/iofopncook.c
+++ b/libio/iofopncook.c
@@ -172,6 +172,8 @@  _IO_cookie_init (struct _IO_cookie_file *cfile, int read_write,
   _IO_mask_flags (&cfile->__fp.file, read_write,
 		  _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING);
 
+  cfile->__fp.file._flags2 |= _IO_FLAGS2_NEED_LOCK;
+
   /* We use a negative number different from -1 for _fileno to mark that
      this special stream is not associated with a real file, but still has
      to be treated as such.  */
diff --git a/libio/ioungetc.c b/libio/ioungetc.c
index 951064fa12..917cad8abb 100644
--- a/libio/ioungetc.c
+++ b/libio/ioungetc.c
@@ -33,6 +33,8 @@  _IO_ungetc (int c, _IO_FILE *fp)
   CHECK_FILE (fp, EOF);
   if (c == EOF)
     return EOF;
+  if (!_IO_need_lock (fp))
+    return _IO_sputbackc (fp, (unsigned char) c);
   _IO_acquire_lock (fp);
   result = _IO_sputbackc (fp, (unsigned char) c);
   _IO_release_lock (fp);
diff --git a/libio/libio.h b/libio/libio.h
index 518ffd8e44..14bcb92332 100644
--- a/libio/libio.h
+++ b/libio/libio.h
@@ -119,6 +119,7 @@ 
 # define _IO_FLAGS2_SCANF_STD 16
 # define _IO_FLAGS2_NOCLOSE 32
 # define _IO_FLAGS2_CLOEXEC 64
+# define _IO_FLAGS2_NEED_LOCK 128
 #endif
 
 /* These are "formatting flags" matching the iostream fmtflags enum values. */
@@ -451,6 +452,9 @@  extern int _IO_ftrylockfile (_IO_FILE *) __THROW;
 #define _IO_cleanup_region_end(_Doit) /**/
 #endif
 
+#define _IO_need_lock(_fp) \
+  (((_fp)->_flags2 & _IO_FLAGS2_NEED_LOCK) != 0)
+
 extern int _IO_vfscanf (_IO_FILE * __restrict, const char * __restrict,
 			_IO_va_list, int *__restrict);
 extern int _IO_vfprintf (_IO_FILE *__restrict, const char *__restrict,
diff --git a/libio/libioP.h b/libio/libioP.h
index eb93418c8d..163dfb1386 100644
--- a/libio/libioP.h
+++ b/libio/libioP.h
@@ -444,6 +444,8 @@  extern void _IO_list_unlock (void) __THROW;
 libc_hidden_proto (_IO_list_unlock)
 extern void _IO_list_resetlock (void) __THROW;
 libc_hidden_proto (_IO_list_resetlock)
+extern void _IO_enable_locks (void) __THROW;
+libc_hidden_proto (_IO_enable_locks)
 
 /* Default jumptable functions. */
 
@@ -750,7 +752,7 @@  extern _IO_off64_t _IO_seekpos_unlocked (_IO_FILE *, _IO_off64_t, int)
 
 #if _G_HAVE_MMAP
 
-# ifdef _LIBC
+# if IS_IN (libc)
 /* When using this code in the GNU libc we must not pollute the name space.  */
 #  define mmap __mmap
 #  define munmap __munmap
diff --git a/libio/putc.c b/libio/putc.c
index b591c5538b..6e1fdeef3a 100644
--- a/libio/putc.c
+++ b/libio/putc.c
@@ -25,6 +25,8 @@  _IO_putc (int c, _IO_FILE *fp)
 {
   int result;
   CHECK_FILE (fp, EOF);
+  if (!_IO_need_lock (fp))
+    return _IO_putc_unlocked (c, fp);
   _IO_acquire_lock (fp);
   result = _IO_putc_unlocked (c, fp);
   _IO_release_lock (fp);
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index c7d1b8f413..0b3fa942f2 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -32,6 +32,7 @@ 
 #include <exit-thread.h>
 #include <default-sched.h>
 #include <futex-internal.h>
+#include "libioP.h"
 
 #include <shlib-compat.h>
 
@@ -756,6 +757,9 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
         collect_default_sched (pd);
     }
 
+  if (__glibc_unlikely (__nptl_nthreads == 1))
+    _IO_enable_locks ();
+
   /* Pass the descriptor to the caller.  */
   *newthread = (pthread_t) pd;
 
diff --git a/sysdeps/pthread/flockfile.c b/sysdeps/pthread/flockfile.c
index 7fe8e99161..a8e6c28ed9 100644
--- a/sysdeps/pthread/flockfile.c
+++ b/sysdeps/pthread/flockfile.c
@@ -25,6 +25,7 @@ 
 void
 __flockfile (FILE *stream)
 {
+  stream->_flags2 |= _IO_FLAGS2_NEED_LOCK;
   _IO_lock_lock (*stream->_lock);
 }
 strong_alias (__flockfile, _IO_flockfile)