libstdc++: basic_filebuf: don't flush more often than necessary.

  `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of
size 1024 and above, seemingly as an optimisation.

This can have a significant performance impact if the overhead of a
`write` syscall is non-negligible, e.g. on a slow disk, on network
filesystems, or simply during IO contention because instead of flushing
every `BUFSIZ` (by default), we can flush every 1024 char.
The impact is even greater with custom larger buffers, e.g. for network
filesystems, because the code could issue `write` for example 1000X more
often than necessary with respect to the buffer size.
It also introduces a significant discontinuity in performance when
writing chunks of size 1024 and above.

See this reproducer which writes down a fixed number of chunks to a file
open with `O_SYNC` - to replicate high-latency `write` - for varying
size of chunks:

```
$ cat test_fstream_flush.cpp

int
main(int argc, char* argv[])
{
  assert(argc == 3);

  const auto* path = argv[1];
  const auto chunk_size = std::stoul(argv[2]);

  const auto fd =
    open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666);
  assert(fd >= 0);

  auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out);
  auto stream = std::ostream(&filebuf);

  const auto chunk = std::vector<char>(chunk_size);

  for (auto i = 0; i < 1'000; ++i) {
    stream.write(chunk.data(), chunk.size());
  }

  return 0;
}
```

```
$ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17
$ for i in $(seq 1021 1025); do echo -e "\n$i"; time /tmp/test_fstream_flush /tmp/foo $i; done

1021

real    0m0.997s
user    0m0.000s
sys     0m0.038s

1022

real    0m0.939s
user    0m0.005s
sys     0m0.032s

1023

real    0m0.954s
user    0m0.005s
sys     0m0.034s

1024

real    0m7.102s
user    0m0.040s
sys     0m0.192s

1025

real    0m7.204s
user    0m0.025s
sys     0m0.209s
```

See the huge drop in performance at the 1024-boundary.

An `strace` confirms that from size 1024 we effectively defeat
buffering:
1023-sized writes
```
$ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush /tmp/foo 1023 2>&1 | head -n5
openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, 0666) = 3
writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207
writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207
writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207
writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207
```

vs 1024-sized writes
```
$ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush /tmp/foo 1024 2>&1 | head -n5
openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, 0666) = 3
writev(3, [{iov_base=NULL, iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024
writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024
writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024
writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024
```

Instead, it makes sense to only bypass the buffer if the amount of data
to be written is larger than the buffer capacity.

Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746
---
 libstdc++-v3/include/bits/fstream.tcc         |  9 +--
 .../27_io/basic_filebuf/sputn/char/63746.cc   | 55 +++++++++++++++++++
 2 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc

Message ID	20220905225046.193799-1-cf.natali@gmail.com
State	Superseded
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C98033858D28 To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH] libstdc++: basic_filebuf: don't flush more often than necessary. Date: Mon, 5 Sep 2022 23:50:46 +0100 Message-Id: <20220905225046.193799-1-cf.natali@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: Charles-Francois Natali via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Charles-Francois Natali <cf.natali@gmail.com> Cc: Charles-Francois Natali <cf.natali@gmail.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
Series	libstdc++: basic_filebuf: don't flush more often than necessary. \| libstdc++: basic_filebuf: don't flush more often than necessary.

libstdc++: basic_filebuf: don't flush more often than necessary.

Commit Message

Comments

Patch