[v2,1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file

Message ID CAFqe=zKDb8AV9dubh6Jqokg_qynXWfVsENxhDd45Nm8bi7oyZQ@mail.gmail.com
State New
Delegated to: Jonathan Wakely
Series [v2,1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file

Commit Message

Jannik Glückert March 15, 2023, 7:29 p.m. UTC
This iteration improves error handling for copy_file_range,
particularly around undocumented error codes in earlier kernel
versions.
Additionally, this fixes the userspace copy fallback to handle
zero-length files such as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178.

Lastly, the case where src is resized during the copy loop is now
handled: the function returns true once the loop hits EOF (this is the
only situation, aside from a zero-length src, in which sendfile and
copy_file_range return 0).
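A minimal sketch of the loop semantics described above, assuming Linux `sendfile` from `<sys/sendfile.h>`. `sendfile_loop` is a hypothetical standalone helper, not the function in the patch. It stops successfully when a call returns 0 (EOF, e.g. the source shrank), and it only subtracts from the remaining count after ruling out a negative return.

```cpp
#include <sys/sendfile.h>  // sendfile (Linux)
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper: copy up to `length` bytes from fd_in to fd_out.
// Returns false on error, leaving errno as set by sendfile so the caller
// can decide whether to fall back to a userspace copy.
// A 0 return from sendfile is treated as EOF, so a source that shrinks
// during the copy still yields true (with fewer bytes written).
bool sendfile_loop(int fd_in, int fd_out, std::size_t length) noexcept
{
  off_t offset = 0;  // explicit read offset; fd_in's file offset is not changed
  std::size_t bytes_left = length;
  while (bytes_left > 0)
    {
      ssize_t n = ::sendfile(fd_out, fd_in, &offset, bytes_left);
      if (n < 0)
        return false;  // checked before subtracting, so bytes_left never grows
      if (n == 0)
        break;         // EOF reached before `length` bytes were sent
      bytes_left -= static_cast<std::size_t>(n);
    }
  return true;
}
```

Unlike the patch, this sketch returns true for a length of 0 without making a syscall. The patch instead reports EINVAL in that case so that the caller falls through to the userspace copy, which matters for files whose stat size is 0 but which are not actually empty.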

Best
Jannik
  

Comments

Jonathan Wakely March 20, 2023, 3:16 p.m. UTC | #1
On Wed, 15 Mar 2023 at 19:30, Jannik Glückert via Libstdc++ <
libstdc++@gcc.gnu.org> wrote:

> This iteration improves error handling for copy_file_range,
> particularly around undocumented error codes in earlier kernel
> versions.
> Additionally this fixes the userspace copy fallback to handle
> zero-length files such as in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178.
>
> Lastly, the case "src gets resized during the copy loop" is now
> considered and will return true once the loop hits EOF (this is the
> only situation, aside from a zero-length src, where sendfile and
> copy_file_range return 0).
>

I've applied this patch (with some whitespace fixes) and started testing
it. I'm seeing some regressions:

FAIL: experimental/filesystem/operations/copy.cc execution test
FAIL: experimental/filesystem/operations/copy_file.cc execution test

The failures in the log look like:

terminate called after throwing an instance of
'std::experimental::filesystem::v1::__cxx11::filesystem_error'
 what():  filesystem error: cannot copy: Input/output error
[filesystem-test.copy.2900321341.ySWT77]
[filesystem-test.copy.2900321342.vjeAar]
FAIL: experimental/filesystem/operations/copy.cc execution test
  
Jonathan Wakely March 20, 2023, 3:18 p.m. UTC | #2
On Mon, 20 Mar 2023 at 15:16, Jonathan Wakely wrote:

> I've applied this patch (with some whitespace fixes) and started testing
> it. I'm seeing some regressions:
>
> FAIL: experimental/filesystem/operations/copy.cc execution test
> FAIL: experimental/filesystem/operations/copy_file.cc execution test

Not just the Filesystem TS versions of those tests, but the std::filesystem
ones too:

FAIL: 27_io/filesystem/operations/copy.cc execution test
FAIL: 27_io/filesystem/operations/copy_file.cc execution test
  
Jonathan Wakely March 20, 2023, 10:27 p.m. UTC | #3
On 06/03/23 20:52 +0100, Jannik Glückert wrote:
>we were previously only using sendfile for files smaller than 2GB, as
>sendfile needs to be called repeatedly for files bigger than that.
>
>some quick numbers, copying a 16GB file, average of 10 repetitions:
>    old:
>        real: 13.4s
>        user: 0.14s
>        sys : 7.43s
>    new:
>        real: 8.90s
>        user: 0.00s
>        sys : 3.68s
>
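To put a number on "called repeatedly": the Linux sendfile(2) man page states that one call transfers at most 0x7ffff000 (2,147,479,552) bytes, on 32-bit and 64-bit systems alike. Assuming "16GB" in the benchmark means 16 GiB, the arithmetic below shows that file needs at least nine calls, so a single call covers only the first chunk and the rest needs further calls (or, before this patch, the userspace copy).

```cpp
#include <cstdint>

// Upper bound on bytes moved by one Linux sendfile call (see sendfile(2)).
constexpr std::uint64_t max_sendfile_chunk = 0x7ffff000;  // 2,147,479,552

// Minimum number of sendfile calls for `size` bytes, i.e. the ceiling of
// size / max_sendfile_chunk, assuming every call moves the maximum.
constexpr std::uint64_t min_sendfile_calls(std::uint64_t size)
{
  return (size + max_sendfile_chunk - 1) / max_sendfile_chunk;
}

// 16 GiB = 17,179,869,184 bytes. Eight full chunks cover 17,179,836,416
// bytes, leaving 32,768 bytes for a ninth call.
static_assert(min_sendfile_calls(16ull << 30) == 9, "16 GiB needs 9 calls");
```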
>Additionally, this fixes
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178
>
>libstdc++-v3/ChangeLog:
>
>        * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
>        * config.h.in: Regenerate.
>        * configure: Regenerate.
>        * src/filesystem/ops-common.h: enable sendfile for files
>          >2GB in std::filesystem::copy_file, skip zero-length files
>
>Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
>---
> libstdc++-v3/acinclude.m4                |  51 +++++----
> libstdc++-v3/config.h.in                 |   3 +
> libstdc++-v3/configure                   | 127 ++++++++++++++++-------
> libstdc++-v3/src/filesystem/ops-common.h |  86 ++++++++-------
> 4 files changed, 175 insertions(+), 92 deletions(-)
>
>diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
>index 5136c0571e8..85a09a5a869 100644
>--- a/libstdc++-v3/acinclude.m4
>+++ b/libstdc++-v3/acinclude.m4
>@@ -4583,6 +4583,7 @@ dnl  _GLIBCXX_USE_FCHMOD
> dnl  _GLIBCXX_USE_FCHMODAT
> dnl  _GLIBCXX_USE_SENDFILE
> dnl  HAVE_LINK
>+dnl  HAVE_LSEEK
> dnl  HAVE_READLINK
> dnl  HAVE_SYMLINK
> dnl
>@@ -4718,25 +4719,6 @@ dnl
>   if test $glibcxx_cv_fchmodat = yes; then
>     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
>   fi
>-dnl
>-  AC_CACHE_CHECK([for sendfile that can copy files],
>-    glibcxx_cv_sendfile, [dnl
>-    case "${target_os}" in
>-      gnu* | linux* | solaris* | uclinux*)
>-	GCC_TRY_COMPILE_OR_LINK(
>-	  [#include <sys/sendfile.h>],
>-	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
>-	  [glibcxx_cv_sendfile=yes],
>-	  [glibcxx_cv_sendfile=no])
>-	;;
>-      *)
>-	glibcxx_cv_sendfile=no
>-	;;
>-    esac
>-  ])
>-  if test $glibcxx_cv_sendfile = yes; then
>-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
>-  fi
> dnl
>   AC_CACHE_CHECK([for link],
>     glibcxx_cv_link, [dnl
>@@ -4749,6 +4731,18 @@ dnl
>   if test $glibcxx_cv_link = yes; then
>     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
>   fi
>+dnl
>+  AC_CACHE_CHECK([for lseek],
>+    glibcxx_cv_lseek, [dnl
>+    GCC_TRY_COMPILE_OR_LINK(
>+      [#include <unistd.h>],
>+      [lseek(1, 0, SEEK_SET);],
>+      [glibcxx_cv_lseek=yes],
>+      [glibcxx_cv_lseek=no])
>+  ])
>+  if test $glibcxx_cv_lseek = yes; then
>+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
>+  fi
> dnl
>   AC_CACHE_CHECK([for readlink],
>     glibcxx_cv_readlink, [dnl
>@@ -4785,6 +4779,25 @@ dnl
>   if test $glibcxx_cv_truncate = yes; then
>     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
>   fi
>+dnl
>+  AC_CACHE_CHECK([for sendfile that can copy files],
>+    glibcxx_cv_sendfile, [dnl
>+    case "${target_os}" in
>+      gnu* | linux* | solaris* | uclinux*)
>+	GCC_TRY_COMPILE_OR_LINK(
>+	  [#include <sys/sendfile.h>],
>+	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
>+	  [glibcxx_cv_sendfile=yes],
>+	  [glibcxx_cv_sendfile=no])
>+	;;
>+      *)
>+	glibcxx_cv_sendfile=no
>+	;;
>+    esac
>+  ])
>+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
>+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
>+  fi
> dnl
>   AC_CACHE_CHECK([for fdopendir],
>     glibcxx_cv_fdopendir, [dnl
>diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
>index abbfca43e5c..9e1b1d41dc5 100644
>--- a/libstdc++-v3/src/filesystem/ops-common.h
>+++ b/libstdc++-v3/src/filesystem/ops-common.h
>@@ -51,6 +51,7 @@
> # include <ext/stdio_filebuf.h>
> # ifdef _GLIBCXX_USE_SENDFILE
> #  include <sys/sendfile.h> // sendfile
>+#  include <unistd.h> // lseek
> # endif
> #endif
>
>@@ -358,6 +359,32 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
>   }
>
> #ifdef NEED_DO_COPY_FILE
>+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
>+  bool
>+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
>+  {
>+    // a zero-length file is either empty, or not copyable by this syscall
>+    // return early to avoid the syscall cost
>+    if (length == 0)
>+      {
>+        errno = EINVAL;
>+        return false;
>+      }
>+    size_t bytes_left = length;
>+    off_t offset = 0;
>+    ssize_t bytes_copied;
>+    do {
>+      bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
>+      bytes_left -= bytes_copied;
>+    } while (bytes_left > 0 && bytes_copied > 0);
>+    if (bytes_copied < 0)
>+      {
>+        ::lseek(fd_out, 0, SEEK_SET);
>+        return false;
>+      }
>+    return true;
>+  }
>+#endif
>   bool
>   do_copy_file(const char_type* from, const char_type* to,
> 	       std::filesystem::copy_options_existing_file options,
>@@ -498,28 +525,30 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
> 	return false;
>       }
>
>-    size_t count = from_st->st_size;
>+    bool has_copied = false;
>+
> #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
>-    off_t offset = 0;
>-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
>-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
>+    if (!has_copied)
>+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
>+    if (!has_copied)
>       {
>-	ec.assign(errno, std::generic_category());
>-	return false;
>+      if (errno != ENOSYS && errno != EINVAL)

Indentation is wrong here.

>+        {
>+          ec.assign(errno, std::generic_category());
>+          return false;
>+        }
>       }
>-    if ((size_t)n == count)
>+#endif
>+
>+    if (has_copied)
>       {
>-	if (!out.close() || !in.close())
>-	  {
>-	    ec.assign(errno, std::generic_category());
>-	    return false;
>-	  }
>-	ec.clear();
>-	return true;
>+        if (!out.close() || !in.close())
>+          {
>+	          ec.assign(errno, std::generic_category());

Indentation is wrong here. Tabstop should be 8, so this should have
one tab followed by four spaces.

>+	          return false;
>+          }

You need to ec.clear() before returning true here.
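Why the clear matters, as a small sketch: the non-throwing std::filesystem overloads are specified to call ec.clear() when they succeed, and a caller may pass in an error_code that still holds an earlier failure. `report_status` is a hypothetical stand-in for the success and failure paths of do_copy_file.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical function mirroring the error_code convention of
// do_copy_file: set ec on failure, clear it on success.
bool report_status(bool succeeded, std::error_code& ec) noexcept
{
  if (!succeeded)
    {
      ec.assign(EIO, std::generic_category());
      return false;
    }
  ec.clear();  // without this, a stale error from an earlier call survives
  return true;
}
```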

>+        return true;
>       }
>-    else if (n > 0)
>-      count -= n;
>-#endif // _GLIBCXX_USE_SENDFILE
>
>     using std::ios;
>     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
>@@ -530,29 +559,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
>     if (sbout.is_open())
>       out.fd = -1;
>
>-#ifdef _GLIBCXX_USE_SENDFILE
>-    if (n != 0)
>+    if (!(std::ostream(&sbout) << &sbin))

As discussed on IRC, removing the `if (count && ...)` part of this
condition breaks copying empty files, because the semantics of
ostream::operator<<(streambuf*) are:
  
"If the function inserts no characters, it calls setstate(failbit)"

So let's keep the original behaviour here, and deal with fixing PR
108178 in a separate patch.
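The failbit behaviour quoted above is easy to reproduce with in-memory buffers. This is a minimal sketch using std::stringbuf in place of the stdio_filebuf objects in do_copy_file; `insertion_fails` is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical helper: insert everything from `src` into a fresh stream
// and report whether the stream entered the failed state.
bool insertion_fails(std::streambuf* src)
{
  std::stringbuf sink;
  std::ostream os(&sink);
  os << src;  // sets failbit if no characters were inserted
  return os.fail();
}
```

An empty source makes the insertion "fail" even though nothing went wrong, which is why the copy must not treat that result as an I/O error for empty files.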


>       {
>-	if (n < 0)
>-	  n = 0;
>-
>-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
>-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
>-
>-	const std::streampos errpos(std::streamoff(-1));
>-	if (p1 == errpos || p2 == errpos)
>-	  {
>-	    ec = std::make_error_code(std::errc::io_error);
>-	    return false;
>-	  }
>+  ec = std::make_error_code(std::errc::io_error);
>+  return false;

Indentation is wrong here.

>       }
>-#endif
>
>-    if (count && !(std::ostream(&sbout) << &sbin))
>-      {
>-	ec = std::make_error_code(std::errc::io_error);
>-	return false;
>-      }
>     if (!sbout.close() || !sbin.close())
>       {
> 	ec.assign(errno, std::generic_category());
>-- 
>2.39.2
  
Jonathan Wakely March 20, 2023, 10:30 p.m. UTC | #4
On 20/03/23 22:27 +0000, Jonathan Wakely wrote:
>On 06/03/23 20:52 +0100, Jannik Glückert wrote:
>>libstdc++-v3/ChangeLog:
>>
>>       * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
>>       * config.h.in: Regenerate.
>>       * configure: Regenerate.
>>       * src/filesystem/ops-common.h: enable sendfile for files
>>         >2GB in std::filesystem::copy_file, skip zero-length files

Also, the ChangeLog entry needs to be indented with tabs, should name the
changed functions, and should use complete sentences, e.g.

	* acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* src/filesystem/ops-common.h (copy_file_sendfile): Define new
	function for sendfile logic. Loop to support large files. Skip
	zero-length files.
	(do_copy_file): Use it.
  
Jonathan Wakely March 22, 2023, 12:14 p.m. UTC | #5
On Mon, 20 Mar 2023 at 22:30, Jonathan Wakely via Libstdc++ <
libstdc++@gcc.gnu.org> wrote:

Here's what I plan to commit in a few weeks when GCC 14 Stage 1 opens.
commit c177825f952bb353cdf412f46f45539b8992abe1
Author: Jannik Glückert <jannik.glueckert@gmail.com>
Date:   Mon Mar 6 19:52:08 2023

    libstdc++: Also use sendfile for big files
    
    We were previously only using sendfile for files smaller than 2GB, as
    sendfile needs to be called repeatedly for files bigger than that.
    
    Some quick numbers, copying a 16GB file, average of 10 repetitions:
        old:
            real: 13.4s
            user: 0.14s
            sys : 7.43s
        new:
            real: 8.90s
            user: 0.00s
            sys : 3.68s
    
    libstdc++-v3/ChangeLog:
    
            * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
            * config.h.in: Regenerate.
            * configure: Regenerate.
            * src/filesystem/ops-common.h (copy_file_sendfile): Define new
            function for sendfile logic. Loop to support large files. Skip
            zero-length files.
            (do_copy_file): Use it.
    
    Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@ dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
+dnl  HAVE_LSEEK
 dnl  HAVE_READLINK
 dnl  HAVE_SYMLINK
 dnl
@@ -4718,25 +4719,6 @@ dnl
   if test $glibcxx_cv_fchmodat = yes; then
     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
   fi
-dnl
-  AC_CACHE_CHECK([for sendfile that can copy files],
-    glibcxx_cv_sendfile, [dnl
-    case "${target_os}" in
-      gnu* | linux* | solaris* | uclinux*)
-	GCC_TRY_COMPILE_OR_LINK(
-	  [#include <sys/sendfile.h>],
-	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
-	  [glibcxx_cv_sendfile=yes],
-	  [glibcxx_cv_sendfile=no])
-	;;
-      *)
-	glibcxx_cv_sendfile=no
-	;;
-    esac
-  ])
-  if test $glibcxx_cv_sendfile = yes; then
-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
-  fi
 dnl
   AC_CACHE_CHECK([for link],
     glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@ dnl
   if test $glibcxx_cv_link = yes; then
     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for lseek],
+    glibcxx_cv_lseek, [dnl
+    GCC_TRY_COMPILE_OR_LINK(
+      [#include <unistd.h>],
+      [lseek(1, 0, SEEK_SET);],
+      [glibcxx_cv_lseek=yes],
+      [glibcxx_cv_lseek=no])
+  ])
+  if test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for readlink],
     glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for sendfile that can copy files],
+    glibcxx_cv_sendfile, [dnl
+    case "${target_os}" in
+      gnu* | linux* | solaris* | uclinux*)
+	GCC_TRY_COMPILE_OR_LINK(
+	  [#include <sys/sendfile.h>],
+	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
+	  [glibcxx_cv_sendfile=yes],
+	  [glibcxx_cv_sendfile=no])
+	;;
+      *)
+	glibcxx_cv_sendfile=no
+	;;
+    esac
+  ])
+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for fdopendir],
     glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index c95511b5c95..7874a95488a 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@
 # include <ext/stdio_filebuf.h>
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
+#  include <unistd.h> // lseek
 # endif
 #endif
 
@@ -358,6 +359,34 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+	errno = EINVAL;
+	return false;
+      }
+    size_t bytes_left = length;
+    off_t offset = 0;
+    ssize_t bytes_copied;
+    do
+      {
+	bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+	bytes_left -= bytes_copied;
+      }
+    while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      {
+	::lseek(fd_out, 0, SEEK_SET);
+	return false;
+      }
+    return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
 	       std::filesystem::copy_options_existing_file options,
@@ -498,28 +527,31 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 	return false;
       }
 
-    size_t count = from_st->st_size;
+    bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    off_t offset = 0;
-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
+    if (!has_copied)
+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
       {
-	ec.assign(errno, std::generic_category());
-	return false;
-      }
-    if ((size_t)n == count)
-      {
-	if (!out.close() || !in.close())
+	if (errno != ENOSYS && errno != EINVAL)
 	  {
 	    ec.assign(errno, std::generic_category());
 	    return false;
 	  }
-	ec.clear();
-	return true;
       }
-    else if (n > 0)
-      count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
+#endif
+
+    if (has_copied)
+      {
+        if (!out.close() || !in.close())
+          {
+	    ec.assign(errno, std::generic_category());
+	    return false;
+          }
+	ec.clear();
+        return true;
+      }
 
     using std::ios;
     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +562,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-    if (n != 0)
-      {
-	if (n < 0)
-	  n = 0;
-
-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-	const std::streampos errpos(std::streamoff(-1));
-	if (p1 == errpos || p2 == errpos)
-	  {
-	    ec = std::make_error_code(std::errc::io_error);
-	    return false;
-	  }
-      }
-#endif
-
-    if (count && !(std::ostream(&sbout) << &sbin))
+    if (from_st->st_size && !(std::ostream(&sbout) << &sbin))
       {
 	ec = std::make_error_code(std::errc::io_error);
 	return false;
       }
+
     if (!sbout.close() || !sbin.close())
       {
 	ec.assign(errno, std::generic_category());
  
Jonathan Wakely March 22, 2023, 12:18 p.m. UTC | #6
On Wed, 22 Mar 2023 at 12:14, Jonathan Wakely wrote:

> Here's what I plan to commit in a few weeks when GCC 14 Stage 1 opens.

And similarly for the copy_file_range change.
commit 2ad500e358c03ef63af1540d44645df582a4809c
Author: Jannik Glückert <jannik.glueckert@gmail.com>
Date:   Wed Mar 8 18:37:43 2023

    libstdc++: Use copy_file_range for filesystem::copy_file
    
    copy_file_range is a recent-ish syscall for copying files. It is similar
    to sendfile but allows filesystem-specific optimizations. Common ones are:
    Reflinks: BTRFS, XFS, ZFS (does not implement the syscall yet)
    Server-side copy: NFS, SMB, Ceph
    
    If copy_file_range is not available for the given files, fall back to
    sendfile / userspace copy.
    
    libstdc++-v3/ChangeLog:
    
            * acinclude.m4 (_GLIBCXX_USE_COPY_FILE_RANGE): Define.
            * config.h.in: Regenerate.
            * configure: Regenerate.
            * src/filesystem/ops-common.h (copy_file_copy_file_range):
            Define new function.
            (do_copy_file): Use it.
    
    Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 85a09a5a869..4cf02dc6e4e 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4581,6 +4581,7 @@ dnl  _GLIBCXX_USE_UTIMENSAT
 dnl  _GLIBCXX_USE_ST_MTIM
 dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
+dnl  _GLIBCXX_USE_COPY_FILE_RANGE
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
 dnl  HAVE_LSEEK
@@ -4779,6 +4780,25 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for copy_file_range that can copy files],
+    glibcxx_cv_copy_file_range, [dnl
+    case "${target_os}" in
+      linux*)
+	GCC_TRY_COMPILE_OR_LINK(
+	  [#include <unistd.h>],
+	  [copy_file_range(1, nullptr, 2, nullptr, 1, 0);],
+	  [glibcxx_cv_copy_file_range=yes],
+	  [glibcxx_cv_copy_file_range=no])
+	;;
+      *)
+	glibcxx_cv_copy_file_range=no
+	;;
+    esac
+  ])
+  if test $glibcxx_cv_copy_file_range = yes; then
+    AC_DEFINE(_GLIBCXX_USE_COPY_FILE_RANGE, 1, [Define if copy_file_range is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for sendfile that can copy files],
     glibcxx_cv_sendfile, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index 7874a95488a..906436b484e 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -49,6 +49,9 @@
 #ifdef NEED_DO_COPY_FILE
 # include <filesystem>
 # include <ext/stdio_filebuf.h>
+# ifdef _GLIBCXX_USE_COPY_FILE_RANGE
+#  include <unistd.h> // copy_file_range
+# endif
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
 #  include <unistd.h> // lseek
@@ -359,6 +362,32 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#ifdef _GLIBCXX_USE_COPY_FILE_RANGE
+  bool
+  copy_file_copy_file_range(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+        errno = EINVAL;
+        return false;
+      }
+    size_t bytes_left = length;
+    off64_t off_in = 0, off_out = 0;
+    ssize_t bytes_copied;
+    do
+      {
+	bytes_copied = ::copy_file_range(fd_in, &off_in, fd_out, &off_out,
+					 bytes_left, 0);
+	bytes_left -= bytes_copied;
+      }
+    while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      return false;
+    return true;
+  }
+#endif
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
   bool
   copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
@@ -529,6 +558,33 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 
     bool has_copied = false;
 
+#ifdef _GLIBCXX_USE_COPY_FILE_RANGE
+    if (!has_copied)
+      has_copied = copy_file_copy_file_range(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
+      {
+	// EINVAL: src and dst are the same file (this is not cheaply
+	// detectable from userspace)
+	// EINVAL: copy_file_range is unsupported for this file type by the
+	// underlying filesystem
+	// ENOTSUP: undocumented, can arise with old kernels and NFS
+	// EOPNOTSUPP: filesystem does not implement copy_file_range
+	// ETXTBSY: src or dst is an active swapfile (nonsensical, but allowed
+	// with normal copying)
+	// EXDEV: src and dst are on different filesystems that do not support
+	// cross-fs copy_file_range
+	// ENOENT: undocumented, can arise with CIFS
+	// ENOSYS: unsupported by kernel or blocked by seccomp
+        if (errno != EINVAL && errno != ENOTSUP && errno != EOPNOTSUPP
+	      && errno != ETXTBSY && errno != EXDEV && errno != ENOENT
+	      && errno != ENOSYS)
+          {
+            ec.assign(errno, std::generic_category());
+            return false;
+          }
+      }
+#endif
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
     if (!has_copied)
       has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
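A standalone sketch of the fallback order in the copy_file_range patch, assuming glibc 2.27 or later, where copy_file_range is declared in `<unistd.h>` when _GNU_SOURCE is defined (g++ defines it by default). `is_fallback_errno` and `copy_fd` are hypothetical names; the errno list is taken from the comment in the patch. Unlike the patch, this sketch passes null offsets, so both file offsets advance together and the read/write loop resumes where copy_file_range stopped. It also trusts a 0 return as EOF, an assumption that may not hold for special files such as those under /proc.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <unistd.h>  // copy_file_range, read, write

// errno values after which the patch falls back to sendfile or a
// userspace copy instead of reporting an error.
bool is_fallback_errno(int e) noexcept
{
  switch (e)
    {
    case EINVAL: case EOPNOTSUPP: case ETXTBSY:
    case EXDEV: case ENOENT: case ENOSYS:
      return true;
    default:
      // ENOTSUP is checked separately because on Linux it has the same
      // value as EOPNOTSUPP, and a duplicate case label would not compile.
      return e == ENOTSUP;
    }
}

// Hypothetical: copy the rest of fd_in to fd_out, preferring copy_file_range.
bool copy_fd(int fd_in, int fd_out) noexcept
{
  for (;;)
    {
      ssize_t n = ::copy_file_range(fd_in, nullptr, fd_out, nullptr,
                                    1 << 20, 0);
      if (n == 0)
        return true;         // EOF (trusted; see the caveat above)
      if (n < 0)
        {
          if (!is_fallback_errno(errno))
            return false;    // a real error, e.g. EIO or ENOSPC
          break;             // unsupported here: copy in userspace
        }
    }
  char buf[4096];
  for (;;)
    {
      ssize_t r = ::read(fd_in, buf, sizeof buf);
      if (r == 0)
        return true;
      if (r < 0)
        return false;
      for (ssize_t done = 0; done < r;)
        {
          ssize_t w = ::write(fd_out, buf + done, r - done);
          if (w < 0)
            return false;
          done += w;
        }
    }
}
```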
  
Jonathan Wakely March 22, 2023, 12:20 p.m. UTC | #7
On Wed, 22 Mar 2023 at 12:18, Jonathan Wakely wrote:

> And similarly for the copy_file_range change.

And finally, here's the fix for PR libstdc++/108178, replacing the
zero-size check with a check for EOF in the source file:
commit 3d994f1998c8f2efc2c8f5744615e92661bde46f
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Mar 21 12:29:08 2023

    libstdc++: Make std::filesystem::copy_file work for procfs [PR108178]
    
    The size reported by stat is always zero for some special files such as
    those under /proc, which means the current copy_file implementation
    thinks there is nothing to copy. Instead of trusting the stat value, try
    to read a character from a streambuf and check for EOF.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/108178
            * src/filesystem/ops-common.h (do_copy_file): Check for empty
            files by trying to read a character.
            * testsuite/27_io/filesystem/operations/copy_file_108178.cc:
            New test.

diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index 906436b484e..a28cbeb10b5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -618,11 +618,16 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-    if (from_st->st_size && !(std::ostream(&sbout) << &sbin))
-      {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-      }
+    // ostream::operator<<(streambuf*) fails if it extracts no characters,
+    // so don't try to use it for empty files. But from_st->st_size == 0 for
+    // some special files (e.g. procfs, see PR libstdc++/108178) so just try
+    // to read a character to decide whether there is anything to copy or not.
+    if (sbin.sgetc() != char_traits<char>::eof())
+      if (!(std::ostream(&sbout) << &sbin))
+	{
+	  ec = std::make_error_code(std::errc::io_error);
+	  return false;
+	}
 
     if (!sbout.close() || !sbin.close())
       {
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/copy_file_108178.cc b/libstdc++-v3/testsuite/27_io/filesystem/operations/copy_file_108178.cc
new file mode 100644
index 00000000000..25135834e21
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/operations/copy_file_108178.cc
@@ -0,0 +1,33 @@
+// { dg-do run { target c++17 } }
+// { dg-require-filesystem-ts "" }
+
+// C++17 30.10.15.4 Copy [fs.op.copy_file]
+
+#include <filesystem>
+#include <fstream>
+#include <unistd.h> // getpid
+#include <testsuite_fs.h>
+#include <testsuite_hooks.h>
+
+namespace fs = std::filesystem;
+
+void
+test_procfs() // PR libstdc++/108178
+{
+  auto pid = ::getpid();
+  std::string from = "/proc/" + std::to_string(pid) + "/status";
+  if (fs::exists(from))
+  {
+    auto to = __gnu_test::nonexistent_path();
+    fs::copy_file(from, to);
+    std::ifstream f(to);
+    VERIFY(f.is_open());
+    VERIFY(f.peek() != std::char_traits<char>::eof());
+    fs::remove(to);
+  }
+}
+
+int main()
+{
+  test_procfs();
+}
  
Jonathan Wakely June 6, 2023, 11:35 a.m. UTC | #8
On Wed, 22 Mar 2023 at 12:14, Jonathan Wakely <jwakely@redhat.com> wrote:

>
>
> On Mon, 20 Mar 2023 at 22:30, Jonathan Wakely via Libstdc++ <
> libstdc++@gcc.gnu.org> wrote:
>
>> On 20/03/23 22:27 +0000, Jonathan Wakely wrote:
>> >On 06/03/23 20:52 +0100, Jannik Glückert wrote:
>> >>we were previously only using sendfile for files smaller than 2GB, as
>> >>sendfile needs to be called repeatedly for files bigger than that.
>> >>
>> >>some quick numbers, copying a 16GB file, average of 10 repetitions:
>> >>   old:
>> >>       real: 13.4s
>> >>       user: 0.14s
>> >>       sys : 7.43s
>> >>   new:
>> >>       real: 8.90s
>> >>       user: 0.00s
>> >>       sys : 3.68s
>> >>
>> >>Additionally, this fixes
>> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178
>> >>
>> >>libstdc++-v3/ChangeLog:
>> >>
>> >>       * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
>> >>       * config.h.in: Regenerate.
>> >>       * configure: Regenerate.
>> >>       * src/filesystem/ops-common.h: enable sendfile for files
>> >>         >2GB in std::filesystem::copy_file, skip zero-length files
>>
>> Also, the ChangeLog entry needs to be indented with tabs, name the
>> changed functions, and should be complete sentences, e.g.
>>
>>         * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
>>         * config.h.in: Regenerate.
>>         * configure: Regenerate.
>>         * src/filesystem/ops-common.h (copy_file_sendfile): Define new
>>         function for sendfile logic. Loop to support large files. Skip
>>         zero-length files.
>>         (do_copy_file): Use it.
>>
>>
> Here's what I plan to commit in a few weeks when GCC 14 Stage 1 opens.
>
>
>
Pushed to trunk now (after testing on btrfs, xfs, and tmpfs using kernel
6.3 and on xfs using kernel 3.10).
  
Jonathan Wakely June 6, 2023, 11:36 a.m. UTC | #9
On Wed, 22 Mar 2023 at 12:18, Jonathan Wakely <jwakely@redhat.com> wrote:

> And similarly for the copy_file_range change.
>
>
This one is also now pushed to trunk (after testing on btrfs, xfs, and
tmpfs using kernel 6.3 and on xfs using kernel 3.10).


>
>
  
Jonathan Wakely June 6, 2023, 11:37 a.m. UTC | #10
On Wed, 22 Mar 2023 at 12:20, Jonathan Wakely <jwakely@redhat.com> wrote:

> And finally, here's the fix for PR libstdc++/108178, replacing the
> zero-size check with checking for EOF in the source file
>
>
Also now pushed to trunk.
  

Patch

From b55eb8dccaa44f07d8acbe6294326a46c920b04f Mon Sep 17 00:00:00 2001
From: Jannik Glückert <jannik.glueckert@gmail.com>
Date: Mon, 6 Mar 2023 20:52:08 +0100
Subject: [PATCH 1/2] libstdc++: also use sendfile for big files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

we were previously only using sendfile for files smaller than 2GB, as
sendfile needs to be called repeatedly for files bigger than that.

some quick numbers, copying a 16GB file, average of 10 repetitions:
    old:
        real: 13.4s
        user: 0.14s
        sys : 7.43s
    new:
        real: 8.90s
        user: 0.00s
        sys : 3.68s

Additionally, this fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178

libstdc++-v3/ChangeLog:

        * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
        * config.h.in: Regenerate.
        * configure: Regenerate.
        * src/filesystem/ops-common.h: enable sendfile for files
          >2GB in std::filesystem::copy_file, skip zero-length files

Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
---
 libstdc++-v3/acinclude.m4                |  51 +++++----
 libstdc++-v3/config.h.in                 |   3 +
 libstdc++-v3/configure                   | 127 ++++++++++++++++-------
 libstdc++-v3/src/filesystem/ops-common.h |  86 ++++++++-------
 4 files changed, 175 insertions(+), 92 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@  dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
+dnl  HAVE_LSEEK
 dnl  HAVE_READLINK
 dnl  HAVE_SYMLINK
 dnl
@@ -4718,25 +4719,6 @@  dnl
   if test $glibcxx_cv_fchmodat = yes; then
     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
   fi
-dnl
-  AC_CACHE_CHECK([for sendfile that can copy files],
-    glibcxx_cv_sendfile, [dnl
-    case "${target_os}" in
-      gnu* | linux* | solaris* | uclinux*)
-	GCC_TRY_COMPILE_OR_LINK(
-	  [#include <sys/sendfile.h>],
-	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
-	  [glibcxx_cv_sendfile=yes],
-	  [glibcxx_cv_sendfile=no])
-	;;
-      *)
-	glibcxx_cv_sendfile=no
-	;;
-    esac
-  ])
-  if test $glibcxx_cv_sendfile = yes; then
-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
-  fi
 dnl
   AC_CACHE_CHECK([for link],
     glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@  dnl
   if test $glibcxx_cv_link = yes; then
     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for lseek],
+    glibcxx_cv_lseek, [dnl
+    GCC_TRY_COMPILE_OR_LINK(
+      [#include <unistd.h>],
+      [lseek(1, 0, SEEK_SET);],
+      [glibcxx_cv_lseek=yes],
+      [glibcxx_cv_lseek=no])
+  ])
+  if test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for readlink],
     glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@  dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for sendfile that can copy files],
+    glibcxx_cv_sendfile, [dnl
+    case "${target_os}" in
+      gnu* | linux* | solaris* | uclinux*)
+	GCC_TRY_COMPILE_OR_LINK(
+	  [#include <sys/sendfile.h>],
+	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
+	  [glibcxx_cv_sendfile=yes],
+	  [glibcxx_cv_sendfile=no])
+	;;
+      *)
+	glibcxx_cv_sendfile=no
+	;;
+    esac
+  ])
+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for fdopendir],
     glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index abbfca43e5c..9e1b1d41dc5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@ 
 # include <ext/stdio_filebuf.h>
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
+#  include <unistd.h> // lseek
 # endif
 #endif
 
@@ -358,6 +359,32 @@  _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+        errno = EINVAL;
+        return false;
+      }
+    size_t bytes_left = length;
+    off_t offset = 0;
+    ssize_t bytes_copied;
+    do {
+      bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+      bytes_left -= bytes_copied;
+    } while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      {
+        ::lseek(fd_out, 0, SEEK_SET);
+        return false;
+      }
+    return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
 	       std::filesystem::copy_options_existing_file options,
@@ -498,28 +525,30 @@  _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 	return false;
       }
 
-    size_t count = from_st->st_size;
+    bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    off_t offset = 0;
-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
+    if (!has_copied)
+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
       {
-	ec.assign(errno, std::generic_category());
-	return false;
+	if (errno != ENOSYS && errno != EINVAL)
+	  {
+	    ec.assign(errno, std::generic_category());
+	    return false;
+	  }
       }
-    if ((size_t)n == count)
+#endif
+
+    if (has_copied)
       {
-	if (!out.close() || !in.close())
-	  {
-	    ec.assign(errno, std::generic_category());
-	    return false;
-	  }
-	ec.clear();
-	return true;
+	if (!out.close() || !in.close())
+	  {
+	    ec.assign(errno, std::generic_category());
+	    return false;
+	  }
+	return true;
       }
-    else if (n > 0)
-      count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
 
     using std::ios;
     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +559,12 @@  _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-    if (n != 0)
+    if (!(std::ostream(&sbout) << &sbin))
       {
-	if (n < 0)
-	  n = 0;
-
-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-	const std::streampos errpos(std::streamoff(-1));
-	if (p1 == errpos || p2 == errpos)
-	  {
-	    ec = std::make_error_code(std::errc::io_error);
-	    return false;
-	  }
+	ec = std::make_error_code(std::errc::io_error);
+	return false;
       }
-#endif
 
-    if (count && !(std::ostream(&sbout) << &sbin))
-      {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-      }
     if (!sbout.close() || !sbin.close())
       {
 	ec.assign(errno, std::generic_category());
-- 
2.39.2