[v4,2/3] posix: Add pidfd_fork

Message ID 20230517173516.1988283-3-adhemerval.zanella@linaro.org
State Superseded
Headers
Series Add pidfd_spawn, pidfd_spawnp, pidfd_fork, and pidfd_getpid |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella May 17, 2023, 5:35 p.m. UTC
  Returning a pidfd allows a process to keep a race-free handle to a
child process, otherwise the caller will need to either use pidfd_open
(which still might be subject to TOCTOU) or keep using the old racy
interface.

The implementation makes sure that kernel must support the complete
pidfd interface, meaning that waitid (P_PIDFD) should be supported.
It ensure that non racy workaround is required (such as reading procfs
fdinfo pid to use along with old wait interfaces).  If kernel does
not have the required support the interface returns -1 and set errno
to ENOSYS.

The interface is:

  int pidfd_fork (unsigned int flags)

The kernel already sets O_CLOEXEC as default and it follow fork/_Fork
convention on returning a positive or negative value to the parent
(with negative indicating an error) and zero to the child.

Similar to fork, pidfd_fork also runs the pthread_atfork handlers
It can be change by using PIDFDFORK_ASYNCSAFE flag, which make
pidfd_fork acts a _Fork.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PID or waitid
support), Linux 5.15 (only clone support), and Linux 5.19 (full
support including clone3).
---
 NEWS                                          |   5 +
 manual/process.texi                           |  47 ++++--
 posix/Makefile                                |   3 +-
 posix/fork-internal.c                         | 125 +++++++++++++++
 posix/fork-internal.h                         |  29 ++++
 posix/fork.c                                  |  98 +-----------
 sysdeps/nptl/_Fork.c                          |   2 +-
 sysdeps/unix/sysv/linux/Makefile              |   3 +
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arch-fork.h           |  16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/pidfd_fork.c          |  76 +++++++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/sys/pidfd.h           |  13 ++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c      | 150 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 47 files changed, 488 insertions(+), 114 deletions(-)
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c
  

Comments

Florian Weimer May 22, 2023, 10:12 a.m. UTC | #1
* Adhemerval Zanella:

> +@deftypefun int pidfd_fork (unsigned int @var{flags})
> +@standards{GNU, sys/pidfd.h}
> +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
> +The @code{fork} function is similar to @code{fork} but return a file
> +descriptor instead of process ID.
> +
> +If the operation is sucessful, there are both parent and child processes
> +and both see @code{pidfd_fork} return, but with different values: it return
> +a value of @code{0} in the child process and returns a file descriptor to
> +child's process int the parent process.  Although the value of @code{0} clashes
> +with @code{STDIN_FILENO}, it also assumed that stardand input is always
> +available so a conformant application will never see a the @code{0} stream
> +being returned to parent as a valid file descriptor.

I think this is a bit ugly.  Maybe pidfd_fork should return the PID/zero
as fork, and store the FD through an int * argument?

The documentaiton should discuss if SIGCHLD is sent for such processes,
and if the PID can be used in waitpid etc.  Ideally we'd have test cases
for this behavior, too. 8-)

I think it would be valuable to support creating subprocesses without
ever sending SIGCHLD on their termination.  Not sure if this should be
the default behavior, or something enabled with a flag.

> diff --git a/posix/fork-internal.c b/posix/fork-internal.c
> new file mode 100644
> index 0000000000..be0d3219af
> --- /dev/null
> +++ b/posix/fork-internal.c

> +void
> +__fork_pos (int state, uint64_t lastrun, bool multiple_threads,
> +	    struct nss_database_data *nss_database_data)

Maybe __fork_post, with a t?

> diff --git a/posix/fork-internal.h b/posix/fork-internal.h
> new file mode 100644
> index 0000000000..3814d851c4
> --- /dev/null
> +++ b/posix/fork-internal.h

> +uint64_t __fork_pre (bool, struct nss_database_data *) attribute_hidden;
> +void __fork_pos (int, uint64_t, bool, struct nss_database_data *)
> +  attribute_hidden;

I suggest to put the arguments into a struct, with struct
nss_database_data as a non-pointer member.  It makes it easier to
document the meaning of the arguments.  And we may have to add more
things in the future.

> diff --git a/sysdeps/unix/sysv/linux/pidfd_fork.c b/sysdeps/unix/sysv/linux/pidfd_fork.c
> new file mode 100644
> index 0000000000..a3ef48ce77
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/pidfd_fork.c

> +static int
> +forkfd (void)
> +{
> +  /* The kernel should not change the pidfd for the child, and the returning
> +     value on the parent is always larger than 0 (since a new file descriptor
> +     is always opened with CLONE_PIDFD is the kernel supports it.  */

Missing ).

Thanks,
Florian
  
Christian Brauner May 22, 2023, 12:44 p.m. UTC | #2
On Mon, May 22, 2023 at 12:12:16PM +0200, Florian Weimer via Libc-alpha wrote:
> * Adhemerval Zanella:
> 
> > +@deftypefun int pidfd_fork (unsigned int @var{flags})
> > +@standards{GNU, sys/pidfd.h}
> > +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
> > +The @code{fork} function is similar to @code{fork} but return a file
> > +descriptor instead of process ID.
> > +
> > +If the operation is sucessful, there are both parent and child processes
> > +and both see @code{pidfd_fork} return, but with different values: it return
> > +a value of @code{0} in the child process and returns a file descriptor to
> > +child's process int the parent process.  Although the value of @code{0} clashes
> > +with @code{STDIN_FILENO}, it also assumed that stardand input is always
> > +available so a conformant application will never see a the @code{0} stream
> > +being returned to parent as a valid file descriptor.
> 
> I think this is a bit ugly.  Maybe pidfd_fork should return the PID/zero

I agree. We went through some pain to ensure that 0 can be used as a
valid fd value. Imho, it'd be better to use a return argument. It would
also align with how CLONE_PIDFD with legacy clone() works.

> as fork, and store the FD through an int * argument?
> 
> The documentaiton should discuss if SIGCHLD is sent for such processes,

Depends on whether you set exit_signal. I would expect pidfd_fork() to
very closely mirror the semantics of fork(). So I would expect SIGCHLD
to be delivered. But that's not how I understand this patchset just from
glancing at it.

> and if the PID can be used in waitpid etc.  Ideally we'd have test cases

The pid can be used with waitpid(). I don't understand why it wouldn't.

> for this behavior, too. 8-)

> 
> I think it would be valuable to support creating subprocesses without
> ever sending SIGCHLD on their termination.  Not sure if this should be
> the default behavior, or something enabled with a flag.

Best behind a flag. If you name it pidfd_fork() then it might be best if
you default to delivering SIGCHLD and provide an opt-out through a
flag.
  
Adhemerval Zanella May 22, 2023, 2:48 p.m. UTC | #3
On 22/05/23 07:12, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> +@deftypefun int pidfd_fork (unsigned int @var{flags})
>> +@standards{GNU, sys/pidfd.h}
>> +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
>> +The @code{fork} function is similar to @code{fork} but return a file
>> +descriptor instead of process ID.
>> +
>> +If the operation is sucessful, there are both parent and child processes
>> +and both see @code{pidfd_fork} return, but with different values: it return
>> +a value of @code{0} in the child process and returns a file descriptor to
>> +child's process int the parent process.  Although the value of @code{0} clashes
>> +with @code{STDIN_FILENO}, it also assumed that stardand input is always
>> +available so a conformant application will never see a the @code{0} stream
>> +being returned to parent as a valid file descriptor.
> 
> I think this is a bit ugly.  Maybe pidfd_fork should return the PID/zero
> as fork, and store the FD through an int * argument?

Alright, I was not really sure about whether to expose the PID but I tend
to agree that for error code handling is not straightforward. I will change
to return the pid_t as fork and the pidfd as first argument.

> 
> The documentaiton should discuss if SIGCHLD is sent for such processes,
> and if the PID can be used in waitpid etc.  Ideally we'd have test cases
> for this behavior, too. 8-)

The default pidfd_fork should be similar semantically to fork, the first
version I make it similar to _Fork but I think this was a mistake.  This
included SIGCHLD handling as well (the flag is passed on clone syscall
as fork).  I will update the documentation and add extra tests.

> 
> I think it would be valuable to support creating subprocesses without
> ever sending SIGCHLD on their termination.  Not sure if this should be
> the default behavior, or something enabled with a flag.

I tend to agree with Christian and this should be behind a flags.  I
will add a flag for this.

> 
>> diff --git a/posix/fork-internal.c b/posix/fork-internal.c
>> new file mode 100644
>> index 0000000000..be0d3219af
>> --- /dev/null
>> +++ b/posix/fork-internal.c
> 
>> +void
>> +__fork_pos (int state, uint64_t lastrun, bool multiple_threads,
>> +	    struct nss_database_data *nss_database_data)
> 
> Maybe __fork_post, with a t?

Ack.

> 
>> diff --git a/posix/fork-internal.h b/posix/fork-internal.h
>> new file mode 100644
>> index 0000000000..3814d851c4
>> --- /dev/null
>> +++ b/posix/fork-internal.h
> 
>> +uint64_t __fork_pre (bool, struct nss_database_data *) attribute_hidden;
>> +void __fork_pos (int, uint64_t, bool, struct nss_database_data *)
>> +  attribute_hidden;
> 
> I suggest to put the arguments into a struct, with struct
> nss_database_data as a non-pointer member.  It makes it easier to
> document the meaning of the arguments.  And we may have to add more
> things in the future.

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/pidfd_fork.c b/sysdeps/unix/sysv/linux/pidfd_fork.c
>> new file mode 100644
>> index 0000000000..a3ef48ce77
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/pidfd_fork.c
> 
>> +static int
>> +forkfd (void)
>> +{
>> +  /* The kernel should not change the pidfd for the child, and the returning
>> +     value on the parent is always larger than 0 (since a new file descriptor
>> +     is always opened with CLONE_PIDFD is the kernel supports it.  */
> 
> Missing ).

Ack.
  

Patch

diff --git a/NEWS b/NEWS
index 2c49eaf373..af647c7c46 100644
--- a/NEWS
+++ b/NEWS
@@ -42,6 +42,11 @@  Major new features:
   The pidfd functionality avoid the issue of PID reuse with traditional
   posix_spawn interface.
 
+* On Linux, the pidfd_fork has been added.  It has a similar semantic
+  as fork or _Fork, where it clone the calling process.  However instead of
+  return a process ID, it returns a file descriptor that can be used along
+  other pidfd functions.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * In the Linux kernel for the hppa/parisc architecture some of the
diff --git a/manual/process.texi b/manual/process.texi
index 927aebbe3d..93e0ab5fb7 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -137,12 +137,12 @@  creating a process and making it run another program.
 @cindex subprocess
 A new processes is created when one of the functions
 @code{posix_spawn}, @code{fork}, @code{_Fork}, @code{vfork}, or
-@code{pidfd_spawn} is called.  (The @code{system} and @code{popen} also
-create new processes internally.)  Due to the name of the @code{fork}
-function, the act of creating a new process is sometimes called
-@dfn{forking} a process.  Each new process (the @dfn{child process} or
-@dfn{subprocess}) is allocated a process ID, distinct from the process
-ID of the parent process.  @xref{Process Identification}.
+@code{pidfd_spawn}, or @code{pidfd_fork} is called.  (The @code{system}
+and @code{popen} also create new processes internally.)  Due to the name
+of the @code{fork} function, the act of creating a new process is
+sometimes called @dfn{forking} a process.  Each new process (the
+@dfn{child process} or @dfn{subprocess}) is allocated a process ID,
+distinct from the process ID of the parent process.  @xref{Process Identification}.
 
 After forking a child process, both the parent and child processes
 continue to execute normally.  If you want your program to wait for a
@@ -153,10 +153,10 @@  limited information about why the child terminated---for example, its
 exit status code.
 
 A newly forked child process continues to execute the same program as
-its parent process, at the point where the @code{fork} or @code{_Fork}
-call returns.  You can use the return value from @code{fork} or
-@code{_Fork} to tell whether the program is running in the parent process
-or the child.
+its parent process, at the point where the @code{fork}, @code{_Fork},
+or @code{pidfd_fork} call returns.  You can use the return value from
+@code{fork}, @code{_Fork}, or @code{pidfd_fork} to tell whether the
+program is running in the parent process or the child.
 
 @cindex process image
 Having several processes run the same program is only occasionally
@@ -362,6 +362,33 @@  the proper precautions for using @code{vfork}, your program will still
 work even if the system uses @code{fork} instead.
 @end deftypefun
 
+@deftypefun int pidfd_fork (unsigned int @var{flags})
+@standards{GNU, sys/pidfd.h}
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+The @code{fork} function is similar to @code{fork} but return a file
+descriptor instead of process ID.
+
+If the operation is sucessful, there are both parent and child processes
+and both see @code{pidfd_fork} return, but with different values: it return
+a value of @code{0} in the child process and returns a file descriptor to
+child's process int the parent process.  Although the value of @code{0} clashes
+with @code{STDIN_FILENO}, it also assumed that stardand input is always
+available so a conformant application will never see a the @code{0} stream
+being returned to parent as a valid file descriptor.
+
+The @var{flags} argument should be either zero, or the bitwise OR of some of the
+following flags:
+
+@table @code
+@item PIDFDFORK_ASYNCSAFE
+Acts as @code{_Fork}, where it does not invoke any callbacks registered with
+@code{pthread_atfork}, nor does it reset internal state or locks (such as the
+@code{malloc} locks).
+@end table
+@end deftypefun
+
+This function is specific to Linux.
+
 @node Executing a File
 @section Executing a File
 @cindex executing a file
diff --git a/posix/Makefile b/posix/Makefile
index 4016a7758a..c4837589fd 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -84,6 +84,7 @@  routines := \
   fexecve \
   fnmatch \
   fork \
+  fork-internal \
   fpathconf \
   gai_strerror \
   get_child_max \
@@ -579,7 +580,7 @@  CFLAGS-execl.os = -fomit-frame-pointer
 CFLAGS-execvp.os = -fomit-frame-pointer
 CFLAGS-execlp.os = -fomit-frame-pointer
 CFLAGS-nanosleep.c += -fexceptions -fasynchronous-unwind-tables
-CFLAGS-fork.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
+CFLAGS-fork-internal.c = $(libio-mtsafe) $(config-cflags-wno-ignored-attributes)
 
 tstgetopt-ARGS = -a -b -cfoobar --required foobar --optional=bazbug \
 		--none random --col --color --colour
diff --git a/posix/fork-internal.c b/posix/fork-internal.c
new file mode 100644
index 0000000000..be0d3219af
--- /dev/null
+++ b/posix/fork-internal.c
@@ -0,0 +1,125 @@ 
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <fork.h>
+#include <fork-internal.h>
+#include <ldsodefs.h>
+#include <libio/libioP.h>
+#include <malloc/malloc-internal.h>
+#include <register-atfork.h>
+#include <stdio-lock.h>
+#include <unwind-link.h>
+
+static void
+fresetlockfiles (void)
+{
+  _IO_ITER i;
+
+  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
+    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
+      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
+}
+
+uint64_t
+__fork_pre (bool multiple_threads, struct nss_database_data *nss_database_data)
+{
+  uint64_t lastrun = __run_prefork_handlers (multiple_threads);
+
+  /* If we are not running multiple threads, we do not have to
+     preserve lock state.  If fork runs from a signal handler, only
+     async-signal-safe functions can be used in the child.  These data
+     structures are only used by unsafe functions, so their state does
+     not matter if fork was called from a signal handler.  */
+  if (multiple_threads)
+    {
+      call_function_static_weak (__nss_database_fork_prepare_parent,
+				 nss_database_data);
+
+      _IO_list_lock ();
+
+      /* Acquire malloc locks.  This needs to come last because fork
+	 handlers may use malloc, and the libio list lock has an
+	 indirect malloc dependency as well (via the getdelim
+	 function).  */
+      call_function_static_weak (__malloc_fork_lock_parent);
+    }
+
+  return lastrun;
+}
+
+void
+__fork_pos (int state, uint64_t lastrun, bool multiple_threads,
+	    struct nss_database_data *nss_database_data)
+{
+  if (state == 0)
+    {
+      fork_system_setup ();
+
+      /* Reset the lock state in the multi-threaded case.  */
+      if (multiple_threads)
+	{
+	  __libc_unwind_link_after_fork ();
+
+	  fork_system_setup_after_fork ();
+
+	  /* Release malloc locks.  */
+	  call_function_static_weak (__malloc_fork_unlock_child);
+
+	  /* Reset the file list.  These are recursive mutexes.  */
+	  fresetlockfiles ();
+
+	  /* Reset locks in the I/O code.  */
+	  _IO_list_resetlock ();
+
+	  call_function_static_weak (__nss_database_fork_subprocess,
+				     nss_database_data);
+	}
+
+      /* Reset the lock the dynamic loader uses to protect its data.  */
+      __rtld_lock_initialize (GL(dl_load_lock));
+
+      /* Reset the lock protecting dynamic TLS related data.  */
+      __rtld_lock_initialize (GL(dl_load_tls_lock));
+
+      reclaim_stacks ();
+
+      /* Run the handlers registered for the child.  */
+      __run_postfork_handlers (atfork_run_child, multiple_threads, lastrun);
+    }
+  else
+    {
+      /* If _Fork failed, preserve its errno value.  */
+      int save_errno = errno;
+
+      /* Release acquired locks in the multi-threaded case.  */
+      if (multiple_threads)
+	{
+	  /* Release malloc locks, parent process variant.  */
+	  call_function_static_weak (__malloc_fork_unlock_parent);
+
+	  /* We execute this even if the 'fork' call failed.  */
+	  _IO_list_unlock ();
+	}
+
+      /* Run the handlers registered for the parent.  */
+      __run_postfork_handlers (atfork_run_parent, multiple_threads, lastrun);
+
+      if (state < 0)
+	__set_errno (save_errno);
+    }
+}
diff --git a/posix/fork-internal.h b/posix/fork-internal.h
new file mode 100644
index 0000000000..3814d851c4
--- /dev/null
+++ b/posix/fork-internal.h
@@ -0,0 +1,29 @@ 
+/* Internal fork definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _FORK_INTERNAL_H
+#define _FORK_INTERNAL_H
+
+#include <stdint.h>
+#include <nss/nss_database.h>
+
+uint64_t __fork_pre (bool, struct nss_database_data *) attribute_hidden;
+void __fork_pos (int, uint64_t, bool, struct nss_database_data *)
+  attribute_hidden;
+
+#endif
diff --git a/posix/fork.c b/posix/fork.c
index b4aaa9fa6d..4423cc5792 100644
--- a/posix/fork.c
+++ b/posix/fork.c
@@ -16,25 +16,10 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <fork.h>
-#include <libio/libioP.h>
-#include <ldsodefs.h>
-#include <malloc/malloc-internal.h>
-#include <nss/nss_database.h>
-#include <register-atfork.h>
-#include <stdio-lock.h>
+#include <fork-internal.h>
 #include <sys/single_threaded.h>
 #include <unwind-link.h>
-
-static void
-fresetlockfiles (void)
-{
-  _IO_ITER i;
-
-  for (i = _IO_iter_begin(); i != _IO_iter_end(); i = _IO_iter_next(i))
-    if ((_IO_iter_file (i)->_flags & _IO_USER_LOCK) == 0)
-      _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
-}
+#include <unistd.h>
 
 pid_t
 __libc_fork (void)
@@ -46,89 +31,14 @@  __libc_fork (void)
      best effort to make is async-signal-safe at least for single-thread
      case.  */
   bool multiple_threads = !SINGLE_THREAD_P;
-  uint64_t lastrun;
-
-  lastrun = __run_prefork_handlers (multiple_threads);
 
   struct nss_database_data nss_database_data;
 
-  /* If we are not running multiple threads, we do not have to
-     preserve lock state.  If fork runs from a signal handler, only
-     async-signal-safe functions can be used in the child.  These data
-     structures are only used by unsafe functions, so their state does
-     not matter if fork was called from a signal handler.  */
-  if (multiple_threads)
-    {
-      call_function_static_weak (__nss_database_fork_prepare_parent,
-				 &nss_database_data);
-
-      _IO_list_lock ();
-
-      /* Acquire malloc locks.  This needs to come last because fork
-	 handlers may use malloc, and the libio list lock has an
-	 indirect malloc dependency as well (via the getdelim
-	 function).  */
-      call_function_static_weak (__malloc_fork_lock_parent);
-    }
+  uint64_t lastrun = __fork_pre (multiple_threads, &nss_database_data);
 
   pid_t pid = _Fork ();
 
-  if (pid == 0)
-    {
-      fork_system_setup ();
-
-      /* Reset the lock state in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  __libc_unwind_link_after_fork ();
-
-	  fork_system_setup_after_fork ();
-
-	  /* Release malloc locks.  */
-	  call_function_static_weak (__malloc_fork_unlock_child);
-
-	  /* Reset the file list.  These are recursive mutexes.  */
-	  fresetlockfiles ();
-
-	  /* Reset locks in the I/O code.  */
-	  _IO_list_resetlock ();
-
-	  call_function_static_weak (__nss_database_fork_subprocess,
-				     &nss_database_data);
-	}
-
-      /* Reset the lock the dynamic loader uses to protect its data.  */
-      __rtld_lock_initialize (GL(dl_load_lock));
-
-      /* Reset the lock protecting dynamic TLS related data.  */
-      __rtld_lock_initialize (GL(dl_load_tls_lock));
-
-      reclaim_stacks ();
-
-      /* Run the handlers registered for the child.  */
-      __run_postfork_handlers (atfork_run_child, multiple_threads, lastrun);
-    }
-  else
-    {
-      /* If _Fork failed, preserve its errno value.  */
-      int save_errno = errno;
-
-      /* Release acquired locks in the multi-threaded case.  */
-      if (multiple_threads)
-	{
-	  /* Release malloc locks, parent process variant.  */
-	  call_function_static_weak (__malloc_fork_unlock_parent);
-
-	  /* We execute this even if the 'fork' call failed.  */
-	  _IO_list_unlock ();
-	}
-
-      /* Run the handlers registered for the parent.  */
-      __run_postfork_handlers (atfork_run_parent, multiple_threads, lastrun);
-
-      if (pid < 0)
-	__set_errno (save_errno);
-    }
+  __fork_pos (pid, lastrun, multiple_threads, &nss_database_data);
 
   return pid;
 }
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index f8322ae557..aa99e05b5b 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -22,7 +22,7 @@ 
 pid_t
 _Fork (void)
 {
-  pid_t pid = arch_fork (&THREAD_SELF->tid);
+  pid_t pid = arch_fork (0, NULL, &THREAD_SELF->tid);
   if (pid == 0)
     {
       struct pthread *self = THREAD_SELF;
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 6809b1eb1c..675c226a99 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -489,14 +489,17 @@  sysdep_headers += \
 sysdep_routines += \
   getcpu \
   oldglob \
+  pidfd_fork \
   pidfd_spawn \
   pidfd_spawnp \
   sched_getcpu \
+  sched_getcpu \
   # sysdep_routines
 
 tests += \
   tst-affinity \
   tst-affinity-pid \
+  tst-pidfd_fork \
   tst-posix_spawn-setsid-pidfd \
   tst-spawn-chdir-pidfd \
   tst-spawn-pidfd \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 28825473ae..7e4109e9c5 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -322,6 +322,7 @@  libc {
 %endif
   }
   GLIBC_2.38 {
+    pidfd_fork;
     pidfd_spawn;
     pidfd_spawnp;
   }
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index bac74ae474..dbf7c44319 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2665,5 +2665,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index dadd302f8d..a9d6ffce14 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2774,6 +2774,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 0f4f07b6bb..c08d548556 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2426,5 +2426,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/arch-fork.h b/sysdeps/unix/sysv/linux/arch-fork.h
index 84e5ef5a7f..ce08fdb99e 100644
--- a/sysdeps/unix/sysv/linux/arch-fork.h
+++ b/sysdeps/unix/sysv/linux/arch-fork.h
@@ -32,24 +32,24 @@ 
    override it with one of the supported calling convention (check generic
    kernel-features.h for the clone abi variants).  */
 static inline pid_t
-arch_fork (void *ctid)
+arch_fork (int flags, void *ptid, void *ctid)
 {
-  const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
   long int ret;
+  flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
 #ifdef __ASSUME_CLONE_BACKWARDS
 # ifdef INLINE_CLONE_SYSCALL
-  ret = INLINE_CLONE_SYSCALL (flags, 0, NULL, 0, ctid);
+  ret = INLINE_CLONE_SYSCALL (flags, 0, ptid, 0, ctid);
 # else
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, 0, ctid);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, 0, ctid);
 # endif
 #elif defined(__ASSUME_CLONE_BACKWARDS2)
-  ret = INLINE_SYSCALL_CALL (clone, 0, flags, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, 0, flags, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_BACKWARDS3)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE2)
-  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone2, flags, 0, 0, ptid, ctid, 0);
 #elif defined(__ASSUME_CLONE_DEFAULT)
-  ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
+  ret = INLINE_SYSCALL_CALL (clone, flags, 0, ptid, ctid, 0);
 #else
 # error "Undefined clone variant"
 #endif
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index 38e5dcb81a..f348336ee7 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -546,6 +546,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 423d20cbb5..5aed593521 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -543,6 +543,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 488418d14e..8a7022aaa8 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2702,5 +2702,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 75d495e1c7..e2ac6949df 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2651,6 +2651,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index e53877c52a..5e8d49fc1f 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2835,6 +2835,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 5ed2b1ebf4..3cd3c80334 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2600,6 +2600,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
index d050fbf6d0..0965db2ff0 100644
--- a/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/libc.abilist
@@ -2186,5 +2186,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 6465af266c..1ec24886a3 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -547,6 +547,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _Exit F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 00013cc564..239b1d70be 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2778,6 +2778,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 34085d81ab..dd41d63394 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2751,5 +2751,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index b26b116f1f..55c3dcd471 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2748,5 +2748,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index c86418d156..295e4d8608 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2743,6 +2743,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 3c6f8f4d60..19fb1aebe7 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2741,6 +2741,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 8dfa065638..2dd6ef5416 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2749,6 +2749,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 0ca815f2cd..6445d27220 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2651,6 +2651,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index a1e0f772c6..c1cc5134c9 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2790,5 +2790,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 91cd6075b9..293643346a 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2172,5 +2172,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/pidfd_fork.c b/sysdeps/unix/sysv/linux/pidfd_fork.c
new file mode 100644
index 0000000000..a3ef48ce77
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pidfd_fork.c
@@ -0,0 +1,76 @@ 
+/* pidfd_fork - Duplicated calling process and return a process file
+   descriptor.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arch-fork.h>
+#include <clone_internal.h>
+#include <fork-internal.h>
+#include <sys/pidfd.h>
+
+static int
+forkfd (void)
+{
+  /* The kernel should not change the pidfd for the child, and the returning
+     value on the parent is always larger than 0 (since a new file descriptor
+     is always opened with CLONE_PIDFD is the kernel supports it.  */
+  int pidfd = 0;
+  int ret = arch_fork (CLONE_PIDFD, &pidfd, &THREAD_SELF->tid);
+  if (ret < 0)
+    return -1;
+  else if (ret == 0)
+    {
+      struct pthread *self = THREAD_SELF;
+
+      /* Initialize the robust mutex, check _Fork implementation for a full
+	 description why this is required.  */
+#if __PTHREAD_MUTEX_HAVE_PREV
+      self->robust_prev = &self->robust_head;
+#endif
+      self->robust_head.list = &self->robust_head;
+      INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
+			     sizeof (struct robust_list_head));
+    }
+  return pidfd;
+}
+
+int
+pidfd_fork (unsigned int flags)
+{
+  if (!__clone_pidfd_supported ())
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOSYS);
+
+  if (flags & ~(PIDFDFORK_ASYNCSAFE))
+    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
+
+  int pidfd;
+  if (!(flags & PIDFDFORK_ASYNCSAFE))
+    {
+      bool multiple_threads = !SINGLE_THREAD_P;
+      struct nss_database_data nss_database_data;
+
+      uint64_t lastrun = __fork_pre (multiple_threads, &nss_database_data);
+      pidfd = forkfd ();
+      /* It follow the usual fork semantic, where a positive or negative
+	 value is returned to parent, and 0 for the child.  */
+      __fork_pos (pidfd, lastrun, multiple_threads, &nss_database_data);
+    }
+  else
+    pidfd = forkfd ();
+
+  return pidfd;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 21244ece27..03a35e860d 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2817,6 +2817,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 238d00998b..a91af4245b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2850,6 +2850,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 36fb9b57ac..c15137b9ad 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2571,6 +2571,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 0c25995661..12c0f4cfba 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2885,5 +2885,6 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index cda030f623..9537dac42d 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2428,5 +2428,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index bd20c202fb..ea72e6b648 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2628,5 +2628,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index a1e9247f72..d6c895b487 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2815,6 +2815,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index b80a131bde..b375f5a7ab 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2608,6 +2608,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 496a0185b6..7b2266ecb3 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2658,6 +2658,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 3e0d858d08..2f9f732044 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2655,6 +2655,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 84dcda8db2..144d54ebf2 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2810,6 +2810,7 @@  GLIBC_2.38 __nldbl___isoc23_vsscanf F
 GLIBC_2.38 __nldbl___isoc23_vswscanf F
 GLIBC_2.38 __nldbl___isoc23_vwscanf F
 GLIBC_2.38 __nldbl___isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 _IO_fprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 1688a0a7e9..bc8bd8ff54 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2623,6 +2623,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/sys/pidfd.h b/sysdeps/unix/sysv/linux/sys/pidfd.h
index 342e593288..1942773957 100644
--- a/sysdeps/unix/sysv/linux/sys/pidfd.h
+++ b/sysdeps/unix/sysv/linux/sys/pidfd.h
@@ -46,4 +46,17 @@  extern int pidfd_getfd (int __pidfd, int __targetfd,
 extern int pidfd_send_signal (int __pidfd, int __sig, siginfo_t *__info,
 			      unsigned int __flags) __THROW;
 
+
+/* Do not issue the pthread_atfork on pidfd_fork.  */
+#define PIDFDFORK_ASYNCSAFE (1U << 1)
+
+/* Clone the calling process, creating an exact copy.
+
+   The __FLAGS can be used to specify whether to run pthread_atfork handlers
+   and reset internal states.  The default is to run it, similar to fork.
+
+   Return -1 for errors, 0 to the new process, and the process file descriptor
+   of the new process to the parent process.  */
+extern int pidfd_fork (unsigned int __flags) __THROW;
+
 #endif /* _PIDFD_H  */
diff --git a/sysdeps/unix/sysv/linux/tst-pidfd_fork.c b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
new file mode 100644
index 0000000000..32542b8331
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pidfd_fork.c
@@ -0,0 +1,150 @@ 
+/* Basic tests for pidfd_fork.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <errno.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/temp_file.h>
+#include <support/xunistd.h>
+#include <sys/pidfd.h>
+#include <sys/wait.h>
+
+#define SIG_PID_EXIT_CODE 20
+
+static bool atfork_prepare_var;
+static bool atfork_parent_var;
+static bool atfork_child_var;
+
+static void
+atfork_prepare (void)
+{
+  atfork_prepare_var = true;
+}
+
+static void
+atfork_parent (void)
+{
+  atfork_parent_var = true;
+}
+
+static void
+atfork_child (void)
+{
+  atfork_child_var = true;
+}
+
+static int
+singlethread_test (unsigned int flags)
+{
+  const char testdata1[] = "abcdefghijklmnopqrtuvwxz";
+  enum { testdatalen1 = array_length (testdata1) };
+  const char testdata2[] = "01234567890";
+  enum { testdatalen2 = array_length (testdata2) };
+
+  pid_t ppid = getpid ();
+
+  int tempfd = create_temp_file ("tst-pidfd_fork", NULL);
+
+  /* Check if the opened file is shared between process by read and write
+     some data on parent and child processes.  */
+  xwrite (tempfd, testdata1, testdatalen1);
+  off_t off = xlseek (tempfd, 0, SEEK_CUR);
+  TEST_COMPARE (off, testdatalen1);
+
+  int pidfd = pidfd_fork (flags);
+  if (pidfd == -1 && errno == ENOSYS)
+    FAIL_UNSUPPORTED ("kernel does not support CLONE_PIDFD clone flag");
+  TEST_VERIFY_EXIT (pidfd != -1);
+
+  if (pidfd == 0)
+    {
+      if (flags & PIDFDFORK_ASYNCSAFE)
+	TEST_VERIFY (!atfork_child_var);
+      else
+	TEST_VERIFY (atfork_child_var);
+
+      TEST_VERIFY_EXIT (getpid () != ppid);
+      TEST_COMPARE (getppid(), ppid);
+
+      TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      char buf[testdatalen1];
+      TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen1);
+      TEST_COMPARE_BLOB (buf, testdatalen1, testdata1, testdatalen1);
+
+      xlseek (tempfd, 0, SEEK_SET);
+      xwrite (tempfd, testdata2, testdatalen2);
+
+      xclose (tempfd);
+
+      _exit (EXIT_SUCCESS);
+    }
+  else
+    TEST_VERIFY_EXIT (pidfd != -1);
+
+  {
+    siginfo_t sinfo;
+    TEST_COMPARE (waitid (P_PIDFD, pidfd, &sinfo, WEXITED), 0);
+    TEST_COMPARE (sinfo.si_code, CLD_EXITED);
+    TEST_COMPARE (sinfo.si_status, 0);
+  }
+
+  TEST_COMPARE (xlseek (tempfd, 0, SEEK_CUR), testdatalen2);
+
+  xlseek (tempfd, 0, SEEK_SET);
+  char buf[testdatalen2];
+  TEST_COMPARE (read (tempfd, buf, sizeof (buf)), testdatalen2);
+
+  TEST_COMPARE_BLOB (buf, testdatalen2, testdata2, testdatalen2);
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+
+  /* With default flags, pidfd_fork acts as fork and run the pthread_atfork
+     handlers.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    singlethread_test (0);
+    TEST_VERIFY (atfork_prepare_var);
+    TEST_VERIFY (atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  /* With PIDFDFORK_ASYNCSAFE, pidfd_fork acts as _Fork.  */
+  {
+    atfork_prepare_var = atfork_parent_var = atfork_child_var = false;
+    pthread_atfork (atfork_prepare, atfork_parent, atfork_child);
+    singlethread_test (PIDFDFORK_ASYNCSAFE);
+    TEST_VERIFY (!atfork_prepare_var);
+    TEST_VERIFY (!atfork_parent_var);
+    TEST_VERIFY (!atfork_child_var);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index b7422d5747..045205de0c 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2574,6 +2574,7 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F
 GLIBC_2.4 __confstr_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 90680bc1d7..5a67604948 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2680,5 +2680,6 @@  GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
 GLIBC_2.38 __isoc23_wscanf F
+GLIBC_2.38 pidfd_fork F
 GLIBC_2.38 pidfd_spawn F
 GLIBC_2.38 pidfd_spawnp F