glibc/include/clone_internal.h

48 lines
1.8 KiB
C
Raw Normal View History

#ifndef _CLONE_INTERNAL_H
#define _CLONE_INTERNAL_H
Add an internal wrapper for clone, clone2 and clone3 The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area which can be used by kernel to compute the shadow stack size when allocating the shadow stack. Add: extern int __clone_internal (struct clone_args *__cl_args, int (*__func) (void *__arg), void *__arg); to provide an abstract interface for clone, clone2 and clone3. 1. Simplify stack management for thread creation by passing both stack base and size to create_thread. 2. Consolidate clone vs clone2 differences into a single file. 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2. 4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread. 5. Enable the public clone3 wrapper in the future after it has been added to all targets. NB: Sandbox will return ENOSYS on clone3 in both Chromium: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b commit 218438259dd795456f0a48f67cbe5b4e520db88b Author: Matthew Denton <mpdenton@chromium.org> Date: Thu Jun 03 20:06:13 2021 Linux sandbox: return ENOSYS for clone3 Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone. Bug: 1213452 Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184 Reviewed-by: Robert Sesek <rsesek@chromium.org> Commit-Queue: Matthew Denton <mpdenton@chromium.org> Cr-Commit-Position: refs/heads/master@{#888980} [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc and Firefox: https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-02-14 03:47:46 +08:00
#include <clone3.h>
Add an internal wrapper for clone, clone2 and clone3 The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area which can be used by kernel to compute the shadow stack size when allocating the shadow stack. Add: extern int __clone_internal (struct clone_args *__cl_args, int (*__func) (void *__arg), void *__arg); to provide an abstract interface for clone, clone2 and clone3. 1. Simplify stack management for thread creation by passing both stack base and size to create_thread. 2. Consolidate clone vs clone2 differences into a single file. 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2. 4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread. 5. Enable the public clone3 wrapper in the future after it has been added to all targets. NB: Sandbox will return ENOSYS on clone3 in both Chromium: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b commit 218438259dd795456f0a48f67cbe5b4e520db88b Author: Matthew Denton <mpdenton@chromium.org> Date: Thu Jun 03 20:06:13 2021 Linux sandbox: return ENOSYS for clone3 Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone. Bug: 1213452 Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184 Reviewed-by: Robert Sesek <rsesek@chromium.org> Commit-Queue: Matthew Denton <mpdenton@chromium.org> Cr-Commit-Position: refs/heads/master@{#888980} [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc and Firefox: https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-02-14 03:47:46 +08:00
/* The clone3 syscall provides a superset of the functionality of the clone
interface. The kernel might extend __CL_ARGS struct in the future, with
each version with a different __SIZE. If the child is created, it will
start __FUNC function with __ARG arguments.
Different than kernel, the implementation also returns EINVAL for an
invalid NULL __CL_ARGS or __FUNC (similar to __clone).
All callers are responsible for correctly aligning the stack. The stack is
not aligned prior to the syscall (this differs from the exported __clone).
This function is only implemented if the ABI defines HAVE_CLONE3_WRAPPER.
*/
extern int __clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
/* The internal wrapper of clone/clone2 and clone3. Different than __clone3,
it will align the stack if required. If __clone3 returns -1 with ENOSYS,
fall back to clone or clone2. */
Add an internal wrapper for clone, clone2 and clone3 The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area which can be used by kernel to compute the shadow stack size when allocating the shadow stack. Add: extern int __clone_internal (struct clone_args *__cl_args, int (*__func) (void *__arg), void *__arg); to provide an abstract interface for clone, clone2 and clone3. 1. Simplify stack management for thread creation by passing both stack base and size to create_thread. 2. Consolidate clone vs clone2 differences into a single file. 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2. 4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread. 5. Enable the public clone3 wrapper in the future after it has been added to all targets. NB: Sandbox will return ENOSYS on clone3 in both Chromium: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b commit 218438259dd795456f0a48f67cbe5b4e520db88b Author: Matthew Denton <mpdenton@chromium.org> Date: Thu Jun 03 20:06:13 2021 Linux sandbox: return ENOSYS for clone3 Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone. Bug: 1213452 Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184 Reviewed-by: Robert Sesek <rsesek@chromium.org> Commit-Queue: Matthew Denton <mpdenton@chromium.org> Cr-Commit-Position: refs/heads/master@{#888980} [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc and Firefox: https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-02-14 03:47:46 +08:00
extern int __clone_internal (struct clone_args *__cl_args,
int (*__func) (void *__arg), void *__arg);
/* clone3 wrapper with a sticky check to avoid re-issuing the syscall if
it fails with ENOSYS. */
extern int __clone3_internal (struct clone_args *cl_args,
int (*func) (void *args), void *arg)
attribute_hidden;
/* The fallback code which calls clone/clone2 based on clone3 arguments. */
extern int __clone_internal_fallback (struct clone_args *__cl_args,
int (*__func) (void *__arg),
void *__arg)
attribute_hidden;
Add an internal wrapper for clone, clone2 and clone3 The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area which can be used by kernel to compute the shadow stack size when allocating the shadow stack. Add: extern int __clone_internal (struct clone_args *__cl_args, int (*__func) (void *__arg), void *__arg); to provide an abstract interface for clone, clone2 and clone3. 1. Simplify stack management for thread creation by passing both stack base and size to create_thread. 2. Consolidate clone vs clone2 differences into a single file. 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2. 4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread. 5. Enable the public clone3 wrapper in the future after it has been added to all targets. NB: Sandbox will return ENOSYS on clone3 in both Chromium: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b commit 218438259dd795456f0a48f67cbe5b4e520db88b Author: Matthew Denton <mpdenton@chromium.org> Date: Thu Jun 03 20:06:13 2021 Linux sandbox: return ENOSYS for clone3 Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone. Bug: 1213452 Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184 Reviewed-by: Robert Sesek <rsesek@chromium.org> Commit-Queue: Matthew Denton <mpdenton@chromium.org> Cr-Commit-Position: refs/heads/master@{#888980} [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc and Firefox: https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-02-14 03:47:46 +08:00
posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) Returning a pidfd allows a process to keep a race-free handle for a child process, otherwise, the caller will need to either use pidfd_open (which still might be subject to TOCTOU) or keep the old racy interface base on pid_t. To correct use pifd_spawn, the kernel must support not only returning the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux 5.4). If kernel does not support the waitid, pidfd return ENOSYS. It avoids the need to racy workarounds, such as reading the procfs fdinfo to get the pid to use along with other wait interfaces. These interfaces are similar to the posix_spawn and posix_spawnp, with the only difference being it returns a process file descriptor (int) instead of a process ID (pid_t). Their prototypes are: int pidfd_spawn (int *restrict pidfd, const char *restrict file, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict], char *const envp[restrict]) int pidfd_spawnp (int *restrict pidfd, const char *restrict path, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict_arr], char *const envp[restrict_arr]); A new symbol is used instead of a posix_spawn extension to avoid possible issues with language bindings that might track the return argument lifetime. Although on Linux pid_t and int are interchangeable, POSIX only states that pid_t should be a signed integer. Both symbols reuse the posix_spawn posix_spawn_file_actions_t and posix_spawnattr_t, to void rehash posix_spawn API or add a new one. It also means that both interfaces support the same attribute and file actions, and a new flag or file action on posix_spawn is also added automatically for pidfd_spawn. Also, using posix_spawn plumbing allows the reusing of most of the current testing with some changes: - waitid is used instead of waitpid since it is a more generic interface. - tst-posix_spawn-setsid.c is adapted to take into consideration that the caller can check for session id directly. The test now spawns itself and writes the session id as a file instead. - tst-spawn3.c need to know where pidfd_spawn is used so it keeps an extra file description unused. Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support), Linux 5.4 (full support), and Linux 6.2. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-08-25 00:42:18 +08:00
/* Return whether the kernel supports pid file descriptor, including clone
with CLONE_PIDFD and waitid with P_PIDFD. */
extern bool __clone_pidfd_supported (void) attribute_hidden;
Add an internal wrapper for clone, clone2 and clone3 The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area which can be used by kernel to compute the shadow stack size when allocating the shadow stack. Add: extern int __clone_internal (struct clone_args *__cl_args, int (*__func) (void *__arg), void *__arg); to provide an abstract interface for clone, clone2 and clone3. 1. Simplify stack management for thread creation by passing both stack base and size to create_thread. 2. Consolidate clone vs clone2 differences into a single file. 3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2. 4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread. 5. Enable the public clone3 wrapper in the future after it has been added to all targets. NB: Sandbox will return ENOSYS on clone3 in both Chromium: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b commit 218438259dd795456f0a48f67cbe5b4e520db88b Author: Matthew Denton <mpdenton@chromium.org> Date: Thu Jun 03 20:06:13 2021 Linux sandbox: return ENOSYS for clone3 Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone. Bug: 1213452 Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184 Reviewed-by: Robert Sesek <rsesek@chromium.org> Commit-Queue: Matthew Denton <mpdenton@chromium.org> Cr-Commit-Position: refs/heads/master@{#888980} [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc and Firefox: https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-02-14 03:47:46 +08:00
#ifndef _ISOMAC
libc_hidden_proto (__clone3)
libc_hidden_proto (__clone_internal)
#endif
#endif