sys/prctl.h anyway includes linux/prctl.h and actually these .c files
includes sys/prctl.h. Hence, it is not necessary to explicitly include
linux/prctl.h.
When we are running in chroot, safe_fork() with FORK_MOUNTNS_SLAVE fails
with the following eror:
```
Failed to set mount propagation to MS_SLAVE for all mounts: Invalid argument
```
Let's skip the test cases when we are running in chroot.
We use symbols provided by unistd.h without including it. E.g.
open(), close(), read(), write(), access(), symlink(), unlink(), rmdir(),
fsync(), syncfs(), lseek(), ftruncate(), fchown(), dup2(), pipe2(),
getuid(), getgid(), gettid(), getppid(), pipe2(), execv(), _exit(),
environ, STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, F_OK, and their
friends and variants, so on.
Currently, unistd.h is indirectly included mainly in the following two paths:
- through missing_syscall.h, which is planned to covert to .c file.
- through signal.h -> bits/sigstksz.h, which is new since glibc-2.34.
Note, signal.h is included by sd-eevent.h. So, many source files
indirectly include unistd.h if newer glibc is used.
Currently, our baseline on glibc is 2.31. We need to support glibc older
than 2.34, but unfortunately, we do not have any CI environments with
such old glibc. CIFuzz uses glibc-2.31, but it builds only fuzzers, and
many files are even not compiled.
To make sure everything still compiles, we add a preliminary include
of forward.h to tests.h to make sure it is included in every test source
file. We'll clean up the tests.h includes in a later commit.
We also add a <errno.h> include to errno-list.h to keep test-errno-list.c
compiling. It'll be removed again when we clean up includes in src/basic.
Split out of #37344.
The test fails when running under sanitizers due to missing sanitizer
libraries. For now, let's skip the test until we can make the necessary
changes to run it under sanitizers.
s390x will define both s390x and s390, so exec-personality-s390.service is ran
in both cases but fails on s390x, as the personality returned is s390x.
Split the test and check specifically for s390x.
Since e56a8790a0 debugging test-execute fails has been a royal PITA, since
we ditch all potentially useful output from the test units (that, for
the most part, run `sh -x ...`). Let's improve the situation a bit by
setting EXEC_OUTPUT_NULL only when running the single test case that
needs it, and inheriting stdout otherwise.
For example, with a purposefully introduced error we get this output
with this patch:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
Personality: x86-64
LockPersonality: no
SystemCallErrorNumber: kill
++ uname -m
+ c=x86_64
+ test x86_64 = foo_bar
Received SIGCHLD from PID 1520588 (sh).
Child 1520588 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1520588 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted
But without it, we'd miss the most important part:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
Personality: x86-64
LockPersonality: no
SystemCallErrorNumber: kill
Received SIGCHLD from PID 1521365 (sh).
Child 1521365 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1521365 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted
If we're running test-execute from the build directory which is under
one of the tmpfs-ed directories (i.e. /root or /tmp), test-execute might
behave strangely, since in that case manager_new() pins the system
systemd-executor binary instead of the build dir one, which may lead to
a very confusing test fails (if there's enough difference between the
system and built sd-executor binary). Let's account for that and
bind-mount the build dir under the tmpfs-ed directory if necessary.
Bump the timeout for test-execute subtests if running with plain QEMU
(as part of TEST-02-UNITTESTS), since we might start hitting the default
2m timeout with some more involved subtests, especially when the AWS
region we're running in is under heavy load. I see this regularly in the
CentOS Stream 9 nightly cron job with exec-dynamicuser-statedir.service
which has a lot of ExecStart's.
Some tests in test-execute are already skipped if we do not have
unprivileged user namespaces. Extend this check to look for an apparmor
specific sysctl indicating that unprivileged userns creation is
restricted.
With newer versions of AppArmor, unprivileged user namespace creation
may be restricted by default, in which case user manager instances will
not be able to apply PrivateUsers=yes (or the settings which require it).
Additionally, if a kernel has the kernel.unprivileged_userns_clone
sysctl patch, and that sysctl is 0, then unprivileged userns creation
will always fail.
If a test unit is going to be run in a user manager, and that unit
requires PrivateUsers=yes (explicitly or implicitly), then skip it if
we do not have user namespace privileges.
In some environments, such as a LXD container, the netns setup might
fail because ip netns exec fails trying to mount /sys:
$ systemd-detect-virt
lxc
$ ip link add dummy-test-exec type dummy
$ ip netns add test-execute-netns
$ ip netns exec test-execute-netns ip link add dummy-test-ns type dummy
mount of /sys failed: Operation not permitted
If this setup fails, test_exec_networknamespacepath will fail, so check
the exit codes for these setup calls and skip the test if necessary.
The read-only bit is flipped after setting up all the mounts, so that
bind mounts can be added. Remove the early config, and add a unit
test.
Fixes https://github.com/systemd/systemd/issues/30372
When starting a service with a non-root user and a SystemCallFilter and
other settings (like ProtectClock), the no_new_privs flag should not be set.
Also, test that CapabilityBoundingSet behaves correctly, since we need
to preserve some capabilities to do the seccomp filter and restore the
ones set by the service before executing.
Sometimes it makes sense to hard kill a client if we die. Let's hence
add a third FORK_DEATHSIG flag for this purpose: FORK_DEATHSIG_SIGKILL.
To make things less confusing this also renames FORK_DEATHSIG to
FORK_DEATHSIG_SIGTERM to make clear it sends SIGTERM. We already had
FORK_DEATHSIG_SIGINT, hence this makes things nicely symmetric.
A bunch of users are switched over for FORK_DEATHSIG_SIGKILL where we
know it's safe to abort things abruptly. This should make some kernel
cases more robust, since we cannot get confused by signal masks or such.
While we are at it, also fix a bunch of bugs where we didn't take
FORK_DEATHSIG_SIGINT into account in safe_fork()
We use it for more than just pipe() arrays. For example also for
socketpair(). Hence let's give it a generic name.
Also add EBADF_TRIPLET to mirror this for things like
stdin/stdout/stderr arrays, which we use a bunch of times.
This adds a new structure UnitDefaults which embedds the various default
settings for units we maintain. We so far maintained two sets of
variables for this, one in main.c as static variables and one in the
Manager structure. This moves them into a common structure.
This is most just search/replace, i.e. very dumb refactoring.
The fact that we now use a common structure for this allows us further
refactorings later.
Inspired by the discussions on #27890
This reverts commits
- 9ae3624889
"test-execute: add tests for credentials directory with mount namespace"↲
- 94fe4cf255
"core: do not leak mount for credentials directory if mount namespace is enabled",
- 7241b9cd72
"core/credential: make setup_credentials() return path to credentials directory",
- fbaf3b23ae
"core: set $CREDENTIALS_DIRECTORY only when we set up credentials"
Before the commits, credentials directory set up on ExecStart= was kept
on e.g. ExecStop=. But, with the changes, if a service requests a
private mount namespace, the credentials directory is discarded after
ExecStart= is finished.
Let's revert the change, and find better way later.
Addresses the post-merge comment
https://github.com/systemd/systemd/pull/28787#issuecomment-1690614202.
seccomp-util.h doesn't need ifdeffing, hence don't. It has worked since
quite a while with HAVE_SECCOMP is off, hence use it everywhere.
Also drop explicit seccomp.h inclusion everywhere (which needs
HAVE_SECCOMP ifdeffery everywhere). seccomp-util.h includes it anyway,
automatically, which we can just rely on, and it deals with HAVE_SECCOMP
at one central place.
In order to get a good approximation of latencies when starting
services, timestamp before/after running the test cases and print
the difference. This allows to measure while ignoring the setup/shutdown
time for the test harness.
Otherwise the get_testdata_dir() call fails if the source tree is under
/root (which is usually the case in CIs).
I got bitten by this after leaving the source tree under /root but moving the
$BUILD_DIR elsewhere. This used to work by accident, as load_testdata_env()
would try to read $BUILD_DIR/systemd-runtest.env, but would fail if the
$BUILD_DIR is also under /root and fall back to SYSTEMD_TEST_DATA
(/lib/systemd/tests/testdata), which usually exist as we install the just built
revision. However, if the $BUILD_DIR is outside of /root we'd read
$BUILD_DIR/systemd-runtest.env which contains
SYSTEMD_TEST_DATA=/path/to/source/tree/test and that source tree is not visible
once we overmount /root with tmpfs making the test fail:
/* test_run_tests_unprivileged */
Successfully forked off '(test-execute-unprivileged)' as PID 10672.
Changing mount flags / (MS_REMOUNT|MS_BIND "")...
Changing mount propagation / (MS_REC|MS_SHARED "")
Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV "")...
Mounting tmpfs (tmpfs) on /root (MS_NOSUID|MS_NODEV "")...
Mounting tmpfs (tmpfs) on /tmp (MS_NOSUID|MS_NODEV "")...
Mounting tmpfs (tmpfs) on /var/tmp (MS_NOSUID|MS_NODEV "")...
Mounting tmpfs (tmpfs) on /var/lib (MS_NOSUID|MS_NODEV "")...
Mounting tmpfs (tmpfs) on /run/test-execute-unit-dir (MS_NOSUID|MS_NODEV "")...
ERROR: $SYSTEMD_TEST_DATA directory [/root/systemd/test] not accessible: No such file or directory
Assertion 'get_testdata_dir("test-execute/", &unit_dir) >= 0' failed at src/test/test-execute.c:1306, function prepare_ns(). Aborting.
(test-execute-unprivileged) terminated by signal ABRT.