Commit Graph

1087 Commits

Author SHA1 Message Date
Mike Yuan
d3da74696b core: record transactions that have seen ordering cycles 2025-11-12 23:47:39 +01:00
Mike Yuan
0d9e79d5ca core/transaction: assign unique ids to transactions and encode them in log
Preparation for later commits, but I think this one makes
a ton of sense on its own. When debug logging is enabled
it's otherwise difficult to dig up the portion of journal
for transaction construction.
2025-11-12 23:47:38 +01:00
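The idea of tagging every log line with a per-transaction ID can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Transaction` struct, `transaction_init()`, `transaction_format_log()` and the message layout are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a transaction object. */
typedef struct Transaction {
        uint64_t id;
} Transaction;

/* Hand out a fresh, non-zero ID to every transaction (single-threaded sketch). */
static void transaction_init(Transaction *tr) {
        static uint64_t next_id = 0;

        assert(tr);
        tr->id = ++next_id;
}

/* Prefix a message with the transaction ID, so that all journal lines that
 * belong to one transaction can be found by searching for that ID. */
static int transaction_format_log(const Transaction *tr, const char *msg, char *buf, size_t size) {
        assert(tr);
        assert(msg);
        assert(buf);

        return snprintf(buf, size, "Transaction %" PRIu64 ": %s", tr->id, msg);
}
```

With debug logging enabled, filtering the journal for one such ID would then yield only the lines produced while that transaction was being constructed.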
Yu Watanabe
6431f2e072 musl: time-util: introduce get_tzname() helper function
musl leaves the DST timezone name unset if there is no DST.
The helper function maps that back to no DST.
2025-11-13 03:13:55 +09:00
Daniel Foster
c7a444a9c1 tree-wide: extend $LISTEN_FDS protocol with $LISTEN_PIDFDID
Although extremely unlikely, there is a race present in solely checking the
$LISTEN_PID environment variable, due to PID recycling. Fix that by introducing
$LISTEN_PIDFDID, which contains the 64-bit ID of a pidfd for the child process
that is not subject to recycling.
2025-10-22 09:34:14 +02:00
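On the receiving side, the extended check might look like the sketch below. It is an assumption-laden illustration rather than the actual sd_listen_fds() logic: `parse_u64()` and `listen_fds_are_for_us()` are hypothetical helpers, and the caller is assumed to already know the 64-bit ID of its own pidfd (how that ID is obtained is outside the scope of the example).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a decimal unsigned 64-bit value; returns false on malformed input. */
static bool parse_u64(const char *s, uint64_t *ret) {
        char *end = NULL;
        unsigned long long v;

        assert(ret);

        if (!s || *s < '0' || *s > '9')
                return false;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0 || *end != '\0')
                return false;

        *ret = (uint64_t) v;
        return true;
}

/* Returns true if the passed file descriptors are meant for this process.
 * my_pidfdid is assumed to be the ID of a pidfd referring to this process. */
static bool listen_fds_are_for_us(pid_t my_pid, uint64_t my_pidfdid) {
        uint64_t v;

        /* The classic check: $LISTEN_PID must match our PID. */
        if (!parse_u64(getenv("LISTEN_PID"), &v) || v != (uint64_t) my_pid)
                return false;

        /* The additional check: if $LISTEN_PIDFDID is set, it must match too.
         * A recycled PID would pass the first check but fail this one. */
        const char *e = getenv("LISTEN_PIDFDID");
        if (!e)
                return true; /* variable not set, fall back to the PID check alone */

        return parse_u64(e, &v) && v == my_pidfdid;
}
```

Treating a missing `$LISTEN_PIDFDID` as acceptable is a design choice made for this sketch, so that a process started by a manager that only sets `$LISTEN_PID` keeps working.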
Mike Yuan
4f8c1de213 core/manager: honor show_status_overridden in manager_watch_jobs_next_time()
Prompted by #39029
2025-09-20 00:01:54 +02:00
Mike Yuan
d25c8ee7f9 core: console status fixes (#39029) 2025-09-19 20:30:11 +02:00
Lennart Poettering
9ecc969855 core: fix status output suppression
This fixes two things: first of all it ensures we take the override
status output field properly into account, instead of going directly to
the regular one.

Moreover, it ensures that we bypass auto for both notice + emergency,
since both have the same "impact", and doesn't limit this to notice
only.
2025-09-19 17:32:48 +02:00
Lennart Poettering
4d8c5c657a build: make libaudit dep dlopen() 2025-09-19 16:30:13 +02:00
Yu Watanabe
7184f8366f firewall-util: drop FirewallContext
After iptables support is dropped, FirewallContext is a trivial
wrapper of sd_netlink. Let's drop it and directly use sd_netlink.
2025-09-19 15:33:17 +09:00
Zbigniew Jędrzejewski-Szmek
66b2d758c5 various: add a fixed name to log about plugin execution
Function execute_directories logged in a way that was meaningless
without additional context:
  systemd[1]: No executables found.
In execute_strv this was partially rectified by extracting the directory
name from one of the directories and using this as the identifier. But
the directory name is not always meaningful, and can also be set from
an environment variable. Let's simplify things by providing a fixed name
that can be used consistently in all log messages. In particular this will
make error messages easier to understand if users report just the error
without additional context.
2025-09-03 08:56:23 +02:00
Grimmauld
30bbdf0771 core: add 'DefaultRestrictSUIDSGID' config option
closes #37602

On typical systems, only a few services need to create SUID/SGID files.
This often is limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a default
to globally restrict creation of suid/sgid files makes it easier to apply
this restriction precisely.
2025-07-09 11:08:34 +02:00
Mike Yuan
1b4ab5a209 core/socket: introduce DeferTrigger= and DeferTriggerMaxSec=
Alternative to b50f6dbe57

The commit naively returned early from socket_enter_running(), which however
is quite problematic, as the socket will be woken up over and over again
without doing a thing, until we eventually hit Poll/TriggerLimit*=.
On top of that it requires hacks to hold the start job for initrd-switch-root.service
up. Overall I doubt that is the right approach.

Let's instead hook this into our job engine, and try to activate
the service again when some other units are stopped. If all installed
jobs have been run yet we're still seeing the conflict or the manually
selected timeout is reached, fail the socket as before.
2025-06-30 13:10:43 +02:00
Valentin David
0dc39dffbd Use paths specified from environment variables for /etc configuration files
Some configuration files that need updates are directly under /etc. To
update them atomically, we need write access to /etc. For Ubuntu Core this is
an issue as /etc is not writable. Only a selection of subdirectories can be
writable. The general solution is symlinks or bind mounts to writable places.
But for atomic writes in /etc, that does not work. So Ubuntu has had a patch
for that that did not age well.

Instead we would like to introduce some environment variables for alternate
paths.

 * SYSTEMD_ETC_HOSTNAME: /etc/hostname
 * SYSTEMD_ETC_MACHINE_INFO: /etc/machine-info
 * SYSTEMD_ETC_LOCALTIME: /etc/localtime
 * SYSTEMD_ETC_LOCALE_CONF: /etc/locale.conf
 * SYSTEMD_ETC_VCONSOLE_CONF: /etc/vconsole.conf
 * SYSTEMD_ETC_ADJTIME: /etc/adjtime

While it is for now expected that there is a symlink from the standard, we
still try to read them from that alternate path. This is important for
`/etc/localtime`, which is a symlink, so we cannot have an indirect symlink or
bind mount for it.

Since machine-id is typically written only once and not updated, this commit
does not cover it. An initrd can properly create it and bind mount it.
2025-06-23 15:32:11 +02:00
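The lookup described above, environment variable first and standard path as fallback, can be sketched like this. The helper names `path_from_env()` and `etc_hostname()` are hypothetical, and rejecting non-absolute values is an assumption of this example, not something the commit message states.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the path from the environment variable if it is set to an absolute
 * path, otherwise the compiled-in default. Ignoring relative values is an
 * assumption made for this sketch. */
static const char *path_from_env(const char *var, const char *fallback) {
        const char *e;

        assert(var);
        assert(fallback);

        e = getenv(var);
        return e && e[0] == '/' ? e : fallback;
}

/* One accessor per configuration file, shown here for the hostname only. */
static const char *etc_hostname(void) {
        return path_from_env("SYSTEMD_ETC_HOSTNAME", "/etc/hostname");
}
```

A caller that rewrites the file atomically would create its temporary file next to the returned path, which is why the writable location has to be named directly rather than reached through a symlink.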
Mike Yuan
85352c095e various: turn off SO_PASSRIGHTS where fds are not expected 2025-06-17 13:16:44 +02:00
Lennart Poettering
d65dc4c593 core: break lines in some overly long function calls 2025-06-06 09:04:45 +02:00
Zbigniew Jędrzejewski-Szmek
42ba99748d various: do not include file names directly in error messages
git grep -l 'Failed to open /'|xargs sed -r -i 's|"Failed to open (/[^ ]+): %m"|"Failed to open %s: %m", "\1"|g'
git grep -l $'Failed to open \'/'|xargs sed -r -i $'s|"Failed to open \'(/[^ ]+)\': %m"|"Failed to open %s: %m", "\\1"|g'
git grep -l "Failed to open /"|xargs sed -r -i $'s|"Failed to open (/[^ ]+), ignoring: %m"|"Failed to open %s, ignoring: %m", "\\1"|g'
+ some manual fixups.
2025-06-02 11:10:38 +02:00
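The effect of the sed expressions above is to move the path out of the format string and into an argument. A small sketch of the resulting pattern follows; `format_open_failure()` is a hypothetical stand-in for a logging call, the path used in the comment is an example, and `strerror()` replaces the `%m` specifier only to keep the example portable.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Before the change, each call site carried its own format string, e.g.
 *     "Failed to open /etc/example.conf: %m"
 * After the change, the format string is the same everywhere and the path is
 * passed as an argument. */
static int format_open_failure(char *buf, size_t size, const char *path, int error) {
        assert(buf);
        assert(path);

        return snprintf(buf, size, "Failed to open %s: %s", path, strerror(error));
}
```

Keeping one shared format string means identical messages can be deduplicated by the compiler and translated or matched in one place.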
Dusty Mabe
bdd852a199 src/core/manager.c: log preset activity on first boot
This gives us a little more information about what units were enabled
or disabled on that first boot and will be useful for OS developers
tracking down the source of unit state.

An example with this enabled looks like:

```
NET: Registered PF_VSOCK protocol family
systemd[1]: Applying preset policy.
systemd[1]: Unit /etc/systemd/system/dnsmasq.service is masked, ignoring.
systemd[1]: Unit /etc/systemd/system/systemd-repart.service is masked, ignoring.
systemd[1]: Removed '/etc/systemd/system/sockets.target.wants/systemd-resolved-monitor.socket'.
systemd[1]: Removed '/etc/systemd/system/sockets.target.wants/systemd-resolved-varlink.socket'.
systemd[1]: Created symlink '/etc/systemd/system/multi-user.target.wants/var-mnt-workdir.mount' → '/etc/systemd/system/var-mnt-workdir.mount'.
systemd[1]: Created symlink '/etc/systemd/system/multi-user.target.wants/var-mnt-workdir\x2dtmp.mount' → '/etc/systemd/system/var-mnt-workdir\x2dtmp.mount'.
systemd[1]: Created symlink '/etc/systemd/system/afterburn-sshkeys.target.requires/afterburn-sshkeys@core.service' → '/usr/lib/systemd/system/afterburn-sshkeys@.service'.
systemd[1]: Created symlink '/etc/systemd/system/sockets.target.wants/systemd-resolved-varlink.socket' → '/usr/lib/systemd/system/systemd-resolved-varlink.socket'.
systemd[1]: Created symlink '/etc/systemd/system/sockets.target.wants/systemd-resolved-monitor.socket' → '/usr/lib/systemd/system/systemd-resolved-monitor.socket'.
systemd[1]: Populated /etc with preset unit settings.
```

Considering it only happens on first boot and not on every boot I think
the extra information is worth the extra verbosity in the logs just for
that boot.
2025-05-27 05:53:36 +09:00
Daan De Meyer
6ae1ba8a84 core: Clean up includes
Follow up for 836e4e7ea8
2025-05-22 09:41:18 +02:00
Daan De Meyer
836e4e7ea8 core: Clean up includes
Split out of #37344.
2025-05-22 09:37:20 +02:00
Daan De Meyer
cdd5fac068 tree-wide: Include <libaudit.h> via libaudit-util.h
Let's keep the ifdeffery for the include in one place.
2025-05-21 14:05:56 +02:00
Luca Boccassi
6946eed3fa core: Also refresh confext extensions when reloading notify-reload service (#33995)
`ExtensionImages=` and `ExtensionDirectories=` now let you specify
vpick-named extensions; however, since they just get set up once when
the service is started, you can't see newer versions without restarting
the service entirely. Here, also reload confext extensions when you
reload a service. This allows you to deploy a new version of some
configuration and have it picked up at reload time without interruption
to your workload.

Right now, we would only reload confext extensions and leave the sysext
ones behind, since it didn't seem prudent to swap out what is likely
program code at reload. This is made possible by only going for the
`SYSTEMD_CONFEXT_HIERARCHIES` overlays (which only contains `/etc`).

This PR:
- Adjusts `service.c` to also refresh extensions when needed. 
- Adds integration tests to check that a confext reload actually
occurred.
- Adds to the `systemd.exec` man pages to document this behavior.

This is a follow up to #24864 and #31364. Thank you to @bluca and
@goenkam for help in getting this up.
2025-05-20 11:27:34 +01:00
maia x.
dfdeb0b1cb core: reload confexts when reloading notify-reload services
`ExtensionImages=` and `ExtensionDirectories=` now let you specify
vpick-named extensions; however, since they just get set up once when
the service is started, you can't see newer versions without restarting
the service entirely.  Here, also reload confext extensions when you
reload a service. This allows you to deploy a new version of some
configuration and have it picked up at reload time without interruption
to your workload.

Right now, we would only reload confext extensions and leave the sysext
ones behind, since it didn't seem prudent to swap out what is likely
program code at reload. This is made possible by only going for the
`SYSTEMD_CONFEXT_HIERARCHIES` overlays (which only contains `/etc`).

Implementation wise, this uses the new kernel API and two collaborating
child processes under the host & child namespaces in order to gather the
right FDs needed:

  - (1) In child, set up the extension images and directories in a slave
        mountns, and obtain their FDs.
  - (2) Fork into a grandchild under target process namespace, and do a
        "fake" unmount to obtain the FD of the underlying target folder
        (say /etc).
  - (3) In the child again, set up new overlay under host NS rights.

We do not want to do I/O heavy jobs inline in PID1 blocking the state
machine, so add separate async states to handle this case.

Co-authored-by: Luca Boccassi <luca.boccassi@gmail.com>
2025-05-19 13:36:21 +01:00
Mike Yuan
741a184a31 core/manager: do not pop gc_unit_queue before unit_gc_sweep()
Follow-up for 52e3671bf7

unit_gc_sweep() might try to add the unit to gc queue again.
While that becomes no-op as Unit.in_gc_queue is not cleared
yet, it induces minor inconsistency of states.
2025-05-18 05:33:09 +09:00
Mike Yuan
29da53dde3 core: always enable CPU accounting
Our baseline is v5.4 and cgroup v2 is enforced now,
which means CPU accounting is cheap everywhere without
requiring any controller, hence just remove the directive.
2025-05-15 02:19:16 +02:00
Mike Yuan
3274ef6792 core: drop Manager.blockio_accounting
Follow-up for 98d64ff500
2025-05-15 02:19:15 +02:00
Mike Yuan
bad578b145 core: rename core-varlink -> varlink
To make things consistent with dbus.[ch]
2025-05-04 12:22:38 +09:00
Daan De Meyer
1cf40697e3 tree-wide: Sort includes
This was done by running a locally built clang-format with
https://github.com/llvm/llvm-project/pull/137617 and
https://github.com/llvm/llvm-project/pull/137840 applied on all .c
and .h files.
2025-04-30 09:30:51 +02:00
Daan De Meyer
4ea4abb651 core: Remove circular dependencies between headers
Currently there are various circular dependencies between headers
in core/. Let's get rid of these by making judicious use of forward
declarations and moving includes into implementation files instead of
having them in header files.

Getting rid of circular header includes simplifies the code and makes
various clang based tooling such as iwyu work much better on our code.

The most important change is getting rid of the manager.h include in
unit.h which is possible thanks to the previous commits. We also move
the OOMPolicy and StatusType enums to unit.h to remove the need for
other unit headers to include manager.h to get access to these enums.
2025-04-23 10:33:35 +02:00
Daan De Meyer
64fd6ba9ca core: Turn manager unit log fields into unit functions
There's no need for these to be fields inside the manager struct,
let's turn them into functions in unit.h instead, again to allow
forward declaring the Manager struct in a later commit.
2025-04-23 09:53:53 +02:00
Mike Yuan
9af70339aa core/manager: assume availability of all RT signals
Our kernel baseline is v5.4 now.
2025-04-23 10:04:04 +09:00
Mike Yuan
ead510fe06 core/manager: also assert on Manager.units_by_invocation_id being empty after cleanup 2025-04-07 15:48:23 +09:00
Yu Watanabe
e4e40936f3 nspawn: drop cgv1 handling; core: drop cgroup agent (#36764) 2025-04-05 17:57:18 +09:00
Mike Yuan
b157a7e613 core: also stash executor path in Manager
Prompted by b58c240312

Let's not query it over and over again in exec_spawn().
2025-04-05 02:33:00 +09:00
Mike Yuan
be1d96dbc3 core: remove cgroups-agent 2025-04-04 15:34:51 +02:00
Yu Watanabe
b58c240312 build-path: make pin_callout_binary() optionally provide the path to the found executable 2025-04-04 21:02:18 +09:00
Yu Watanabe
566b8f4d46 core/manager: update comment 2025-03-31 23:22:38 +09:00
Yu Watanabe
eb3554666e core: drop unused wrappers of manager_get_unit_by_pidref() and friends 2025-03-27 04:15:43 +09:00
Zbigniew Jędrzejewski-Szmek
c4876f604b Ratelimit attempts to open watchdog, increase logging (#35708) 2025-03-24 21:06:57 +01:00
Zbigniew Jędrzejewski-Szmek
a9cee8f4de core/manager: do not exclude watchdog logic from busy-loop protection
As reported in https://github.com/systemd/systemd/issues/35405, if the watchdog
ping failed, we effectively started a busy loop here. The previous commits
should fix this, but in general, the protection here is intended as a safety
net in case the logic is broken somewhere else. We shouldn't exclude the
watchdog stuff from this.
2025-03-24 10:45:49 +01:00
Zbigniew Jędrzejewski-Szmek
ab596e4cde shared/watchdog: give up after a few failed pings
Closes https://github.com/systemd/systemd/issues/35405. Apparently some
watchdog devices can be opened, but then the pings start failing after some
time. Since the timestamp of the last successful ping is not updated, we try to
ping again immediately, causing a busy loop and excessive logging.

After trying a few different approaches to fit this into the existing framework
without changing the logic too much, I settled on an approach with a second
timestamp. In particular, the timestamp of the last successful ping is public,
exposed as WatchdogLastPingTimestamp over dbus. It'd be wrong to redefine this
to mean the last ping *attempt*. So we need a second timestamp in some form.

Also, if we give up on pinging, we probably should attempt to disarm the
watchdog. It's possible that the pinging fails, but the watchdog would still
fire. I don't think we want that, since it seems that our internal loop is
working, it's just the watchdog that is broken.

Structured message with SD_MESSAGE_WATCHDOG_PING_FAILED is logged if we fail
to ping.

I tested this by attaching gdb to pid 1 and calling close(watchdog_fd).
We get a bunch of warning messages and then an attempt to close the watchdog:
Mar 21 15:46:17 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:20 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:23 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:26 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:29 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:32 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:35 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:37 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:40 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:43 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:46 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:49 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:52 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:55 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0, closing watchdog after 15 attempts: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disable hardware watchdog, ignoring: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disarm watchdog timer, ignoring: Bad file descriptor
2025-03-24 10:45:49 +01:00
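The two-timestamp approach can be illustrated with the sketch below. It is a simplified model under stated assumptions, not the code in shared/watchdog: the `Watchdog` struct, both function names and the `WATCHDOG_MAX_FAILED_PINGS` constant are hypothetical, and the limit of 15 is taken from the "after 15 attempts" message in the log excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit, mirroring "closing watchdog after 15 attempts". */
#define WATCHDOG_MAX_FAILED_PINGS 15u

typedef struct Watchdog {
        uint64_t last_ping_usec;    /* last *successful* ping, the publicly exposed value */
        uint64_t last_attempt_usec; /* last ping attempt, successful or not */
        unsigned n_failed;          /* consecutive failed pings */
        bool open;
} Watchdog;

/* Record the outcome of one ping. Returns false once the watchdog has been
 * given up on. */
static bool watchdog_ping_done(Watchdog *w, uint64_t now_usec, bool success) {
        assert(w);

        if (!w->open)
                return false;

        w->last_attempt_usec = now_usec;

        if (success) {
                w->last_ping_usec = now_usec;
                w->n_failed = 0;
                return true;
        }

        if (++w->n_failed >= WATCHDOG_MAX_FAILED_PINGS) {
                /* Real code would try to disarm and close the device here. */
                w->open = false;
                return false;
        }

        return true;
}

/* Schedule the next ping relative to the last attempt, so that a failed ping
 * does not lead to an immediate retry and hence a busy loop. */
static uint64_t watchdog_next_ping_usec(const Watchdog *w, uint64_t interval_usec) {
        assert(w);

        return w->last_attempt_usec + interval_usec;
}
```

Because only `last_attempt_usec` moves on failure, the meaning of the last-successful-ping timestamp stays unchanged for anything that reads it.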
Zbigniew Jędrzejewski-Szmek
2c6397d1c2 pid1: log if we failed to find a watchdog device
I think we need to log at some point if the user configured a watchdog device,
but no devices were found. We can't log ENOENT immediately, because the device
may likely appear during boot. So wait until the end of the initial transaction
and log then.
2025-03-21 11:38:43 +01:00
Zbigniew Jędrzejewski-Szmek
18284f5a1b core: drop duplicated check in manager_{set,override}_watchdog
Those functions call watchdog_setup() and watchdog_setup_pretimeout(), which
internally do a similar check against the static variables watchdog_timeout and
watchdog_pretimeout. The second check is not useful.
2025-03-21 11:30:26 +01:00
Yu Watanabe
3cf6a3a3d4 tree-wide: check more log message format in log_struct() and friends
This introduces the LOG_ITEM() macro, which checks arbitrary formats in
log_struct().
Then, drop the _printf_ attribute from log_struct_internal(), as it does not
help much, and the compiler checked only the first format string.

Hopefully, this silences false-positive warnings by Coverity.
2025-03-19 01:56:48 +09:00
Zbigniew Jędrzejewski-Szmek
1ae9b0cfa8 basic/glyph-util: rename "special glyph" to just "glyph"
Admittedly, some of our glyphs _are_ special, e.g. "O=" for SPECIAL_GLYPH_TOUCH ;)
But we don't need this in the name. The very long names make some invocations
very wordy, e.g. special_glyph(SPECIAL_GLYPH_SLIGHTLY_UNHAPPY_SMILEY).
Also, I want to add GLYPH_SPACE, which is not special at all.
2025-03-15 14:40:39 +01:00
Lennart Poettering
e707d0459c analyze: don't connect to bus from analyze test run (#36719)
This thing should not be "live", hence don't try to connect to the bus,
or bind the private bus socket.

Fixes: #36540
2025-03-13 17:51:45 +01:00
Lennart Poettering
96a0cfbf47 emergency-action: sleep 5s before rebooting in various cases
This adds a new EMERGENCY_ACTION_SLEEP_5S flag, which when set will
delay the emergency action for 5s. This is supposed to be used together
with EMERGENCY_ACTION_WARN so that users can actually read the message
we output.

We enable this with all emergency action requests that already set
EMERGENCY_ACTION_WARN, except for the 7x ctrl-alt-del burst reboot,
where the user knows what they do and there's no real reason to wait,
they don't need to be informed.

This also enables both EMERGENCY_ACTION_WARN + EMERGENCY_ACTION_SLEEP_5S
for FailureAction= processing of regular units, where these were so far
off. (it leaves this off for SuccessAction= however!). This is a good
thing to make things more debuggable: if something fails and we reboot
this really deserves notification of the user.

(For SuccessAction= this logic does not apply, since the shutdown action
induced here is apparently an intended part of the codeflow, for example in
systemd-reboot.service or a similar unit, where the shutdown is the goal and
not an exception, and deserves no additional noisy reporting).

Inspired by: https://github.com/systemd/systemd/pull/36705#issuecomment-2717014120
2025-03-13 17:03:42 +01:00
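How the two flags combine for the cases listed above can be sketched as follows. Only the names EMERGENCY_ACTION_WARN and EMERGENCY_ACTION_SLEEP_5S come from the commit message; the bit values, the reduced enum and the helper functions are assumptions made for this illustration.

```c
#include <assert.h>

/* Reduced flag set; the bit values are assumed for this sketch. */
typedef enum EmergencyActionFlags {
        EMERGENCY_ACTION_WARN     = 1 << 0,
        EMERGENCY_ACTION_SLEEP_5S = 1 << 1,
} EmergencyActionFlags;

/* FailureAction=: warn, and give the user time to read the warning. */
static EmergencyActionFlags failure_action_flags(void) {
        return EMERGENCY_ACTION_WARN | EMERGENCY_ACTION_SLEEP_5S;
}

/* SuccessAction=: the shutdown is the intended outcome, so stay quiet. */
static EmergencyActionFlags success_action_flags(void) {
        return 0;
}

/* 7x ctrl-alt-del burst: warn, but the user asked for this, so do not wait. */
static EmergencyActionFlags ctrl_alt_del_burst_flags(void) {
        return EMERGENCY_ACTION_WARN;
}

/* Number of seconds to wait before carrying out the action. */
static unsigned emergency_action_delay_sec(EmergencyActionFlags flags) {
        return (flags & EMERGENCY_ACTION_SLEEP_5S) ? 5 : 0;
}
```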
Lennart Poettering
71a737d68d analyze: don't connect to bus from analyze test run
This thing should not be "live", hence don't try to connect to the bus,
or bind the private bus socket.

Fixes: #36540
2025-03-13 14:22:13 +01:00
Lennart Poettering
e75fbee624 manager: explicitly create our private runtime directory
So far /run/systemd/ was created as side-effect of initializing the
D-Bus client/server. But in one of the next commits we'll suppress
connecting to D-Bus in test runs, hence let's move the logic out of the
D-Bus code and into manager_startup().

Then, also drop creating it again and again in PID 1 at various places,
and just rely on it to exist.
2025-03-13 14:22:13 +01:00
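Creating the directory once, explicitly and idempotently, might look like the sketch below. `ensure_runtime_dir()` is a hypothetical helper and the 0755 mode is an assumption; the commit message only says that the creation of /run/systemd/ moves into manager_startup().

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create the directory if it is missing. Returns 0 on success (including the
 * case where it already exists) or a negative errno-style value on failure. */
static int ensure_runtime_dir(const char *path) {
        struct stat st;

        assert(path);

        if (mkdir(path, 0755) >= 0)
                return 0;
        if (errno != EEXIST)
                return -errno;

        /* Something already exists at this path: make sure it is a directory. */
        if (stat(path, &st) < 0)
                return -errno;

        return S_ISDIR(st.st_mode) ? 0 : -ENOTDIR;
}
```

Calling such a helper once during startup lets later code assume the directory exists instead of recreating it at every use.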
Lennart Poettering
19ade24464 notify-recv: add notify_recv() flavour that returns a split up strv instead of the message text as string
This is useful at various places, since we split up the message as first
thing there anyway.
2025-02-28 14:17:52 +01:00
Mike Yuan
5d09689b5c core/manager: port to notify_recv_with_fds() 2025-02-26 13:27:39 +01:00