Preparation for later commits, but I think this one makes
a ton of sense on its own. When debug logging is enabled,
it's otherwise difficult to dig up the portion of the journal
covering transaction construction.
Although extremely unlikely, there is a race present in solely checking the
$LISTEN_PID environment variable, due to PID recycling. Fix that by introducing
$LISTEN_PIDFDID, which contains the 64-bit ID of a pidfd for the child process
that is not subject to recycling.
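The receiving side of the check above could look roughly like the sketch below. It assumes that both variables hold decimal numbers. It also assumes that the pidfd ID is the inode number of a pidfd on pidfs (Linux 6.9+), and that a receiver unable to compute its own ID falls back to the plain PID check. `listen_env_matches()`, `parse_u64()` and `get_own_pidfd_id()` are hypothetical helpers, not the sd-daemon API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Parses a decimal unsigned 64-bit value; rejects signs, blanks, junk and overflow. */
static bool parse_u64(const char *s, uint64_t *ret) {
        char *end;
        if (!s || *s < '0' || *s > '9')
                return false;
        errno = 0;
        unsigned long long v = strtoull(s, &end, 10);
        if (errno != 0 || *end != '\0')
                return false;
        *ret = (uint64_t) v;
        return true;
}

/* Hypothetical: true if LISTEN_PID= (and LISTEN_PIDFDID=, if present) name this process. */
static bool listen_env_matches(const char *listen_pid, const char *listen_pidfdid,
                               pid_t self_pid, bool have_self_id, uint64_t self_pidfd_id) {
        uint64_t pid, id;

        if (!parse_u64(listen_pid, &pid) || pid != (uint64_t) self_pid)
                return false;   /* not meant for us, or malformed */
        if (!listen_pidfdid)
                return true;    /* sender gave no ID: only the (racy) PID check is possible */
        if (!have_self_id)
                return true;    /* assumption: fall back to the PID check */
        return parse_u64(listen_pidfdid, &id) && id == self_pidfd_id;
}

/* Assumption: with pidfs (Linux 6.9+) the inode number of a pidfd is a 64-bit ID that
 * is not reused. On older kernels all pidfds share one anonymous inode, so the value
 * returned there is meaningless. */
int get_own_pidfd_id(uint64_t *ret) {
#ifdef SYS_pidfd_open
        struct stat st;
        int fd = (int) syscall(SYS_pidfd_open, getpid(), 0);
        if (fd < 0)
                return -errno;
        if (fstat(fd, &st) < 0) {
                int r = -errno;
                close(fd);
                return r;
        }
        close(fd);
        *ret = (uint64_t) st.st_ino;
        return 0;
#else
        (void) ret;
        return -ENOSYS;
#endif
}
```

Comparing the ID, not just the PID, is what closes the recycling window: a recycled PID gets a different pidfd ID.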
This fixes two things: first of all it ensures we take the override
status output field properly into account, instead of going directly to
the regular one.
Moreover, it ensures that we bypass auto for both notice and emergency,
since both have the same "impact", rather than limiting this to notice
only.
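Both fixes can be illustrated with a reduced, hypothetical model. The enums and the `StatusSettings` struct below are simplifications and do not reproduce systemd's full `ShowStatus` set. The model treats "auto" as "hidden unless bypassed", which is itself a simplification. What it shows is the override being consulted first, and notice and emergency being treated alike.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
        SHOW_STATUS_NO,
        SHOW_STATUS_AUTO,
        SHOW_STATUS_YES,
        SHOW_STATUS_INVALID = -1,   /* used as "not overridden" */
} ShowStatus;

typedef enum {
        STATUS_TYPE_NORMAL,
        STATUS_TYPE_NOTICE,
        STATUS_TYPE_EMERGENCY,
} StatusType;

typedef struct {
        ShowStatus show_status;             /* regular setting */
        ShowStatus show_status_overridden;  /* SHOW_STATUS_INVALID unless overridden */
} StatusSettings;

/* Take the override into account instead of going directly to the regular field. */
static ShowStatus effective_show_status(const StatusSettings *s) {
        return s->show_status_overridden != SHOW_STATUS_INVALID ? s->show_status_overridden
                                                                : s->show_status;
}

static bool should_show_status(const StatusSettings *s, StatusType type) {
        switch (effective_show_status(s)) {
        case SHOW_STATUS_YES:
                return true;
        case SHOW_STATUS_AUTO:
                /* Notice and emergency have the same "impact", so both bypass auto. */
                return type == STATUS_TYPE_NOTICE || type == STATUS_TYPE_EMERGENCY;
        default:
                return false;
        }
}
```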
Function execute_directories logged in a way that was meaningless
without additional context:
systemd[1]: No executables found.
In execute_strv this was partially rectified by extracting the directory
name from one of the directories and using this as the identifier. But
the directory name is not always meaningful, and can also be set from
an environment variable. Let's simplify things by providing a fixed name
that can be used consistently in all log messages. In particular this will
make error messages easier to understand if users report just the error
without additional context.
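The idea of a fixed name prefixed to every message can be sketched as a small formatting helper. The helper `format_named_log()` and the name "generators" used in the usage lines are hypothetical. The point is that the prefix comes from the caller and not from a directory path or an environment variable.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats "<name>: <message>" into buf. `name` is a fixed identifier chosen by the
 * caller, so every message is self-explanatory even when quoted without context. */
__attribute__((format(printf, 4, 5)))
static void format_named_log(char *buf, size_t size, const char *name, const char *fmt, ...) {
        va_list ap;
        int n = snprintf(buf, size, "%s: ", name);

        if (n < 0 || (size_t) n >= size)
                return;   /* the prefix alone filled the buffer; keep what fits */
        va_start(ap, fmt);
        vsnprintf(buf + n, size - (size_t) n, fmt, ap);
        va_end(ap);
}
```

Usage: `format_named_log(buf, sizeof buf, "generators", "No executables found.");` yields `generators: No executables found.`.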
Closes #37602
On typical systems, only a few services need to create SUID/SGID files.
This is often limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a default
to globally restrict creation of suid/sgid files makes it easier to apply
this restriction precisely.
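Two pieces are involved here. The first is the mode check that a `RestrictSUIDSGID=`-style restriction applies to `chmod()`-like and file-creating calls. The second is how a global default might combine with a per-unit setting. The precedence rule below (an explicit unit setting wins, otherwise the global default applies) is an assumption, and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if a mode passed to chmod()/open(O_CREAT)/mkdir() would set SUID or SGID. */
static bool mode_sets_suid_sgid(mode_t mode) {
        return (mode & (S_ISUID | S_ISGID)) != 0;
}

/* Hypothetical: unit_setting < 0 means "not set in the unit file", so the global
 * default applies; otherwise the unit's explicit choice wins. */
static bool restrict_suid_sgid_effective(bool global_default, int unit_setting) {
        return unit_setting < 0 ? global_default : unit_setting != 0;
}
```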
Alternative to b50f6dbe57
The commit naively returned early from socket_enter_running(), which is
quite problematic, as the socket will be woken up over and over again
without doing anything, until we eventually hit Poll/TriggerLimit*=.
On top of that, it requires hacks to hold up the start job for
initrd-switch-root.service. Overall, I doubt that is the right approach.
Let's instead hook this into our job engine, and try to activate
the service again when some other units are stopped. If all installed
jobs have run and we still see the conflict, or the manually
selected timeout is reached, fail the socket as before.
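The decision described above can be reduced to three inputs. Here is a sketch using a hypothetical `socket_decide()`. It is not the job engine itself, only the rule it would apply each time other units stop.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
        SOCKET_ACTIVATE_NOW,          /* no conflict: start the service */
        SOCKET_ACTIVATE_RETRY_LATER,  /* wait until other units stop, then re-check */
        SOCKET_ACTIVATE_FAIL,         /* give up and fail the socket as before */
} SocketActivateDecision;

/* Hypothetical: conflict = the triggered service conflicts with something still active,
 * pending_jobs = installed jobs not yet run, timeout_reached = the configured limit hit. */
static SocketActivateDecision socket_decide(bool conflict, unsigned pending_jobs, bool timeout_reached) {
        if (!conflict)
                return SOCKET_ACTIVATE_NOW;
        if (pending_jobs == 0 || timeout_reached)
                return SOCKET_ACTIVATE_FAIL;
        return SOCKET_ACTIVATE_RETRY_LATER;
}
```

Compared with returning early on every wakeup, the retry is driven by job completion. The socket is therefore not woken up in a loop until Poll/TriggerLimit*= trips.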
Some configuration files that need updates are directly in /etc. To
update them atomically, we need write access to /etc. For Ubuntu Core this is
an issue, as /etc is not writable; only a selection of subdirectories can be
writable. The general solution is symlinks or bind mounts to writable places,
but for atomic writes in /etc that does not work. So Ubuntu has carried a patch
for that, which did not age well.
Instead we would like to introduce some environment variables for alternate
paths.
* SYSTEMD_ETC_HOSTNAME: /etc/hostname
* SYSTEMD_ETC_MACHINE_INFO: /etc/machine-info
* SYSTEMD_ETC_LOCALTIME: /etc/localtime
* SYSTEMD_ETC_LOCALE_CONF: /etc/locale.conf
* SYSTEMD_ETC_VCONSOLE_CONF: /etc/vconsole.conf
* SYSTEMD_ETC_ADJTIME: /etc/adjtime
While it is for now expected that there is a symlink from the standard path to
the alternate one, we still read the files from the alternate path directly.
This is important for `/etc/localtime`, which is itself a symlink, so we cannot
have an indirect symlink or bind mount for it.
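Why atomic updates need a writable directory becomes clear with a sketch of path resolution plus a write-to-temp-and-rename update. `etc_path()` and `write_file_atomic()` are hypothetical helpers. The sketch assumes an empty variable counts as unset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: the override from the environment (e.g. SYSTEMD_ETC_HOSTNAME) if set
 * and non-empty, otherwise the standard path. */
static const char *etc_path(const char *env_name, const char *default_path) {
        const char *e = getenv(env_name);
        return e && *e ? e : default_path;
}

/* Atomically replaces `path`: a temporary file is created next to the target and then
 * rename()d over it. Creating that file needs write access to the target's directory,
 * which is exactly what a read-only /etc lacks. */
static int write_file_atomic(const char *path, const char *contents) {
        char tmp[4096];
        size_t len = strlen(contents);
        int fd, r = 0;

        if ((size_t) snprintf(tmp, sizeof tmp, "%s.XXXXXX", path) >= sizeof tmp)
                return -ENAMETOOLONG;
        fd = mkstemp(tmp);              /* created in the same directory as the target */
        if (fd < 0)
                return -errno;

        errno = 0;
        if (fchmod(fd, 0644) < 0 || write(fd, contents, len) != (ssize_t) len || fsync(fd) < 0)
                r = errno != 0 ? -errno : -EIO;   /* a short write leaves errno at 0 */
        if (close(fd) < 0 && r == 0)
                r = -errno;
        if (r == 0 && rename(tmp, path) < 0)
                r = -errno;
        if (r < 0)
                (void) unlink(tmp);
        return r;
}
```

A bind-mounted file does not help here. rename() replaces the directory entry in the directory holding the target, so that directory must be writable and on the same filesystem as the temporary file.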
Since machine-id is typically written only once and never updated, this commit
does not cover it. An initrd can properly create it and bind mount it.
git grep -l 'Failed to open /'|xargs sed -r -i 's|"Failed to open (/[^ ]+): %m"|"Failed to open %s: %m", "\1"|g'
git grep -l $'Failed to open \'/'|xargs sed -r -i $'s|"Failed to open \'(/[^ ]+)\': %m"|"Failed to open %s: %m", "\\1"|g'
git grep -l "Failed to open /"|xargs sed -r -i $'s|"Failed to open (/[^ ]+), ignoring: %m"|"Failed to open %s, ignoring: %m", "\\1"|g'
+ some manual fixups.
This gives us a little more information about what units were enabled
or disabled on that first boot and will be useful for OS developers
tracking down the source of unit state.
An example with this enabled looks like:
```
NET: Registered PF_VSOCK protocol family
systemd[1]: Applying preset policy.
systemd[1]: Unit /etc/systemd/system/dnsmasq.service is masked, ignoring.
systemd[1]: Unit /etc/systemd/system/systemd-repart.service is masked, ignoring.
systemd[1]: Removed '/etc/systemd/system/sockets.target.wants/systemd-resolved-monitor.socket'.
systemd[1]: Removed '/etc/systemd/system/sockets.target.wants/systemd-resolved-varlink.socket'.
systemd[1]: Created symlink '/etc/systemd/system/multi-user.target.wants/var-mnt-workdir.mount' → '/etc/systemd/system/var-mnt-workdir.mount'.
systemd[1]: Created symlink '/etc/systemd/system/multi-user.target.wants/var-mnt-workdir\x2dtmp.mount' → '/etc/systemd/system/var-mnt-workdir\x2dtmp.mount'.
systemd[1]: Created symlink '/etc/systemd/system/afterburn-sshkeys.target.requires/afterburn-sshkeys@core.service' → '/usr/lib/systemd/system/afterburn-sshkeys@.service'.
systemd[1]: Created symlink '/etc/systemd/system/sockets.target.wants/systemd-resolved-varlink.socket' → '/usr/lib/systemd/system/systemd-resolved-varlink.socket'.
systemd[1]: Created symlink '/etc/systemd/system/sockets.target.wants/systemd-resolved-monitor.socket' → '/usr/lib/systemd/system/systemd-resolved-monitor.socket'.
systemd[1]: Populated /etc with preset unit settings.
```
Considering it only happens on first boot and not on every boot I think
the extra information is worth the extra verbosity in the logs just for
that boot.
`ExtensionImages=` and `ExtensionDirectories=` now let you specify
vpick-named extensions; however, since they just get set up once when
the service is started, you can't see newer versions without restarting
the service entirely. Here, also reload confext extensions when you
reload a service. This allows you to deploy a new version of some
configuration and have it picked up at reload time without interruption
to your workload.
For now, we only reload confext extensions and leave the sysext
ones behind, since it didn't seem prudent to swap out what is likely
program code at reload. This is made possible by only going for the
`SYSTEMD_CONFEXT_HIERARCHIES` overlays (which only contains `/etc`).
This PR:
- Adjusts `service.c` to also refresh extensions when needed.
- Adds integration tests to check that a confext reload actually
occurred.
- Adds to the `systemd.exec` man pages to document this behavior.
This is a follow up to #24864 and #31364. Thank you to @bluca and
@goenkam for help in getting this up.
Implementation-wise, this uses the new kernel API and two collaborating
child processes under the host and child namespaces in order to gather the
right FDs:
- (1) In child, set up the extension images and directories in a slave
mountns, and obtain their FDs.
- (2) Fork into a grandchild under the target process namespace, and do a
"fake" unmount to obtain the FD of the underlying target folder
(say, /etc).
- (3) In the child again, set up new overlay under host NS rights.
We do not want to do I/O heavy jobs inline in PID1 blocking the state
machine, so add separate async states to handle this case.
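The FD juggling above ends in an overlay whose layers are the confext directories stacked on the target folder. A sketch of that layering in the classic `lowerdir=` string form may help. With overlayfs the leftmost lower directory is the topmost layer. The sketch assumes that later-listed extensions take precedence and that `/etc` forms the bottom layer. `build_lowerdir()` and the `/run/confext/...` paths are hypothetical. The commit itself works with FDs rather than option strings.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Builds "lowerdir=<layers[n-1]>:...:<layers[0]>:<base>". Paths containing ':', ',' or
 * '\\' would need escaping and are rejected in this sketch. */
static int build_lowerdir(char *buf, size_t size, const char *const *layers, size_t n, const char *base) {
        size_t off;
        int k = snprintf(buf, size, "lowerdir=");

        if (k < 0 || (size_t) k >= size)
                return -ENOBUFS;
        off = (size_t) k;
        for (size_t i = n; i > 0; i--) {         /* last extension ends up on top */
                if (strpbrk(layers[i - 1], ":,\\"))
                        return -EINVAL;
                k = snprintf(buf + off, size - off, "%s:", layers[i - 1]);
                if (k < 0 || (size_t) k >= size - off)
                        return -ENOBUFS;
                off += (size_t) k;
        }
        if (strpbrk(base, ":,\\"))
                return -EINVAL;
        k = snprintf(buf + off, size - off, "%s", base);   /* e.g. /etc at the bottom */
        if (k < 0 || (size_t) k >= size - off)
                return -ENOBUFS;
        return 0;
}
```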
Co-authored-by: Luca Boccassi <luca.boccassi@gmail.com>
Follow-up for 52e3671bf7
unit_gc_sweep() might try to add the unit to the GC queue again.
While that is a no-op because Unit.in_gc_queue has not been cleared
yet, it introduces a minor inconsistency of states.
Our kernel baseline is v5.4 and cgroup v2 is enforced now,
which means CPU accounting is cheap everywhere without
requiring any controller, hence just remove the directive.
Currently there are various circular dependencies between headers
in core/. Let's get rid of these by making judicious use of forward
declarations and moving includes into implementation files instead of
having them in header files.
Getting rid of circular header includes simplifies the code and makes
various clang-based tools such as iwyu work much better on our code.
The most important change is getting rid of the manager.h include in
unit.h which is possible thanks to the previous commits. We also move
the OOMPolicy and StatusType enums to unit.h to remove the need for
other unit headers to include manager.h to get access to these enums.
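The forward-declaration technique looks roughly like this when condensed into one file. The "header" part only needs an incomplete `Manager` type because it stores a pointer. The full definition is required only where a member is accessed. The struct contents and `unit_manager_n_units()` are hypothetical simplifications, not systemd's actual definitions.

```c
#include <assert.h>

/* ---- unit.h (sketch) ---- */
typedef struct Manager Manager;   /* forward declaration instead of #include "manager.h" */

typedef enum {                    /* lives here so other unit headers need not include manager.h */
        OOM_CONTINUE,
        OOM_STOP,
        OOM_KILL,
} OOMPolicy;

typedef struct Unit {
        Manager *manager;         /* a pointer to an incomplete type is fine in a header */
        OOMPolicy oom_policy;
} Unit;

unsigned unit_manager_n_units(const Unit *u);

/* ---- manager.h / unit.c (sketch) ---- */
struct Manager {
        unsigned n_units;
};

unsigned unit_manager_n_units(const Unit *u) {
        return u->manager->n_units;   /* the full definition is needed only here */
}
```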
There's no need for these to be fields inside the manager struct,
let's turn them into functions in unit.h instead, again to allow
forward declaring the Manager struct in a later commit.
As reported in https://github.com/systemd/systemd/issues/35405, if the watchdog
ping failed, we effectively started a busy loop here. The previous commits
should fix this, but in general, the protection here is intended as a safety
net in case the logic is broken somewhere else. We shouldn't exclude the
watchdog stuff from this.
Closes https://github.com/systemd/systemd/issues/35405. Apparently some
watchdog devices can be opened, but then the pings start failing after some
time. Since the timestamp of the last successful ping is not updated, we try to
ping again immediately, causing a busy loop and excessive logging.
After trying a few different approaches to fit this into the existing framework
without changing the logic too much, I settled on an approach with a second
timestamp. In particular, the timestamp of the last successful ping is public,
exposed as WatchdogLastPingTimestamp over dbus. It'd be wrong to redefine this
to mean the last ping *attempt*. So we need a second timestamp in some form.
Also, if we give up on pinging, we probably should attempt to disarm the
watchdog. It's possible that the pinging fails, but the watchdog would still
fire. I don't think we want that, since it seems that our internal loop is
working, it's just the watchdog that is broken.
Structured message with SD_MESSAGE_WATCHDOG_PING_FAILED is logged if we fail
to ping.
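The two-timestamp approach can be sketched as follows. The successful-ping timestamp stays public and unchanged on failure. Scheduling uses the last attempt, so a failing ping no longer makes the next one "due immediately". The limit of 15 is an assumption taken from the log below. `WatchdogState` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

#define WATCHDOG_MAX_FAILED_PINGS 15   /* assumption, matching "after 15 attempts" */

typedef struct {
        usec_t last_ping;          /* last *successful* ping (what WatchdogLastPingTimestamp exposes) */
        usec_t last_ping_attempt;  /* second timestamp: last attempt, successful or not */
        unsigned n_failed;         /* consecutive failed pings */
        bool open;
} WatchdogState;

/* Scheduling from the last attempt avoids the busy loop: after a failure, last_ping
 * stays in the past and would otherwise make the next ping due right away. */
static usec_t watchdog_next_ping(const WatchdogState *w, usec_t interval) {
        return w->last_ping_attempt + interval;
}

/* Records a ping result; returns true if the caller should disarm and close the device. */
static bool watchdog_record_ping(WatchdogState *w, usec_t now, bool ok) {
        w->last_ping_attempt = now;
        if (ok) {
                w->last_ping = now;
                w->n_failed = 0;
                return false;
        }
        if (++w->n_failed < WATCHDOG_MAX_FAILED_PINGS)
                return false;
        w->open = false;
        return true;
}
```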
I tested this by attaching gdb to pid 1 and calling close(watchdog_fd).
We get a bunch of warning messages and then an attempt to close the watchdog:
Mar 21 15:46:17 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:20 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:23 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:26 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:29 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:32 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:35 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:37 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:40 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:43 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:46 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:49 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:52 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:55 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0, closing watchdog after 15 attempts: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disable hardware watchdog, ignoring: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disarm watchdog timer, ignoring: Bad file descriptor
I think we need to log at some point if the user configured a watchdog device
but no devices were found. We can't log ENOENT immediately, because the device
may well appear later during boot. So wait until the end of the initial
transaction and log then.
Those functions call watchdog_setup() and watchdog_setup_pretimeout(), which
internally do a similar check against the static variables watchdog_timeout and
watchdog_pretimeout. The second check is not useful.
This introduces a LOG_ITEM() macro that checks arbitrary formats in
log_struct().
Then, drop the _printf_ attribute from log_struct_internal(), as it did not
help much: the compiler only checked the first format string.
Hopefully, this silences false-positive warnings from Coverity.
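A macro that checks every item's format string can be sketched with a dead `printf()` call. It is never evaluated, but `-Wformat` still inspects it, while the macro itself expands to `fmt, args...`. This is a simplified, hypothetical take, and systemd's actual LOG_ITEM() may be implemented differently. `format_item()` stands in for a single field of a structured log call. It relies on the GNU `##__VA_ARGS__` extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Expands to "fmt, args..." and lets the compiler type-check fmt against args via a
 * printf() call that short-circuit evaluation never executes. */
#define LOG_ITEM(fmt, ...) \
        ((void) (0 && printf(fmt, ##__VA_ARGS__)), fmt), ##__VA_ARGS__

/* Formats one item into buf; a stand-in for one field of a structured log call. */
static int format_item(char *buf, size_t size, const char *fmt, ...) {
        va_list ap;
        va_start(ap, fmt);
        int r = vsnprintf(buf, size, fmt, ap);
        va_end(ap);
        return r;
}
```

With this, `format_item(buf, sizeof buf, LOG_ITEM("PRIORITY=%i", "3"))` would draw a format warning, whereas passing the format directly to an unannotated variadic function would not.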
Admittedly, some of our glyphs _are_ special, e.g. "O=" for SPECIAL_GLYPH_TOUCH ;)
But we don't need this in the name. The very long names make some invocations
very wordy, e.g. special_glyph(SPECIAL_GLYPH_SLIGHTLY_UNHAPPY_SMILEY).
Also, I want to add GLYPH_SPACE, which is not special at all.
This adds a new EMERGENCY_ACTION_SLEEP_5S flag, which when set will
delay the emergency action for 5s. This is supposed to be used together
with EMERGENCY_ACTION_WARN so that users can actually read the message
we output.
We enable this with all emergency action requests that already set
EMERGENCY_ACTION_WARN, except for the 7x ctrl-alt-del burst reboot,
where the user knows what they're doing and there's no real reason to
wait: they don't need to be informed.
This also enables both EMERGENCY_ACTION_WARN and EMERGENCY_ACTION_SLEEP_5S
for FailureAction= processing of regular units, where these were so far
off (it leaves them off for SuccessAction=, however!). This is a good
thing to make things more debuggable: if something fails and we reboot,
this really deserves notification of the user.
(For SuccessAction= this logic does not apply, since the shutdown action
induced there is evidently an intended part of the code flow, for example in
systemd-reboot.service or a similar unit, where the shutdown is the goal,
not an exception, and deserves no additional noisy reporting.)
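The flag assignment described above can be summarized as a small mapping. The bit values, `ActionTrigger` and both functions are hypothetical. The mapping encodes one reading of the text: the Ctrl-Alt-Del burst keeps the warning but skips the delay, and SuccessAction= gets neither flag.

```c
#include <assert.h>

typedef unsigned EmergencyActionFlags;

enum {
        EMERGENCY_ACTION_WARN     = 1u << 0,   /* bit values are assumptions */
        EMERGENCY_ACTION_SLEEP_5S = 1u << 1,
};

typedef enum {
        TRIGGER_FAILURE_ACTION,       /* FailureAction= of a regular unit */
        TRIGGER_SUCCESS_ACTION,       /* SuccessAction=: shutdown is the goal, stay quiet */
        TRIGGER_CTRL_ALT_DEL_BURST,   /* 7x Ctrl-Alt-Del: the user knows what they're doing */
} ActionTrigger;

static EmergencyActionFlags flags_for_trigger(ActionTrigger t) {
        switch (t) {
        case TRIGGER_FAILURE_ACTION:
                return EMERGENCY_ACTION_WARN | EMERGENCY_ACTION_SLEEP_5S;
        case TRIGGER_CTRL_ALT_DEL_BURST:
                return EMERGENCY_ACTION_WARN;
        case TRIGGER_SUCCESS_ACTION:
        default:
                return 0;
        }
}

/* Seconds to wait before executing the action, so the warning can actually be read. */
static unsigned delay_seconds(EmergencyActionFlags f) {
        return (f & EMERGENCY_ACTION_SLEEP_5S) ? 5 : 0;
}
```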
Inspired by: https://github.com/systemd/systemd/pull/36705#issuecomment-2717014120
So far /run/systemd/ was created as a side effect of initializing the
D-Bus client/server. But in one of the next commits we'll suppress
connecting to D-Bus in test runs, hence let's move the logic out of the
D-Bus code and into manager_startup().
Then, also drop creating it again and again in PID 1 at various places,
and just rely on it existing.
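Creating the directory once at startup only needs an idempotent helper that tolerates an existing directory. `ensure_runtime_dir()` is hypothetical, and the 0755 mode is an assumption. This is not the actual manager_startup() code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `path` if missing; an already existing directory is fine, a non-directory is not. */
static int ensure_runtime_dir(const char *path) {
        struct stat st;

        if (mkdir(path, 0755) < 0 && errno != EEXIST)
                return -errno;
        if (stat(path, &st) < 0)
                return -errno;
        if (!S_ISDIR(st.st_mode))
                return -ENOTDIR;
        return 0;
}
```

Calling this once early means later code paths can assume the directory exists instead of each re-creating it.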