82854 Commits

Author SHA1 Message Date
Luca Boccassi
375d80b04a ci: re-enable uefi secure boot
Kernel 6.11.0-1018-azure is now in use, which has a workaround
for the HyperV bug, so this should work again in GHA
2025-07-12 21:07:58 +09:00
Yu Watanabe
b1eb6cc28b pidref: propagate critical errors in pidref_acquire_pidfd_id()
Follow-up for 571867ffa7.

Fixes CID#1612242.
2025-07-12 19:51:01 +09:00
DaanDeMeyer
b98d6bff23 core: Fix scope SIGTERM logging
KILL_TERMINATE_AND_LOG doesn't do anything at the moment, let's fix
that.
2025-07-12 19:50:47 +09:00
Lennart Poettering
d38dd7d17a core/scope: drop effectively unused unit_watch_pidref() calls (#38186) 2025-07-12 07:27:56 +02:00
Lennart Poettering
6a26f25b74 update TODO 2025-07-12 07:22:56 +02:00
DaanDeMeyer
bd427d005c journal: Fix socket max level initialization
Follow up for df5b3426f6
2025-07-12 07:18:45 +02:00
Yu Watanabe
7c208a64ba units: check if kmod command exists
We already check for the existence of quotaon in quotaon@.service and
quotacheck in systemd-quotacheck@.service.
Let's also check whether the kmod command exists.

Closes #38179.
2025-07-12 07:18:17 +02:00
Mike Yuan
66867d5308 core/scope: serialize_item() is NOP on NULL 2025-07-11 22:20:25 +02:00
Mike Yuan
4db4641a65 core/scope: drop effectively unused unit_watch_pidref() calls
Follow-up for 495e75ed5c

The mentioned commit switched scope unit's "pids" deserialization
to call unit_watch_pid() already, meaning all later invocations
in scope_coldplug() are no-op. Remove the cruft altogether.
2025-07-11 21:58:51 +02:00
Mike Yuan
f22187bd7e units/machines.target: fix typo
Follow-up for 48cb009afc
2025-07-11 21:38:58 +02:00
Lennart Poettering
b2f23bd2b1 Support global sysext/confext in systemd-stub/systemd-sysext (#38113)
Systemd-stub supports loading addons, credentials, system and
configuration extensions from ESP and while addons and credentials can
be both global and per-UKI, sysext/confext are only per-UKI.

Add support for global sysext/confext to systemd-stub/systemd-sysext.

Fixes #37993
2025-07-11 21:10:51 +02:00
Lennart Poettering
aac7e892e4 machined: make registration of unpriv user's VMs/containers work (#37855)
This adds missing glue to reasonably allow unpriv users' VMs/containers
to register with the system machined.

This primarily adds two things:

1. machined can now properly track VMs/containers residing in subcgroups
of units, because that's effectively what happens for per-user
VMs/containers: they are placed below the system unit `user@….service`
in some user unit.

2. machines registered with machined now have an owning UID: users can
operate on their own machines without re-authentication, but not on
others.

Note that this is only a first step regarding machined's hookup of
nspawn/vmspawn in the long run for unpriv operation.

I think eventually we should make it so that there's both a per-user and
a per-system machined instance (so far, and even with this PR there's
still one per-system instance), and per-user containers/VMs would
register with *both*. Having two instances makes sense I think,
because it would mean we can make machined reasonably manage the
per-user image discovery, and also do the per-system network/hostname
handling.
2025-07-11 21:10:08 +02:00
Eisuke Kawashima
4571a1d77a shell-completion: update systemd-run 2025-07-11 19:04:44 +02:00
Lennart Poettering
6d44b761ea update TODO 2025-07-11 18:17:04 +02:00
Lennart Poettering
bfd356da63 test: add testcase for unpriv machined nspawns reg + killing
Let's add a superficial test for the code we just added: spawn a
container unpriv, make sure registration fully worked, then kill it via
machinectl, to ensure it all works properly.

Not too thorough but a good start.
2025-07-11 18:17:04 +02:00
Lennart Poettering
3405b84d8c units: systems might take a while to boot
vmspawn systems might take quite a while to boot in particular if they
go through uefi and wait for a network lease. Hence let's increase the
start timeout to 2min (from 45s). We'll do that for both nspawn and
vmspawn, even though the UEFI thing certainly doesn't apply there (but
the DHCP thing still does).
2025-07-11 18:17:04 +02:00
Lennart Poettering
48cb009afc units: add units for vmspawn/nspawn in --user mode too 2025-07-11 18:17:04 +02:00
Lennart Poettering
12d1f44681 vmspawn: do not set vt220
We do not let qemu do terminal stuff, hence no point in setting any
TERM.
2025-07-11 18:17:04 +02:00
Lennart Poettering
f820b27565 vmspawn: introduce --notify-ready= switch
This mimics the switch of the same name from nspawn: it controls whether
we expect a READY=1 message from the payload or not. Previously we'd
always expect that. This makes it configurable, just like it is in
nspawn.

There's one fundamental difference in behaviour though: in nspawn it
defaults to off, in vmspawn it defaults to on. (for historical reasons,
ideally we'd default to on in both cases, but changing is quite a compat
break both directly and indirectly: since timeouts might get triggered).
2025-07-11 18:17:04 +02:00
Lennart Poettering
0fc45c8d20 vmspawn: substantially beef up cgroup logic, to match more closely what nspawn does
This beefs up the cgroup logic, adding --slice=, --property= to vmspawn
the same way it already exists in nspawn.

There are a bunch of differences though: we don't delegate the cgroup
access in the allocated unit (since qemu wouldn't need that), and we do
registration via varlink not dbus. Hence, while this follows a similar
logic now, it differs in a lot of details.

This makes in particular one change: when invoked on the command line
we'll only add the qemu instance to the allocated scope, not the vmspawn
process itself (this follows more closely how nspawn does this where
only the container payload has its scope, not nspawn itself). This is
quite tricky to implement: unlike in nspawn we have auxiliary services
to start, with dependencies to the scope. This means we need to start the
scope early, so that we know the scope's name. But the command line to
invoke is only assembled from the data we learn about the auxiliary
services, hence much later. To address this we'll now fork off the child
that eventually will become qemu early, then move it to a scope, prepare the
cmdline and then very late send the cmdline (and the fds we want to
pass) to the prepared child, which then execs it.
2025-07-11 18:17:04 +02:00
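The "fork early, exec late" ordering described in 0fc45c8d20 can be illustrated with a minimal Python sketch. This is not the vmspawn implementation (which is C and also passes file descriptors); the function name `fork_early_exec_late`, the JSON encoding of the command line and the use of a plain pipe are assumptions made purely for illustration.

```python
import json
import os


def fork_early_exec_late(build_cmdline):
    """Fork a child first, decide on its command line only afterwards.

    build_cmdline is a hypothetical callback that receives the child's PID
    and returns the argv list to execute.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent has sent the complete command line.
        try:
            os.close(write_fd)
            with os.fdopen(read_fd, "rb") as pipe:
                argv = json.loads(pipe.read())
            os.execvp(argv[0], argv)
        finally:
            # Only reached if reading, decoding or exec failed.
            os._exit(127)

    # Parent: the child's PID is known now, so this is the point where the
    # child could be moved into its scope and auxiliary services started.
    os.close(read_fd)
    argv = build_cmdline(pid)

    # Very late: hand the finished command line to the waiting child.
    with os.fdopen(write_fd, "wb") as pipe:
        pipe.write(json.dumps(argv).encode())

    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Calling `fork_early_exec_late(lambda pid: ["sh", "-c", "exit 7"])` forks the child, builds the command line afterwards and returns the child's exit status. The design point is that the PID exists before the command line does, so anything that needs to refer to the child can be set up in between.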
Lennart Poettering
3cfa7826d2 vmspawn: spawn polkit during registration phase
Just like in nspawn, there's a chance we need to PK authenticate the
registration, hence let's spawn off the agent for that during that
phase, and terminate it once we don't need it anymore.
2025-07-11 18:17:04 +02:00
Lennart Poettering
6dc6e6459b vmspawn: use VM leader PID not vmspawn PID to register machine
Let's make vmspawn machine registration more like nspawn machine
registration, and register the payload, not vmspawn/nspawn itself.
2025-07-11 18:17:04 +02:00
Lennart Poettering
6ef1fc6d02 nspawn: properly order include of constants.h 2025-07-11 18:16:48 +02:00
Lennart Poettering
0c250b3919 nspawn: tweak logging/notifications when processing exit requests 2025-07-11 18:15:12 +02:00
Lennart Poettering
f63ca4fc14 nspawn: slightly beef up READY= logic in nspawn
Let's also send out a STATUS= message when we get READY=1 if it didn't
come with a STATUS= message itself.

Also, let's initially say the container is "started", and only once the
READY=1 is seen claim it was "running".
2025-07-11 18:15:12 +02:00
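The READY=/STATUS= behaviour from f63ca4fc14 can be sketched roughly as follows. This is a simplified Python illustration, not nspawn's C code; the helper names and the default status text are hypothetical.

```python
def parse_notify(message):
    """Split a notification datagram of KEY=value lines into a dict."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


def status_to_send(message, default_status="Container running."):
    """Return the STATUS= text to send out for a payload message.

    default_status is a placeholder string chosen for this sketch.
    """
    fields = parse_notify(message)
    if "STATUS" in fields:
        # The payload supplied its own status text: pass it on unchanged.
        return fields["STATUS"]
    if fields.get("READY") == "1":
        # READY=1 arrived without STATUS=: synthesize one.
        return default_status
    return None
```

With this sketch, a bare `READY=1` yields the default text, while `READY=1` accompanied by `STATUS=Booted` yields `Booted`.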
Lennart Poettering
f2f26f1527 nspawn: reorganize scope allocation/registration logic
This cleans up allocation of a scope unit for the container: when
invoked in user context we'll now allocate a scope through the per-user
service manager instead of the per-system manager. This makes a ton more
sense, since it's the user that invokes things after all. And given that
machined now can register containers in the user manager there's nothing
stopping us from cleaning this up.

Note that this means we'll connect to two busses if run unpriv: once to
the per-user bus to allocate the scope unit, and once to the per-system
bus to register it with machined.
2025-07-11 18:15:12 +02:00
Lennart Poettering
ca1daebdd6 machinectl: output supervisor info in status output 2025-07-11 18:15:12 +02:00
Lennart Poettering
596c596d09 machined: add a bit more debug logging 2025-07-11 18:15:12 +02:00
Lennart Poettering
74546a7e29 machined: explicitly watch machine cgroup for getting empty 2025-07-11 18:15:12 +02:00
Lennart Poettering
97754cd14d machined: also track 'supervisor' process of a machine
So far, machined strictly tracked the "leader" process of a machine,
i.e. the topmost process that is actually the payload of the machine.
Its runtime also defines the runtime of the machine, and we can directly
interact with it if we need to, for example for containers to join the
namespaces, or kill it.

Let's optionally also track the "supervisor" process of a machine, i.e.
the host process that manages the payload if there is one. This is
generally useful info, but in particular is useful because we might need
to communicate with it to shut down a machine without cooperation of the
payload. Traditionally we did this by simply stopping the unit of the
machine, but this is not doable now that the host machined can be used
to track per-user machines.

In the long run we probably want a more bespoke protocol between
machined and supervisors (so that we can execute other commands too,
such as request cooperative reboots/shutdowns), but that's for later.

Some environments call the concept "monitor" rather than "supervisor" or
use some other term. I stuck to "supervisor" because nspawn uses this,
and ultimately one name is as good as another.

And of course, in other implementations of VM or container managers
there might not be a single process tracking each VM/container. Because
of this, the concept of a supervisor is optional.
2025-07-11 18:15:12 +02:00
Lennart Poettering
adaff8eb35 machined: use different polkit actions for registering and creating a machine
The difference between these two operations is large: one is relatively
superficial: for "registration" all resources remain associated with the
invoking user, only the cgroup is reported to machined which then keeps
track of the machine, too. OTOH for "creation" a scope is allocated in
system context, hence the invoked code will be owned by the system, and
its resource usage charged against the system.

Hence, use two distinct polkit actions for this, so that we can relax
access to registration, but keep access to creation tough.
2025-07-11 18:15:12 +02:00
Lennart Poettering
276d200186 machined: track UID owner of machines
Now that unpriv clients can register machines, let's register their UID
too. This allows us to do two things:

1. make sure the scope delegation is assigned to the right UID (so that
   the unpriv user can actually create cgroups below the delegated
   scope)

2. permit certain types of access (i.e. killing, or pty access) to the
   client without auth if it owns the machine.
2025-07-11 18:15:12 +02:00
Lennart Poettering
d5feeb373c machined: optionally track machines in cgroup subgroups 2025-07-11 18:15:12 +02:00
Lennart Poettering
7bb1147b00 cgroup-util: add cg_path_get_unit_full() helper and related calls
This helper returns not only the unit a cgroup belongs to, but also the
cgroup sub-path beyond it.
2025-07-11 18:15:08 +02:00
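To make the idea behind 7bb1147b00 concrete, here is a much-simplified Python sketch of splitting a cgroup path into the owning unit and the sub-path beyond it. It is not the C helper itself: the function name `split_unit_and_subpath` is hypothetical, and the rule "skip leading `.slice` components, take the next component as the unit" is an assumption that ignores escaping and other details the real code has to handle.

```python
def split_unit_and_subpath(cgroup_path):
    """Return (unit, subpath) for a cgroup path, or (None, None).

    Simplified assumption: the unit is the first path component that is
    not a slice; everything after it is the sub-path.
    """
    parts = [p for p in cgroup_path.split("/") if p]
    for i, part in enumerate(parts):
        if part.endswith(".slice"):
            continue
        subpath = "/".join(parts[i + 1:])
        return part, subpath or None
    return None, None
```

For a per-user machine living below the user manager, a path such as `/user.slice/user-1000.slice/user@1000.service/app.slice/foo.scope` would yield the unit `user@1000.service` and the sub-path `app.slice/foo.scope` under these assumptions.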
vlefebvre
96ba43388f uki.conf is used by the ukify tool to create a Unified Kernel Image. It
would make sense to install it only if ukify is wanted.
2025-07-12 00:40:08 +09:00
DaanDeMeyer
42c288dfd8 test: Fix --capability=CAP_BPF condition
We also run in a VM if we're not running as root, yet we weren't
checking this when deciding whether to pass --capability=CAP_BPF or
not. Let's fix that.

Follow up for 9554ac3052
2025-07-11 16:08:00 +02:00
Yu Watanabe
3e9128fcb5 network: clean up link_may_have_ipv6ll() and allow to run RADV on Tun interface (#38175)
Closes #38170.
2025-07-11 23:04:18 +09:00
Yu Watanabe
f2e9193fcf test: drop unnecessary line continuation 2025-07-11 22:24:25 +09:00
Yu Watanabe
4a58d8ed51 udevadm: fix memleak
Fixes a bug in a4a6e21673.

Fixes the following memleak:
```
$ sudo valgrind --leak-check=full build/udevadm cat /usr/lib/udev/rules.d
==3975939==
==3975939== HEAP SUMMARY:
==3975939==     in use at exit: 640 bytes in 1 blocks
==3975939==   total heap usage: 7,657 allocs, 7,656 frees, 964,328 bytes allocated
==3975939==
==3975939== 640 bytes in 1 blocks are definitely lost in loss record 1 of 1
==3975939==    at 0x4841866: malloc (vg_replace_malloc.c:446)
==3975939==    by 0x4ACA71F: malloc_multiply (alloc-util.h:92)
==3975939==    by 0x4ACF988: _hashmap_dump_entries_sorted (hashmap.c:2167)
==3975939==    by 0x4ACFC76: _hashmap_dump_sorted (hashmap.c:2209)
==3975939==    by 0x4AA60A4: hashmap_dump_sorted (hashmap.h:311)
==3975939==    by 0x4AA9077: dump_files (conf-files.c:397)
==3975939==    by 0x4AAA14E: conf_files_list_strv_full (conf-files.c:596)
==3975939==    by 0x42426A: search_rules_file (udevadm-util.c:301)
==3975939==    by 0x424768: search_rules_files (udevadm-util.c:334)
==3975939==    by 0x41287D: cat_main (udevadm-cat.c:110)
==3975939==    by 0x4A7B911: dispatch_verb (verbs.c:139)
==3975939==    by 0x427272: udevadm_main (udevadm.c:121)
==3975939==
==3975939== LEAK SUMMARY:
==3975939==    definitely lost: 640 bytes in 1 blocks
==3975939==    indirectly lost: 0 bytes in 0 blocks
==3975939==      possibly lost: 0 bytes in 0 blocks
==3975939==    still reachable: 0 bytes in 0 blocks
==3975939==         suppressed: 0 bytes in 0 blocks
==3975939==
==3975939== For lists of detected and suppressed errors, rerun with: -s
==3975939== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
2025-07-11 22:07:41 +09:00
Yu Watanabe
fabcb1eb06 man: fix version info tag
Follow-up for 63770fa1d3.
2025-07-11 14:33:25 +02:00
Yu Watanabe
52d6032b4a network/radv: allow to send Router Advertisement from e.g. Tun interface
Sending router advertisements requires an IPv6LL address and the
IFF_MULTICAST flag. The length of the hardware address is irrelevant.

Closes #38170.
2025-07-11 20:53:04 +09:00
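The condition stated in 52d6032b4a can be written as a small predicate. The following Python sketch is illustrative only: the `Link` class and the function name are hypothetical, and only the `IFF_MULTICAST` value is taken from the Linux headers.

```python
from dataclasses import dataclass

IFF_MULTICAST = 0x1000  # interface flag value from Linux <linux/if.h>


@dataclass
class Link:
    """Hypothetical, minimal stand-in for a network link."""
    flags: int
    has_ipv6ll_address: bool
    hw_addr_len: int = 0  # deliberately not consulted below


def may_send_router_advertisements(link):
    """True if the link has an IPv6LL address and supports multicast."""
    return link.has_ipv6ll_address and bool(link.flags & IFF_MULTICAST)
```

A Tun-like link with no hardware address (`hw_addr_len=0`) still qualifies as long as it has an IPv6LL address and `IFF_MULTICAST` set.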
Yu Watanabe
291b6feedd network: split link_may_have_ipv6ll() into two
This renames and splits link_may_have_ipv6ll() into
link_ipv6ll_enabled_harder() and link_multicast_enabled(),
as they are completely irrelevant to each other.

Also, this makes link_ipv6ll_enabled_harder() work for non-WireGuard
interfaces.
2025-07-11 20:53:04 +09:00
Yu Watanabe
2b69797b6d Include more headers explicitly (#38169)
Similar to recent changes such as 4f18ff2e29.
2025-07-11 20:21:33 +09:00
Vitaly Kuznetsov
8d07a8d6b1 sysext: Support global sysext/confext
Load global sysext/confext from /.extra/global_{sysext,confext} which
systemd-stub puts there from ESP/loader/credentials/*.{sysext,confext}.raw.
Global extensions are handled the exact same way as per-UKI ones.
2025-07-11 13:08:26 +02:00
Vitaly Kuznetsov
9f7e3820e9 stub: Support global sysext/confext
Systemd-stub supports loading addons, credentials, system and configuration
extensions from ESP and while addons and credentials can be both global and
per-UKI, sysext/confext are only per-UKI.

Add support for loading ESP/loader/credentials/*.{sysext,confext}.raw to
systemd-stub.

Note: for backwards compatibility reasons, per-UKI sysexts can also be
*.raw (not only *.sysext.raw) but as global extensions are new, there's
no need to bring this legacy there.
2025-07-11 13:08:15 +02:00
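The naming rules described in 9f7e3820e9 (global extensions must be `*.sysext.raw` or `*.confext.raw`, while per-UKI sysexts may also be plain `*.raw`) can be summarized in a short Python sketch. The function name and its `per_uki` parameter are hypothetical and only model the file name matching, not how systemd-stub actually loads the images.

```python
def classify_extension(filename, per_uki):
    """Return "sysext", "confext" or None based on the file name alone."""
    if filename.endswith(".sysext.raw"):
        return "sysext"
    if filename.endswith(".confext.raw"):
        return "confext"
    if per_uki and filename.endswith(".raw"):
        # Legacy naming, accepted only for per-UKI sysexts.
        return "sysext"
    return None
```

Under this sketch `classify_extension("debug.raw", per_uki=True)` is treated as a sysext, whereas the same name among the global extensions is ignored.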
vlefebvre
fb71571d3a detect-virt: add bare-metal support for GCE
Google Compute Engine machines are not only virtual but can also be
physical. Checking only the DMI is therefore not enough to detect
whether a machine is virtual: systemd-detect-virt returns "google"
instead of "none" on a c3-highcpu-metal machine.
SMBIOS will not help us tell the difference, as it does for EC2 machines.
However, GCE uses the KVM hypervisor for its VMs, so we can use this
information to detect virtualization. [0]

The issue and the changes have been tested on SUSE SLE-15-SP7 images with
systemd-254 on GCE, for both bare-metal and VM instances.

[0] https://cloud.google.com/blog/products/gcp/7-ways-we-harden-our-kvm-hypervisor-at-google-cloud-security-in-plaintext
2025-07-11 20:07:40 +09:00
Yu Watanabe
cc01ee7871 kernel-install: several follow-ups for --entry-type= (#38160)
Follow-ups for b6d4997683 (#37897).
2025-07-11 20:07:19 +09:00
Zbigniew Jędrzejewski-Szmek
63770fa1d3 systemd-run: add --no-pager, use pager for --help 2025-07-11 19:01:42 +09:00
Zbigniew Jędrzejewski-Szmek
d137f280b8 NEWS: clean up uses of backticks
Backticks are good in markdown files, where they signify text to be rendered
with a mono-space font. But our text files don't use markdown, and backticks
are just a particularly bad type of quote (ugly, asymmetrical, with a special
significance in shell context). Update older NEWS entries to not use them.
2025-07-11 11:56:19 +02:00
Zbigniew Jędrzejewski-Szmek
ce9d701dc4 NEWS: adjust whitespace and texts for v258 2025-07-11 11:56:19 +02:00