This commit introduces new D-Bus API, StartAuxiliaryScope(). It may be
used by services as part of the restart procedure. Service sends an
array of PID file descriptors corresponding to processes that are part
of the service and must continue running also after service restarts,
i.e. they haven't finished the job why they were spawned in the first
place (e.g. long running video transcoding job). Systemd creates new
scope unit for these processes and migrates them into it. Cgroup
properties of scope are copied from the service so it retains same
cgroup settings and limits as service had.
Distributions apparently only compile a subset of TPM2 drivers into the
kernel. For those not compiled it but provided as kmod we need a
synchronization point: we must wait before the first TPM2 interaction
until the driver is available and accessible.
This adds a tpm2.target unit as such a synchronization point. It's
ordered after /dev/tpmrm0, and is pulled in by a generator whenever we
detect that the kernel reported a TPM2 to exist but we have no device
for it yet.
This should solve the issue, but might create problems: if there are TPM
devices supported by firmware that we don't have Linux drivers for we'll
hang for a bit. Hence let's add a kernel cmdline switch to disable (or
alternatively force) this logic.
Fixes: #30164
Global resource (whole system or root cg's (e.g. in a container)) is
also a well-defined limit for memory and tasks, take it into account
when calculating effective limits.
Users become perplexed when they run their workload in a unit with no
explicit limits configured (moreover, listing the limit property would
even show it's infinity) but they experience unexpected resource
limitation.
The memory and pid limits come as the most visible, therefore add new
unit read-only properties:
- EffectiveMemoryMax=,
- EffectiveMemoryHigh=,
- EffectiveTasksMax=.
These properties represent the most stringent limit systemd is aware of
for the given unit -- and that is typically(*) the effective value.
Implement the properties by simply traversing all parents in the
leaf-slice tree and picking the minimum value. Note that effective
limits are thus defined even for units that don't enable explicit
accounting (because of the hierarchy).
(*) The evasive case is when systemd runs in a cgroupns and cannot
reason about outer setup. Complete solution would need kernel support.
This does what we do for system extension also for configuration
extension.
This is complicated by the fact that we previously looked for
<uki-binary>.d/*.raw for system extensions. We want to measure sysexts
and confexts to different PCRs (13 vs. 12) hence we must distinguish
them, but *.raw would match both kinds.
This commit solves this via the following mechanism: we'll load confexts
from *.confext.raw and sysexts from *.raw but will then enclude
*.confext.raw from the latter. This preserves compatibility but allows
us to somewhat reasonable distinguish both types of images.
The documentation is updated not going into this detail though, and
instead now claims that sysexts shall be *.sysext.raw and confexts
*.confext.raw even though we actually are more lenient than this. This
is simply to push people towards using the longer, more descriptive
suffixes.
I added an XML comment (<!-- … -->) about this to the docs, so that
whenever somebody notices the difference between code and docs
understands why and leaves it that way.
DocBook document model doesn't allow mixing of <refsection> with the
numbered variants (<refsect1> etc.). Therefore, any document that
included something from standard-conf.xml was invalid. Fortunately, all
the includes are at the 1st level, hence let's just change
standard-conf.xml to use <refsect1> to fix that.