927 Commits

Author SHA1 Message Date
Yu Watanabe
b917743d50 nspawn: update help message for user namespacing
Follow-up for 33eac552ab.
2022-06-27 10:31:41 +09:00
Yu Watanabe
05ab439a62 nspawn: fix UID map string
We send/recv the set of payload uid, host uid, payload gid, host gid.
Hence, the index must be incremented with 4, instead of 2.

Fixes #23664.
2022-06-16 11:52:59 +09:00
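To illustrate the indexing fix described above, here is a minimal sketch that walks a flat sequence in which every entry occupies four consecutive slots (payload UID, host UID, payload GID, host GID). The function names, the sample values and the "payload host 1" output format are assumptions made for illustration; they are not taken from the nspawn source.

```python
from typing import Iterator, List, Sequence, Tuple

FIELDS_PER_ENTRY = 4  # payload uid, host uid, payload gid, host gid


def iter_entries(flat: Sequence[int]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (payload_uid, host_uid, payload_gid, host_gid) from a flat list."""
    if len(flat) % FIELDS_PER_ENTRY != 0:
        raise ValueError("flat list length must be a multiple of 4")
    # Stepping by 4 keeps every tuple aligned. A step of 2 would misread
    # the GID pair of one entry as the UID pair of the next one.
    for i in range(0, len(flat), FIELDS_PER_ENTRY):
        yield flat[i], flat[i + 1], flat[i + 2], flat[i + 3]


def uid_map_lines(flat: Sequence[int]) -> List[str]:
    """Format one hypothetical 'payload host 1' UID map line per entry."""
    return [f"{p_uid} {h_uid} 1" for p_uid, h_uid, _, _ in iter_entries(flat)]
```

With two entries in the flat list, `uid_map_lines()` yields exactly two lines, one per entry, rather than four.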
Daan De Meyer
a22f518676 meson: Add nspawn-locale meson option
https://github.com/systemd/systemd/pull/23192 caused breakage in
Arch Linux's build tooling. Let's give users an opt-out aside from
reverting the patch. It's hardly any maintenance work on our side
and gives users an easy way to revert the locale change if needed.

Of course, by default we still pick C.UTF-8 if the option is not
specified.
2022-06-09 13:08:27 +09:00
Lennart Poettering
1861986a3b tree-wide: port various users over to connect_unix_path()
Let's make use of our new helper, and thus allow longer paths.
2022-05-14 05:01:38 +09:00
Lennart Poettering
db55bbf29b stat-util: fix dir_is_empty() with hidden/backup files
This is a follow-up for f470cb6d13 which in
turn is a follow-up for a068aceafb.

The latter started to honour hidden files when deciding whether a
directory is empty. The former reverted to the old behaviour to fix
issue #23220.

It introduced a bug though: when a directory contains a larger number of
hidden entries the getdents64() buffer will not suffice to read them,
since we just allocate three entries for it (which is definitely enough
if we just ignore the . + .. entries, but not if we ignore more).

I think it's a bit confusing that dir_is_empty() can return true even if
rmdir() on the dir would return ENOTEMPTY. Hence, let's rework the
function to make it optional whether hidden files are ignored or not.
After all, looking at the users of this function, I am pretty sure that in
more cases we want to honour hidden files.
2022-05-04 13:29:14 +02:00
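The reworked behaviour described above, where the caller decides whether hidden files count, can be sketched roughly as follows. This is not the systemd implementation (which works on getdents64() buffers in C); the helper `is_hidden_or_backup()` uses a deliberately simplified rule (leading "." or trailing "~") that is an assumption of this sketch.

```python
import os


def is_hidden_or_backup(name: str) -> bool:
    # Simplified assumption: dot-files and editor backups ending in "~".
    return name.startswith(".") or name.endswith("~")


def dir_is_empty(path: str, ignore_hidden_or_backup: bool) -> bool:
    """Return True if the directory holds no entry that counts."""
    with os.scandir(path) as it:  # scandir never yields "." or ".."
        for entry in it:
            if ignore_hidden_or_backup and is_hidden_or_backup(entry.name):
                continue  # keep scanning, however many hidden entries there are
            return False
    return True
```

Because the loop iterates over all entries instead of reading a fixed-size buffer, a directory with many hidden entries is handled the same way as one with few.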
Daan De Meyer
b626f6959b nspawn: Set LANG to C.UTF-8
Let's default to a UTF-8 locale when running commands using nspawn.
2022-05-02 22:55:32 +01:00
Luca Boccassi
3603f15171 nspawn: fix --ephemeral with --machine
Follow-up for 2362fdde1b

When --machine is specified with --ephemeral, no random suffix is added, so
the recently added assert would fail.

Add a top-level variable with the expected file name for nspawn files, and
compute it when the rest of the names are computed.
2022-04-20 02:33:01 +09:00
Luca Boccassi
2362fdde1b nspawn: fix locating config files with --ephemeral
When --ephemeral is used, a random 16 characters suffix is added to the image
name, so matching on .nspawn files based on the image name no longer works.

Fixes https://github.com/systemd/systemd/issues/13297
2022-04-19 06:17:16 +09:00
Lennart Poettering
41bc484906 tree-wide: take BSD lock on loopback devices we dissect/mount/operate on
So here's something we should always keep in mind:

systemd-udevd actually does *two* things with BSD file locks on block
devices:

1. While it probes a device it takes a LOCK_SH lock. Thus everyone else
   taking a LOCK_EX lock will temporarily block udev from probing
   devices, which is good when making changes to it.

2. Whenever a device is closed after write (detected via inotify), udevd
   will issue BLKRRPART (requesting the kernel to reread the partition
   table). It does this while holding a LOCK_EX lock on the block
   device. Thus anyone else taking LOCK_SH or LOCK_EX will temporarily
   block udevd from issuing that ioctl. And that's quite relevant, since
   the kernel will temporarily flush out all partitions while re-reading
   the partition table and then create them anew. Thus it is smart to
   take LOCK_SH when dissecting a block device to ensure that no
   BLKRRPART is issued in the background, until we mounted the devices.
2022-04-10 22:52:29 +09:00
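The interaction between LOCK_SH and LOCK_EX described above can be tried out with a small sketch. A temporary regular file stands in for the loopback block device here (an assumption made so the example runs without privileges), and the second descriptor plays the role of another party that wants LOCK_EX, as udevd does before issuing BLKRRPART.

```python
import fcntl
import os
import tempfile
from typing import Tuple


def try_exclusive(path: str) -> bool:
    """Try to take LOCK_EX without blocking; report whether it succeeded."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # closing the descriptor also drops any lock it held


def demo() -> Tuple[bool, bool]:
    with tempfile.NamedTemporaryFile() as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_SH)  # "dissecting": shared lock held
        blocked = not try_exclusive(f.name)     # a LOCK_EX taker has to wait
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # done, release the lock
        free = try_exclusive(f.name)            # LOCK_EX is possible again
        return blocked, free
```

BSD locks belong to the open file description, so two separate `open()` calls conflict with each other even inside one process, which is what makes this single-process demonstration possible.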
Zbigniew Jędrzejewski-Szmek
7e6821ed4e nspawn: fix comparisons of versions with non-numerical suffixes
See a2b0cd3f5a. When -Dshared-lib-tag is used,
libsystemd-shared.so and libsystemd-core.so get a suffix which breaks the
parsing done by systemd_installation_has_version(). We can assume that the
tag will be something like "251-rc1-1.fc37" that is currently used in Fedora.
(Anything that does *not* start with the version would be completely crazy.)
By switching to strverscmp_improved() we simplify the code and fix comparisons
with such versions.

$ build/test-nspawn-util /var/lib/machines/rawhide
...
Found libsystemd shared at "/var/lib/machines/rawhide/lib/systemd/libsystemd-shared-251-rc1-1.fc37.so.so", version 251-rc1-1.fc37 (OK).
/var/lib/machines/rawhide has systemd >= 251: yes
...

I noticed this when I started a systemd-nspawn container with Fedora rawhide
and got the message "Not running with unified cgroup hierarchy, LSM BPF is not
supported". I thought the message was in error, but it was actually correct:
nspawn was misdetecting that the container does not sport new-enough systemd
to support cgroups-v2.
2022-04-07 18:19:03 +02:00
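As a rough sketch of the problem described above, the code below extracts the tag from a shared library file name and checks whether it starts with a new-enough version number. It is a simplification under stated assumptions: it only compares the leading number, whereas the commit switches to strverscmp_improved(), which this sketch does not reimplement. The function names and the regular expression are hypothetical.

```python
import re
from typing import Optional

_SHARED_RE = re.compile(r"^libsystemd-shared-(?P<tag>.+)\.so$")


def shared_lib_tag(filename: str) -> Optional[str]:
    """Return the tag between 'libsystemd-shared-' and '.so', if any."""
    m = _SHARED_RE.match(filename)
    return m.group("tag") if m else None


def leading_version(tag: str) -> Optional[int]:
    """Return the leading number of a tag such as '251-rc1-1.fc37'."""
    m = re.match(r"\d+", tag)
    return int(m.group()) if m else None


def has_version(filename: str, minimum: int) -> bool:
    tag = shared_lib_tag(filename)
    if tag is None:
        return False
    version = leading_version(tag)
    # A tag that does not start with a number is treated as "too old".
    return version is not None and version >= minimum
```

A parser that insisted on a purely numerical suffix would reject "251-rc1-1.fc37" outright; accepting the non-numerical remainder is the point of the fix.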
Zbigniew Jędrzejewski-Szmek
c9394f4f93 Move systemd_installation_has_version() to src/nspawn/
This function implements a heuristic that is only used by nspawn. It doesn't
belong in basic. I opted for a new file "nspawn-utils.c", because it seems
likely that we'll need some other new utilities like that in the future.

No functional change.
2022-04-07 18:17:20 +02:00
наб
53350c7bba Use new default-user-shell option instead of hard-coding bash in nspawn and user-record
Defaults to /bin/bash, no changes in the default configuration

The fallback shell for non-root users is as-specified,
and the interactive shell for nspawn sessions is started as
  exec(default-user-shell, "-" + basename(default-user-shell), ...)
before falling through to bash and sh
2022-03-28 14:24:46 +02:00
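The exec order described above can be sketched as a list of (path, argv[0]) candidates. The leading "-" in argv[0] is the usual convention for marking a login shell. The fallback paths `/bin/bash` and `/bin/sh` and their argv[0] values are assumptions of this sketch, as is the function name.

```python
import os
from typing import List, Tuple


def shell_candidates(default_user_shell: str = "/bin/bash") -> List[Tuple[str, str]]:
    """Return (path, argv0) pairs in the order they would be tried."""
    candidates = [
        # Configured shell first, with "-" + basename as argv[0].
        (default_user_shell, "-" + os.path.basename(default_user_shell)),
        # Assumed fallbacks, tried only if the exec above fails.
        ("/bin/bash", "-bash"),
        ("/bin/sh", "-sh"),
    ]
    return candidates
```

A caller would try `os.execv(path, [argv0])` for each pair in turn and move on to the next one when the call raises `OSError`.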
Zbigniew Jędrzejewski-Szmek
5980d46304 strv: declare iterator of FOREACH_STRING() in the loop
Same idea as 03677889f0.

No functional change intended. The type of the iterator is generally changed to
be 'const char*' instead of 'char*'. Despite the type commonly used, modifying
the string was not allowed.

I adjusted the naming of some short variables for clarity and reduced the scope
of some variable declarations in code that was being touched anyway.
2022-03-23 11:50:18 +01:00
Yu Watanabe
de010b0b2e strv: make iterator in STRV_FOREACH() declared in the loop
This also avoids multiple evaluations in STRV_FOREACH_BACKWARDS()
2022-03-19 08:33:33 +09:00
Zbigniew Jędrzejewski-Szmek
d29cc4d6e1 tree-wide: use strv_contains() in more places 2022-03-18 10:22:20 +01:00
Lennart Poettering
50ae2966d2 nspawn: make sure host root can write to the uidmapped mounts we prepare for the container payload
When using user namespaces in conjunction with uidmapped mounts, nspawn
so far set up two uidmappings:

1. One that is used for the uidmapped mount and that maps the UID range
   0…65535 on the backing fs to some high UID range X…X+65535 on the
   uidmapped fs. (Let's call this mapping the "mount mapping")

2. One that is used for the userns namespace the container payload
   processes run in, that maps X…X+65535 back to 0…65535. (Let's call
   this one the "process mapping").

These mappings hence are pretty much identical, one just moves things up
and one back down. (Reminder: we do all this so that the processes can
run under high UIDs while running off file systems that require no
recursive chown()ing, i.e. we want processes with high UID range but
files with low UID range.)

This creates one problem, i.e. issue #20989: if nspawn (which runs as
host root, i.e. host UID 0) wants to add inodes to the uidmapped mount
it can't do that, since host UID 0 is not defined in the mount mapping
(only the X…X+65535 range is, after all, and X > 0), and processes whose
UID is not mapped in a uidmapped fs cannot create inodes in it since
those would be owned by an unmapped UID, which then triggers
the famous EOVERFLOW error.

Let's fix this, by explicitly including an entry for the host UID 0 in
the mount mapping. Specifically, we'll extend the mount mapping to map
UID 2147483646 (which is INT32_MAX-1, see code for an explanation why I
picked this one) of the backing fs to UID 0 on the uidmapped fs. This
way nspawn can create inodes on the uidmapped fs as it likes (which will
then actually be owned by UID 2147483646 on the backing fs), and as it
always did. Note that we do *not* create a similar entry in the process
mapping. Thus any files created by nspawn that way (and not chown()ed to
something better) will appear as unmapped (i.e. as overflowuid/"nobody")
in the container payload. And that's good. Of course, the latter is
mostly theoretic, as nspawn should generally chown() the inodes it
creates to UID ranges that actually make sense for the container (and we
generally already do this correctly), but it's good to know that we are
safe here, given we might accidentally forget to chown() some inodes we
create.

Net effect: the two mappings will not be identical anymore. The mount
mapping has one entry more, and the only reason it exists is so that
nspawn can access the uidmapped fs reasonably independently from any
process mapping.

Fixes: #20989
2022-03-17 19:08:12 +01:00
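The effect of the extra mount mapping entry can be modelled with a few lines of arithmetic. This is only a model of the translation described above, not kernel or nspawn code; the type `MapEntry`, the function names and the base value used for X in the usage note are hypothetical.

```python
from typing import List, NamedTuple, Optional

UID_RANGE = 65536                   # 0…65535
HOST_ROOT_BACKING_UID = 2147483646  # INT32_MAX - 1, as named in the message


class MapEntry(NamedTuple):
    backing_start: int  # first UID as stored on the backing fs
    mapped_start: int   # first UID as seen through the uidmapped mount
    count: int


def mount_mapping(base: int, include_host_root: bool) -> List[MapEntry]:
    """Build the mount mapping; `base` is the X from the text above."""
    entries = [MapEntry(0, base, UID_RANGE)]
    if include_host_root:
        # The additional entry: backing UID INT32_MAX-1 <-> UID 0 on the mount.
        entries.append(MapEntry(HOST_ROOT_BACKING_UID, 0, 1))
    return entries


def to_backing(mapping: List[MapEntry], uid: int) -> Optional[int]:
    """Translate a UID acting on the uidmapped mount to the backing fs UID."""
    for e in mapping:
        if e.mapped_start <= uid < e.mapped_start + e.count:
            return e.backing_start + (uid - e.mapped_start)
    return None  # unmapped: creating an inode as this UID would fail
```

With `base = 524288`, `to_backing(mount_mapping(base, False), 0)` is `None` (the situation that leads to EOVERFLOW), while with the extra entry host UID 0 resolves to 2147483646 on the backing fs.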
Lennart Poettering
aff7ae0d67 nspawn: if we refuse to operate on some directory, explain why
(Also, some refactoring to use safer path_join())
2022-03-17 19:08:12 +01:00
Lennart Poettering
1eb874b978 nspawn: make more stuff const
And if we make it const, we can also make it static.
2022-03-17 19:07:48 +01:00
Lennart Poettering
d1d0b895dc nspawn: rebreak all comments in outer_child() 2022-03-17 19:03:58 +01:00
Lennart Poettering
852b62507b pid1,nspawn: raise default RLIMIT_MEMLOCK to 8M
This mirrors a similar check in Linux kernel 5.16
(9dcc38e2813e0cd3b195940c98b181ce6ede8f20) that raised the
RLIMIT_MEMLOCK to 8M.

This change does two things: raise the default limit for nspawn
containers (where we try to mimic closely what the kernel does), and
bump it when running on old kernels which still have the lower setting.

Fixes: #16300
See: https://lwn.net/Articles/876288/
2022-03-10 18:30:24 +01:00
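A minimal sketch of the "bump it when it is lower" idea, using Python's `resource` module. It assumes that both the soft and the hard limit are raised to 8M when they are below it and that higher or unlimited values are left alone; the actual policy in pid1 and nspawn may differ in detail. The function name is hypothetical.

```python
import resource
from typing import Tuple

TARGET_MEMLOCK = 8 * 1024 * 1024  # 8M, the value named in the commit message


def bumped_memlock(soft: int, hard: int, target: int = TARGET_MEMLOCK) -> Tuple[int, int]:
    """Return the (soft, hard) pair after raising values below `target`."""

    def raise_to_target(value: int) -> int:
        # Unlimited is already high enough; nothing is ever lowered.
        if value == resource.RLIM_INFINITY or value >= target:
            return value
        return target

    return raise_to_target(soft), raise_to_target(hard)
```

A privileged process could apply the result with `resource.setrlimit(resource.RLIMIT_MEMLOCK, bumped_memlock(*resource.getrlimit(resource.RLIMIT_MEMLOCK)))`; raising the hard limit fails without the necessary privileges.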
Lennart Poettering
b74163607b sd128: export sd_id128_to_uuid_string()
We expose various other forms of UUID helpers already, i.e.
SD_ID128_UUID_FORMAT_STR and SD_ID128_MAKE_UUID_STR(), and we parse
UUIDs, hence add a high-level helper for formatting UUIDs too.

This doesn't add any new code, it just moves some helpers
id128-util.[ch] → sd-id128.[ch], to make them public.
2022-02-14 15:13:23 +01:00
Yu Watanabe
8add30a03c tree-wide: use ERRNO_IS_TRANSIENT() 2021-11-30 23:06:43 +09:00
Lennart Poettering
fb9044cb6b nspawn: voidify expose_port_execute() calls 2021-11-22 22:33:40 +01:00
Lennart Poettering
a50966416e nspawn: use FOREACH_STRING() more 2021-11-20 17:54:53 +00:00
Lennart Poettering
3f692e2ece tree-wide: don't use mkdir_errno_wrapper() without reason
Simple mkdir() is fine, too, no need to use the wrapper
2021-11-16 17:02:58 +01:00
Lennart Poettering
7c248223eb tree-wide: use new RET_NERRNO() helper at various places 2021-11-16 08:04:09 +01:00
Lennart Poettering
52f05ef21d umask-util: add helper that resets umask until end of current code block 2021-11-12 16:01:40 +01:00
Lennart Poettering
9baa294c12 nspawn: don't muck with caps if no network setting is used in settings file
Our goal here (as in the previous commits) is to ensure that a settings
file loaded in --settings=override mode is truly a NOP. Previously this
was not the case as we'd drop CAP_NET_ADMIN from the caps if the
settings file didn't enable networking.

With this change we'll drop it only if explicitly turned off in the
settings file, and otherwise let the built-in defaults and cmdline
params reign supreme as documented.

Fixes: #20055
2021-11-09 18:32:30 +01:00
Lennart Poettering
2d09ea44fc nspawn: only copy syscall filters from settings if actually configured
As in the previous commit, let's not copy settings that aren't
configured, so that --settings=override with an empty .nspawn file is
truly a NOP.
2021-11-09 18:32:25 +01:00
Lennart Poettering
0cc3c9f997 nspawn: copy BindUser= setting from settings only if set
Let's only pick this up from the settings if actually set.

As in the previous commit this makes sure that an empty settings file in
--settings=override mode is really a NOP.
2021-11-09 18:32:20 +01:00
Lennart Poettering
d3689b9435 nspawn: use three boolean fields from settings file when actually set
Let's turn these three fields into tristates, so that we can distinguish
whether they are not configured at all from explicitly turned off.

Let's then use this to ensure that we only copy the settings fields into
our execution environment if they are actually configured.

We already do this for some of the boolean settings, this adds it for
the missing ones.

The goal here is to ensure that an empty settings file used in
--settings=override mode (i.e. the default mode used in the
systemd-nspawn@.service unit) is truly a NOP.
2021-11-09 18:32:15 +01:00
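The tristate idea described above can be sketched with `Optional[bool]`, where `None` stands for "not configured". The field names `flag_a` and `flag_b` and both function names are hypothetical; the commit message does not name the three fields.

```python
from typing import Dict, Optional


def apply_setting(current: bool, from_file: Optional[bool]) -> bool:
    """Keep `current` unless the settings file configured the field."""
    return current if from_file is None else from_file


def merge_settings(
    defaults: Dict[str, bool], from_file: Dict[str, Optional[bool]]
) -> Dict[str, bool]:
    """Copy only the fields that are actually configured in the file."""
    return {
        name: apply_setting(value, from_file.get(name))
        for name, value in defaults.items()
    }
```

An empty settings file corresponds to an empty `from_file` dictionary, and merging it returns the defaults unchanged, which is the NOP property the message asks for. With a plain boolean, "unset" and "explicitly off" would be indistinguishable.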
Lennart Poettering
a1dfd585c4 nspawn: add helper settings_network_configured()
The new helper returns whether the settings file had *any* networking
setting configured at all. We already have a similar helper
settings_private_network() which returns a similar result. The
difference is that the new helper will return true when the private
network was explicitly turned off, while the old one will only return
true if configured and enabled.

We'll reuse the helper a 2nd time later on, but even without it it makes
things a bit more readable.
2021-11-09 18:32:10 +01:00
Luca Boccassi
8389fd19d2 Merge pull request #20138 from keszybz/coding-style-variable-decls
A coding style tweak and checking of sd_notify() calls and voidification of pager_open()
2021-11-05 13:57:30 +00:00
Lennart Poettering
8f03de5323 tree-wide: port various places to use TAKE_PID() 2021-11-03 16:36:09 +01:00
Zbigniew Jędrzejewski-Szmek
384c2c3239 Make pager_open() return void 2021-11-03 15:24:56 +01:00
Zbigniew Jędrzejewski-Szmek
d4341b76d0 tree-wide: drop "f" from sd_notify() calls with a static string
If we don't need to do any formatting, let's optimize things a bit.
2021-11-03 11:29:49 +01:00
Zbigniew Jędrzejewski-Szmek
4bf4f50faa tree-wide: warn when sd_notify fails with READY=1 or FDSTOREREMOVE=1
Most sd_notify() calls are like log_info() — the result is only informative
and if they fail, it's best to ignore this. But if a call with READY=1 fails,
the unit may enter a failed state, so we should warn about this. Similarly
for FDSTOREREMOVE=1: the manager may be left with a stale fd, at least wasting
resources.
2021-11-03 11:29:49 +01:00
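The policy described above, warn only when one of the two important messages could not be delivered, can be sketched as follows. This is a small stand-in for the notification protocol (a datagram sent to the AF_UNIX socket named in `$NOTIFY_SOCKET`), not the sd_notify() implementation; the function names and the logger name are hypothetical.

```python
import logging
import os
import socket

log = logging.getLogger("notify-sketch")
IMPORTANT = ("READY=1", "FDSTOREREMOVE=1")


def notify(state: str) -> bool:
    """Send `state`; return False if no socket is set, raise OSError on failure."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # nobody is listening; not an error
    if path.startswith("@"):
        addr = b"\0" + path[1:].encode()  # abstract namespace socket
    else:
        addr = os.fsencode(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode(), addr)
    return True


def notify_checked(state: str) -> bool:
    try:
        return notify(state)
    except OSError as e:
        # Most notifications are informational; only these two matter.
        if any(field in IMPORTANT for field in state.split("\n")):
            log.warning("Failed to send notification %r: %s", state, e)
        return False
```

Failures of purely informational messages such as `STATUS=...` are swallowed silently, mirroring the comparison with log_info() made in the message.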
Andreas Valder
c0c8f71800 nspawn: add filesystem id mapping support to --bind and --bind-ro 2021-10-28 19:19:22 +02:00
Yu Watanabe
2db32618fe nspawn: fix build when SECCOMP is disabled
Follow-up for 20e458ae3c.
2021-10-25 19:23:55 +09:00
Yu Watanabe
20e458ae3c nspawn: ignore --suppress-sync=yes when seccomp is disabled
Follow-up for 4a4654e024.

Fixes #21090.
2021-10-22 23:43:20 +02:00
Lennart Poettering
dbf1aca619 nspawn: bump RLIMIT_NOFILE for nspawn payload similar to how host PID 1 does it for its payload
We try to pass containers roughly the same rlimits as the host gets from
the kernel. However, this means we'd set the RLIMIT_NOFILE to 4K. Which
is quite limiting though, and is something we actually departed from in
PID1: since 52d6207578 we raise the limit
substantially for all userspace.

Given that nspawn is quite often invoked without proper PID1, let's raise the
limits for container payloads the same way as we do from the real PID1
to its service payloads.
2021-10-22 23:42:55 +02:00
Lennart Poettering
4a4654e024 nspawn: add --suppress-sync=yes mode for turning sync() and friends into NOPs via seccomp
This is supposed to be used by package/image builders such as mkosi to
speed up building, since it allows us to suppress sync() inside a
container.

This does what Debian's eatmydata tool does, but for a container, and
via seccomp (instead of LD_PRELOAD).
2021-10-20 11:35:15 +02:00
Lennart Poettering
f435195925 basic: split out chase_symlinks() from fs-util.[ch] → chase-symlinks.[ch] 2021-10-05 16:14:37 +02:00
Lennart Poettering
88b3300fdc dissect-image: load embedded verity signature info from image
This adds support for actually using embedded signature data from
partitions.
2021-09-28 17:02:54 +02:00
Lennart Poettering
8ee9615e10 dissect-image: discover verity signature partitions
This doesn't make use of the discovered partitions yet, but it finds
them at least.
2021-09-28 17:02:27 +02:00
Frantisek Sumsal
d7ac09520b tree-wide: mark set-but-not-used variables as unused to make LLVM happy
LLVM 13 introduced `-Wunused-but-set-variable` diagnostic flag, which
trips over some intentionally set-but-not-used variables or variables
attached to cleanup handlers with side effects (`_cleanup_umask_`,
`_cleanup_(notify_on_cleanup)`, `_cleanup_(restore_sigsetp)`, etc.):

```
../src/basic/process-util.c:1257:46: error: variable 'saved_ssp' set but not used [-Werror,-Wunused-but-set-variable]
        _cleanup_(restore_sigsetp) sigset_t *saved_ssp = NULL;
                                                     ^
                                                     1 error generated.
```
2021-09-15 13:09:45 +02:00
Lennart Poettering
c3c88d67c0 dissect-image: rename verity flag booleans
Let's make the booleans indicating verity state a bit more descriptive.

Let's rename:

    can_verity → has_verity: because that's really what this is about:
    whether verity data is included in the image. Whether we actually
    can use it is a different story.

    verity → verity_ready: this one should tell us if we have everything
    needed to actually set it up, hence explicitly say "ready to use" in
    the name.

No change in behaviour. Just a bit of renaming.
2021-09-10 14:14:53 +02:00
Lennart Poettering
32b9736a23 nspawn: fix type to pass to connect()
It expects a generic "struct sockaddr", not a "struct sockaddr_un".
Pass the right member of the union.

Not sure why gcc/llvm never complained about this...
2021-09-02 08:27:46 +09:00
Luca Boccassi
1f08acf406 Merge pull request #20257 from bluca/seqno
Use new diskseq block device property
2021-08-31 09:06:33 +01:00
Lennart Poettering
85b55869bc tree-wide: port everything over to new sd-id128 compound literal bliss 2021-08-20 11:09:48 +02:00