The kernel has had filesystem independent reflink ioctls for a
while now, let's try to use them and fall back to the btrfs specific
ones if they're not supported.
Let's clean up validation/escaping of cgroup names. i.e. split out code
that tests if name needs escaping. Return proper error codes, and extend
test a bit.
We translate 'all' to UNIT64_MAX, which has a lot more 'f's. Use the
helper macro, since a decimal uint64_t will always be >> than a hex
representation.
root@image:~# systemd-run -t --property CoredumpFilter=all ls /tmp
Running as unit: run-u13.service
Press ^] three times within 1s to disconnect TTY.
*** stack smashing detected ***: terminated
[137256.320511] systemd[1]: run-u13.service: Main process exited, code=dumped, status=6/ABRT
[137256.320850] systemd[1]: run-u13.service: Failed with result 'core-dump'.
__builtin_popcount() is a bit of a mouthful, so let's provide a helper.
Using _Generic has the advantage that if a type other then the ones on
the list is given, compilation will fail. This is nice, because if by any
change we pass a wider type, it is rejected immediately instead of being
truncated.
log.h is also needed. It is included transitively, but let's include it
directly.
macro.h is *not* needed.
Normal users do not have permissions to access /proc/1/root, so
'systemd-detect-virt -r' fails, but the output, even at debug level
is cryptic:
$ SYSTEMD_LOG_LEVEL=debug build/systemd-detect-virt -r
Failed to check for chroot() environment: Permission denied
Let's make this a bit easier to figure out:
$ SYSTEMD_LOG_LEVEL=debug build/systemd-detect-virt -r
Cannot stat /proc/1/root: Permission denied
Failed to check for chroot() environment: Permission denied
I looked over other users of files_same(), and I think in general the message
at debug level is OK for them too.
This is the function version of STARTSWITH_SET(). We also move
STARTSWITH_SET() to string-util.h as it fits more there than in
strv.h and reimplement it using startswith_strv().
Let's avoid confusing developers and users when log messages suddenly
stop getting logged to kmsg because of ratelimiting by logging an
additional message if we start ratelimiting log messages to kmsg.
An overflow here (i.e. the counter reaching 2^32 within a ratelimit time
window) is not so unlikely. Let's handle this somewhat sanely
and simply stop counting, while remaining in the "limit is hit" state until
the time window has passed.
The function path_prefix_root_cwd() was introduced for prefixing the
result from chaseat() with root, but
- it is named slightly generic,
- the logic is different from what chase() does.
This makes the name more explanative and specific for the result of the
chaseat(), and make the logic consistent with chase().
Fixes https://github.com/systemd/systemd/pull/27199#issuecomment-1511387731.
Follow-up for #27199.
Now, dir_fd_is_root() is heavily used in chaseat(), which is used at
various places. If the kernel is too old and /proc is not mounted, then
there is no way to get the mount ID of a directory. In that case, let's
silently skip the mount ID check.
Fixes https://github.com/systemd/systemd/pull/27299#issuecomment-1511403680.
Usually, we pass the file descriptor of the root directory to chaseat()
when `--root=` is not specified. Previously, even in such case, the
result was relative, and we need to prefix the path with "/" when we
want to pass the path to other functions that do not support dir_fd, or
log or show the path. That's inconvenient.
Let's be more careful with generating error codes for (expected) error
causes.
This does not introduce new error conditions, it just changes what we
return under specific cases, to make things nicely recognizable in each
case. Most importantly this detects if fdinfo reports a pid of "-1" for
pidfds with processes that are already reaped (and thus have no PID
anymore)
None of our current users care about these error codes, but let's get
this right for the future.
Commit f90eea7d18
virt: Improve detection of EC2 metal instances
Added support for detecting EC2 metal instances via the product
name in DMI by testing for the ".metal" suffix.
Unfortunately this doesn't cover all cases, as there are going to be
instance types where ".metal" is not a suffix (ie, .metal-16xl,
.metal-32xl, ...)
This modifies the logic to also allow those new forms.
Signed-off-by: Benjamin Herrenschmidt <benh@amazon.com>
The cmd(3) man page says about CMSG_DATA():
> The pointer returned cannot be assumed to be suitably aligned for
> accessing arbitrary payload data types. Applications should not cast
> it to a pointer type matching the payload, but should instead use
> memcpy(3) to copy data to or from a suitably declared object.
Hence, if we want to use unaligned data in cmsg, we need to copy it
before use. That's typically important for reading timestamps in
RISCV32, as the time_t is 64bit and size_t is 32bit on the system.
strstrafter() is like strstr() but returns a pointer to the first
character *after* the found substring, not on the substring itself.
Quite often this is what we actually want.
Inspired by #27267 I think it makes sense to add a helper for this,
to avoid the potentially fragile manual pointer increment afterwards.
The overflow check was hosed in two ways: overflows in C are undefined,
hence gcc was free to just optimize the whole thing away. We need to
catch overflows before we run into them, not after.
It checked for an overflow against size_t, but the field we need to
write this in is unsigned. i.e. typically 32bit rather than 64bit. Hence
check for the right maximum.
(The whole check is paranoia anyway, the kernel really shouldn't return
values that would induce an overflow, but you never know, the syscall
turned out to be problematic in so many other ways, hence let's stick to
this.)
The concept of a "mount" is a local one, hence there's no point in going
to the network to retrieve mnt_id or STATX_ATTR_MOUNT_ROOT. Hence set
AT_STATX_DONT_SYNC so that the call will not go to the network ever, and
risk deadlocking on that.
Just some extra safety.