This reverts PR #23269 and its follow-up commit. Especially,
2299b1cae3 (partially), and
3cf63830ac.
The PR was merged without final approval, and has several issues:
- The NetLabel for static addresses are not assigned, as labels are
stored in the Address objects managed by Network, instead of Link.
- If NetLabel is specified for a static address, then the address
section will be invalid and the address will not be configured,
- It should be implemented with Request object,
- There is no test about the feature.
This reverts PR #22587 and its follow-up commit. More specifically,
2299b1cae3 (partially),
e176f855278d5098d3fecc5aa24ba702147d42e0,
ceb46a31a01b3d3d1d6095d857e29ea214a2776b, and
51bb9076ab8c050bebb64db5035852385accda35.
The PR was merged without final approval, and has several issues:
- OSS fuzz reported issues in the conf parser,
- It calls synchrnous netlink call, it should not be especially in PID1,
- The importance of NFTSet for CGroup and DynamicUser may be
questionable, at least, there was no justification PID1 should support
it.
- For networkd, it should be implemented with Request object,
- There is no test for the feature.
Fixes#23711.
Fixes#23717.
Fixes#23719.
Fixes#23720.
Fixes#23721.
Fixes#23759.
Note, if `n != SIZE_MAX`, we cannot check the existence of the specified
string in the set without duplicating the string. And, set_consume() also
checks the existence of the string. Hence, it is not necessary to call
set_contains() if `n != SIZE_MAX`.
New directives `NFTSet=`, `IPv4NFTSet=` and `IPv6NFTSet=` provide a method for
integrating configuration of dynamic networks into firewall rules with NFT
sets.
/etc/systemd/network/eth.network
```
[DHCPv4]
...
NFTSet=netdev:filter:eth_ipv4_address
```
```
table netdev filter {
set eth_ipv4_address {
type ipv4_addr
flags interval
}
chain eth_ingress {
type filter hook ingress device "eth0" priority filter; policy drop;
ip saddr != @eth_ipv4_address drop
accept
}
}
```
```
sudo nft list set netdev filter eth_ipv4_address
table netdev filter {
set eth_ipv4_address {
type ipv4_addr
flags interval
elements = { 10.0.0.0/24 }
}
}
```
raise() won't propagate the siginfo information of the signal that's
re-raised. rt_sigqueueinfo() allows us to provide the original siginfo
struct which makes sure it is propagated to the next signal handler
(or to the coredump).
New directive `NetLabel=` provides a method for integrating dynamic network
configuration into Linux NetLabel subsystem rules, used by Linux security
modules (LSMs) for network access control. The option expects a whitespace
separated list of NetLabel labels. The labels must conform to lexical
restrictions of LSM labels. When an interface is configured with IP addresses,
the addresses and subnetwork masks will be appended to the NetLabel Fallback
Peer Labeling rules. They will be removed when the interface is
deconfigured. Failures to manage the labels will be ignored.
Example:
```
[DHCP]
NetLabel=system_u:object_r:localnet_peer_t:s0
```
With the above rules for interface `eth0`, when the interface is configured with
an IPv4 address of 10.0.0.0/8, `systemd-networkd` performs the equivalent of
`netlabelctl` operation
```
$ sudo netlabelctl unlbl add interface eth0 address:10.0.0.0/8 label:system_u:object_r:localnet_peer_t:s0
```
Result:
```
$ sudo netlabelctl -p unlbl list
...
interface: eth0
address: 10.0.0.0/8
label: "system_u:object_r:localnet_peer_t:s0"
...
```
The general rule should be to be strict when parsing data, but lenient
when printing it. Or in other words, we should verify data in verification
functions, but not when printing things. It doesn't make sense to refuse
to print a value that we are using internally.
We were tripping ourselves in some of the print functions:
we want to report than an address was configured with too-long prefix, but
the log line would use "n/a" if the prefix was too long. This is not useful.
Most of the time, the removal of the check doesn't make any difference,
because we verified the prefix length on input.
Since we don't need the error value, and the buffer is allocated with a fixed
size, the whole logic provided by in_addr_to_string() becomes unnecessary, so
it's enough to wrap inet_ntop() directly.
inet_ntop() can only fail with ENOSPC. But we specify a buffer that is supposed
to be large enough, so this should never fail. A bunch of tests of this are added.
This allows all the wrappers like strna(), strnull(), strempty() to be dropped.
The guard of 'if (DEBUG_LOGGING)' can be dropped from around log_debug(),
because log_debug() implements the check outside of the function call. But
log_link_debug() does not, so it we need it to avoid unnecessary evaluation of
the formatting.
This reverts commit 0bd292567a.
It isn't guaranteed anywhere that __builtin_dynamic_object_size can
always deduce the size of every object passed to it so systemd
can end up using either malloc_usable_size or
__builtin_dynamic_object_size when pointers are passed around,
which in turn can lead to actual segfaults like the one mentioned in
https://github.com/systemd/systemd/issues/23619.
Apparently __builtin_object_size can return different results for
pointers referring to the same memory as well but somehow it hasn't
caused any issues yet. Looks like this whole
malloc_usable_size/FORTIFY_SOURCE stuff should be revisited.
Closes https://github.com/systemd/systemd/issues/23619 and
https://github.com/systemd/systemd/issues/23150.
Reopens https://github.com/systemd/systemd/issues/22801
We currently have a convoluted and complex selection of which random
numbers to use. We can simplify this down to two functions that cover
all of our use cases:
1) Randomness for crypto: this one needs to wait until the RNG is
initialized. So it uses getrandom(0). If that's not available, it
polls on /dev/random, and then reads from /dev/urandom. This function
returns whether or not it was successful, as before.
2) Randomness for other things: this one uses getrandom(GRND_INSECURE).
If it's not available it uses getrandom(GRND_NONBLOCK). And if that
would block, then it falls back to /dev/urandom. And if /dev/urandom
isn't available, it uses the fallback code. It never fails and
doesn't return a value.
These two cases match all the uses of randomness inside of systemd.
I would prefer to make both of these return void, and get rid of the
fallback code, and simply assert in the incredibly unlikely case that
/dev/urandom doesn't exist. But Luca disagrees, so this commit attempts
to instead keep case (1) returning a return value, which all the callers
already check, and fix the fallback code in (2) to be less bad than
before.
For the less bad fallback code for (2), we now use auxval and some
timestamps, together with various counters representing the invocation,
hash it all together and provide the output. Provided that AT_RANDOM is
secure, this construction is probably okay too, though notably it
doesn't have any forward secrecy. Fortunately, it's only used by
random_bytes() and not by crypto_random_bytes().
After sending a SIGKILL to a process, the process might disappear from
`cgroup.threads` but still show up in `cgroup.procs` and still remains in the
cgroup and cause migrating new processes to `Delegate=yes` cgroups to fail with
`-EBUSY`. This is especially likely for heavyweight processes that consume more
kernel CPU time to clean up.
Fix this by only returning 0 when both `cgroup.threads` and
`cgroup.procs` are empty.
This will be helpful in cases where we are repeatedly adding entries
to a long strv and want to skip the iteration over old entries leading
to quadratic behaviour.
Note that we don't want to calculate the length if not necessary, so
the calculation is delayed until after we've checked that value is not
NULL.
This way we can connect correctly to any AF_UNIX socket in the file
system, and even save some code. Yay!
This also adds some test code for this, that ensures read_file_full()
works correctly for AF_UNIX sockets that violate the 108 char limit.
Supporting sockets like this kinda matters I think, for the simple
reason that apps want to build socket paths via XDG_RUNTIME_DIR and
suchlike, and we should be able to connect to them, even via
non-normalized paths.
This is a short helper for connecting to AF_UNIX sockets in the file
system. It works around the 108ch limit of sockaddr_un, and supports
"at" style fds.
This doesn't come with a test of its own, but the next patch will add
that.
let's not make up new errors in these checks that validate if connect()
work at all. After all, we don't really know if the ENXIO we saw earlier
actually is really caused by the inode being an AF_UNIX socket, we just
have the suspicion...
This way we can implement nice fallbacks later on.
While we are at it, provide a test for this (one that is a bit over the
top, but then again, we can never have enough tests).
With an intentional mistake:
../src/login/logind-dbus.c: In function ‘bus_manager_log_shutdown’:
../src/login/logind-dbus.c:1542:39: error: format ‘%s’ expects a matching ‘char *’ argument [-Werror=format=]
1542 | LOG_MESSAGE("%s %s", message),
| ^~~~~~~
Also break some long lines for more uniform formatting. No functional change.
I went over all log_struct, log_struct_errno, log_unit_struct,
log_unit_struct_errno calls, and they seem fine.