When READ_FULL_FILE_UNBASE64 (or READ_FULL_FILE_UNHEX) is specified,
setting size argument by caller is difficult, as it is hard to estimate
the encoded length.
This makes when size is specified with decoding option, let's read file
more, and check decoded size later with the specified size.
This new helper can be used after reading process info from procfs, to
verify that the data that was just read actually matches the pidfd, and
does not belong to some new process that just reused the numeric PID of
the process we originally pinned.
Usually we want to embed PidRef in other structures, but sometimes it
makes sense to allocate it on the heap in case it should be used
standalone. Add helpers for that.
Primary usecase: use as key in Hashmap objects, that for example map
process to unit objects in PID 1.
This adds pidref_free()/pidref_freep() for freeing such an allocated
struct, as well as pidref_dup() (for duplicating an existing PidRef
on the heap 1:1), and pidref_new_pid() (for allocating a new PidRef from a
PID).
This helper truns a pid_t into a PidRef. It's different from
pidref_set_pid() in being "passive", i.e. it does not attempt to acquire
a pidfd for the pid.
This is useful when using the PidRef as a lookup key that shall also
work after a process is already dead, and hence no conversion to a pidfd
is possible anymore.
'systemctl status /../dev' now looks for 'dev.mount', not '-..-dev.service',
and 'systemctl status /../foo' looks for 'foo.mount', not '-..-foo.service'. I
think this much more useful. I think the escaping is not very useful, so I plan
to submit a later series which changes that behaviour. But I think this first
step here is already useful on its own.
Note that the patch is smaller than it seems: before, is_device_path() would
return true only for absolute paths, so moving of is_device_path() under the
path_is_absolute() conditional doesn't influence the logic.
So, unfortunately oomd uses "io.system." rather than "io.systemd." as
prefix for its sockets. This is a mistake, and doesn't match the
Varlink interface naming or anything else in oomd.
hence, let's fix that.
Given that this is an internal protocol between PID1 and oomd let's
simply change this without retaining compat.
path_simplify_full()/path_simplify() are changed to allow a NULL path, for
which a NULL is returned. Generally, callers have already asserted before that
the argument is nonnull. This way path_simplify_full()/path_simplify() and
path_simplify_alloc() behave consistently.
In sd-device.c, logging in device_set_syspath() is intentionally dropped: other
branches don't log.
In mount-tool.c, logging in parse_argv() is changed to log the user-specified
value, not the simplified string. In an error message, we should show the
actual argument we got, not some transformed version.
I.e., /.. becomes /, /../foo becomes /foo, /../../bar becomes /bar, etc. We can
do this unconditionally, without access to the file system, because the parent
of the root directory always resolves to. /.. in other places is handled as
before, because resolving it properly would require access to the file system
which we don't want to do in path_simplify().
nanosleep() is kinda broken since it sleeps in the CLOCK_REALTIME clock,
i.e. is subject to time changes.
Let's use clock_nanosleep() instead with CLOCK_MONOTONIC, which is
really the only thing that makes sense.
This compares two PidRef structures via the pid_t field. Ideally we'd do
a stricter comparison here, that is safe towards PID reuse, but so far
the pidfd API lacks suitable mechanisms for that, hence do the best we
can do.
"/dev" or "/dev/" is the mount point, not a device path. In particular,
'systemctl status /dev' clearly does not refer to a device, so let's tweak
the code a bit to say that those are not device paths.
(Treating "/../dev" same as "/dev" would be also be reasonable, but that
requires chase(), which requires disk access, which we don't want to do from
this lightweight function.)
Let's start with the conversion of PID 1 to pidfds. Let's add a simple
structure with just two fields that can be used to maintain a reference
to arbitrary processes via both pid_t and pidfd.
This is an embeddable struct, to keep it in line with where we
previously used a pid_t directly to track a process.
Of course, since this might contain an fd on systems where we have pidfd
this structure has a proper lifecycle.
(Note that this is quite different from sd_event_add_child() event
source objects as that one is only for child processes and collects
process results, while this infra is much simpler and more generic and
can be used to reference any process, anywhere in the tree.)
We basically parsed the RFC3339 format already, except with a space:
NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.
so now we handle both
2012-11-23 11:12:13.456
2012-11-23T11:12:13.456
as equivalent.
Parse directly-suffixed Z and +05:30 timezones as well:
2012-11-23T11:12:13.456Z
2012-11-23T11:12:13.456+02:00
as they're both defined by RFC3339.
We do /not/ allow z or t; the RFC says
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
syntax may alternatively be lower case "t" or "z" respectively.
This date/time format may be used in some environments or contexts
that distinguish between the upper- and lower-case letters 'A'-'Z'
and 'a'-'z' (e.g. XML). Specifications that use this format in
such environments MAY further limit the date/time syntax so that
the letters 'T' and 'Z' used in the date/time syntax must always
be upper case. Applications that generate this format SHOULD use
upper case letters.
We /are/ in a case-sensitive environment, neither are in wide-spread
use, and "z" poses an issue of whether "todayz" should be the same
as "todayZ" ("today UTC") or an error (it should be an error).
Fractional seconds are limited to six digits (they're nominally
time-secfrac = "." 1*DIGIT
), since we only support 1µs-resolution timestamps, and limit to six
digits in our other sub-second formats.
Parsing
2012-11-23T11:12
is an extension two ways (no seconds, no timezone),
mirroring our "canonical" format.
Fixes#5194
The enum definition, the two string tables and the test all were using
different orders (and in case of the test even missed entries).
Let's unify this, and make sure we always use the same order. This
settles the confusion, and makes the order used for the unicode string
table the canonical one, adjusting the other lists to match it. And adds
the missing entries to the tets.
New directive `NFTSet=` provides a method for integrating network configuration
into firewall rules with NFT sets. The benefit of using this setting is that
static network configuration or dynamically obtained network addresses can be
used in firewall rules with the indirection of NFT set types. For example,
access could be granted for hosts in the local subnetwork only. Firewall rules
using IP address of an interface are also instantly updated when the network
configuration changes, for example via DHCP.
This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (one of
"address", "prefix", or "ifindex"), NFT address family (one of "arp", "bridge",
"inet", "ip", "ip6", or "netdev"), table name and set name. The names of tables
and sets must conform to lexical restrictions of NFT table names. The type of
the element used in the NFT filter must match the type implied by the
directive ("address", "prefix" or "ifindex") and address type (IPv4 or IPv6)
as shown type implied by the directive ("address", "prefix" or "ifindex") and
address type (IPv4 or IPv6) must also match the set definition.
When an interface is configured with IP addresses, the addresses, subnetwork
masks or interface index will be appended to the NFT sets. The information will
be removed when the interface is deconfigured. systemd-networkd only inserts
elements to (or removes from) the sets, so the related NFT rules, tables and
sets must be prepared elsewhere in advance. Failures to manage the sets will be
ignored.
/etc/systemd/network/eth.network
```
[DHCPv4]
...
NFTSet=prefix:netdev:filter:eth_ipv4_prefix
```
Example NFT rules:
```
table netdev filter {
set eth_ipv4_prefix {
type ipv4_addr
flags interval
}
chain eth_ingress {
type filter hook ingress device "eth0" priority filter; policy drop;
ip saddr != @eth_ipv4_prefix drop
accept
}
}
```
```
$ sudo nft list set netdev filter eth_ipv4_prefix
table netdev filter {
set eth_ipv4_prefix {
type ipv4_addr
flags interval
elements = { 10.0.0.0/24 }
}
}
```
We might inherit a max rlim value that's larger than the kernel's
maximum (nr_open). This will cause setrlimit() to fail as the given
maximum is larger than the kernel's maximum. To get around this,
let's limit the max rlim we pass to rlimit() to the value of nr_open.
Should fix#28965
Let's make utf8_to_utf16() and utf16_to_utf8() a bit nicer to use by
adding shortcuts for common cases.
This is particularly relevant for utf16_to_utf8() since the
multiplication with 2 is easy to forget.