Let's synthesize DNS RRs for leases handed out by our DHCP server. This
way local VMs can have resolvable hostnames locally.
This does not implement reverse look ups for now. We can add this
later in similar fashion.
bpf_loop() and bpf_strncmp(), used by sysctl-monitor, were introduced
in libbpf 0.7, so skip the module if using an older version
Follow-up for 6d9ef22acd
With 9ccc369ff3, PersistLeases= is
disabled on the host side virtual interfaces for containers.
However, even it is not necessary to save the leases for containers
on a persistent storage, still we should save them on somewhere.
Otherwise, leases will be lost when networkd on the host is restarted
or the host side interface is reconfigured.
This introduce PersistLeases=runtime to save and load leases on runtime
storage.
Typically, the same client identifier setting is used for all
interfaces. Hence, better to provide the system-wide setting to specify
the client identifier.
Currently, only configuration sources and providers of addresses and
routes are serialized/deserialized.
This should mostly not change behavior, as dynamic (except for DHCPv4)
configurations will be dropped before stopping networkd, and for DHCPv4
protocol, we have already had another logic to handle DHCPv4
configurations.
Preparation for later commits.
This makes networkd process all queued remove requests when a
terminating or restarting signal is received. Otherwise, e.g. DHCPv4
address will not be removed on stop, especially when
KeepConfiguration=no.
Fixes a bug introduced by 85a6f300c1 and
its subsequent commits.
Fixes#34837.
Co-authored-by: Will Fancher <elvishjerricco@gmail.com>
Monitor the sysctl set by networkd for writes, if a sysctl is
overwritten with a different value than the one we set, emit a warning.
Writes are detected with an eBPF program attached as BPF_CGROUP_SYSCTL
which reports the sysctl writes only in net/.
The eBPF program only reports sysctl writes from a different cgroup than networkd.
To do this, it uses the `bpf_current_task_under_cgroup_proto()` helper,
which will be available allowed in BPF_CGROUP_SYSCTL from kernel 6.12[1].
Loading a BPF_CGROUP_SYSCTL program requires the CAP_SYS_ADMIN capability,
so drop it just after the program load, whether it loads successfully or not.
Writes are logged but permitted, in future the functionality can be
extended to also deny writes to managed sysctls.
[1] https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com/
It's time. sd-json was already done earlier in this cycle, let's now
make sd-varlink public too.
This is mostly just a search/replace job of epical proportions.
I left some functions internal (mostly IDL handling), and I turned some
static inline calls into regular calls.
This also drop the support of /run/systemd/netif/persistent-storage-ready,
as the file is anyway removed when networkd is stopped.
Let's use $SYSTEMD_NETWORK_PERSISTENT_STORAGE_READY=1 instead on testing.
This deprecates IPForward= setting, which unconditionally controled
the global setting, even though it is a setting in .network file.
Instead, this introduces new IPv4Forwarding= and IPv6Forwarding=
settings both in .network and networkd.conf.
If these settings are specified in a .network file, then the
per-interface forwarding setting will be configured.
If specified in networkd.conf, then the global IP forwarding setting will
be configured.
Closes#30648.
Prompted by https://github.com/systemd/systemd/pull/30085#discussion_r1401534107.
Note, like Reconfigure bus method, even reconfiguration for an interface is
triggered by Reload method, the method only wait for the link enters
configuring state (or unmanaged state if no matching .network file exists).
Users still need to invoke systemd-networkd-wait-online if it is
necessary to wait for the interface enters configured state after Reload
medhod.
This is similar to Request, but will be used on removing configuration
(e.g. address, route, and so on).
By using another queue for removing configuration, then we can avoid to
fill the reply callback buffer in sd-netlink by remove message calls.
Follow-up for 4e6a35e2b2.
Let's get networkd onto Varlink. This only adds the most basic of
operations.
I'd love to see networkd do Varlink for all its basic operations so that
networkctl can use that, and work correctly before D-Bus is up. Right
now, many of networkctls calls simply don't work before D-Bus, and I'd
like to see that improved.
The kernel manages nexthops by their IDs. Previously networkd manages
nexthops in three ways:
- by the corresponding link, if a nexthop has ifindex,
- by the manager, if a nexthop does not have ifindex,
- by the manager with their IDs.
This unifies the three managements of nexthops into one, and use the
same way as the kernel uses.
This is the one for nexthop already done by
aa9626ee3b for neighbor.
When a new wireless network interface is created by the kernel, it emits
both RTM_NEWLINK and NL80211_CMD_NEW_INTERFACE. These events can arrive
in either order and networkd must behave correctly in both cases.
The typical case is that RTM_NEWLINK is handled first, in which case
networkd creates a Link object and starts tracking it. When the
NL80211_CMD_NEW_INTERFACE message is handled, networkd then populates
the Link object with relevant wireless properties such as wireless
interface type (managed, AP, etc.).
In the event that the order is reversed however, networkd will fail to
populate these wireless properties because at the time of processing the
nl80211 message, the link is considered unknown. In that case, a debug
message is emitted:
systemd-networkd[467]: nl80211: received new_interface(7) message for link '109' we don't know about, ignoring.
This is problematic because after the subsequent RTM_NEWLINK message,
networkd will have an incomplete view of the link. In particular, if a
.network configuration matches on some of the missing wireless
properties, such as WLANInterfaceType=, then it will never match.
The above race can be reproduced by using the mac80211_hwsim driver.
Suppose that there exists a .network configuration:
[Match]
WLANInterfaceType=ap
...
Now loop the creation/destruction of such an AP interface:
while true
do
iw dev wlan0 interface add uap0 type __ap
iw dev uap0 del
done
The above debug message from networkd will then be observed very
quickly. And in that event, the .network file will fail to match.
To address the above race, have the nl80211 message handler store the
interface index in a set in case a Link object is not found on
NL80211_CMD_NEW_INTERFACE. The handler for RTM_NEWLINK can then query
this set, and explicitly request the wireless properties from nl80211
upon the creation of the Link object.
Not sure when the issue was fixed.
- kernel-3.10 on CentOS 7 has the issue,
- kernel-4.18 on CentOS 8 works fine.
Note, the workaround dropped by the commit is not incomplete:
with an old kernel which has the issue, all non-prefix routes are
configured on the specified route table, but the prefix route is
configured on the main table. That should not work for most cases,
hence, the workaround is mostly meaningless.
When the upstream link gained a lease, then several downstream links may
not appear yet. Previously, the explicitly specified subnet ID for a
downstream link which appears later may be already assigned to an
interface which does not request specific subnet ID.
To avoid such situation, this makes all specified IDs are excluded when
searching free IDs.
As a side effect, we can avoid the second call of
dhcp6_pd_distribute_prefix().
When a system has thousands of downstream interfaces, previously the
total cost of finding free subnet ID was O(n^2), where n is the number
of downstream interfaces.
This makes assigned prefixes are managed by Manager with Hashmap. So,
the cost becomes O(n log n).
This also changes the logic when Priority= is not specified.
Previously, we request without FRA_PRIORITY attribute and kernel picks
the highest unused priority for the rule.
This makes networkd picks the highest unused priority and always request
FRA_PRIORITY attribute.