Possibility to treat /32 and /128 non-gateway routes as onlink on BSD?
Hi everyone,

In a recent change in the FreeBSD Wireguard kernel module the POINTTOPOINT interface flag is dropped for wg interfaces[1]. This behaviour will probably not change, because Wireguard is not peer-to-peer or a broadcast domain, but peer-to-multipoint. Hence my previous configuration of setting peer addresses on the interface is not working anymore:

    ifconfig wg0 inet 192.168.0.10/32 192.168.0.4

The above syntax is for POINTTOPOINT interfaces. In this case 192.168.0.10 is the local address of the interface and 192.168.0.4 is the address of the remote peer. So if one does not want to add a dedicated subnet to every tunnel device, one has to add static host routes similar to these ones:

    ifconfig wg0 192.168.0.10/32
    route add 192.168.0.4 -iface wg0

Note that this actually models the real world better, because Wireguard is not point-to-point but point-to-multipoint. So multiple peers can be done in the following way:

    ifconfig wg0 192.168.0.10/32
    route add 192.168.0.4 -iface wg0
    route add 192.168.0.8 -iface wg0

However, this does not work with bird 2.0.8: bird will not recognize the p2p peer addresses as gateways and it will log on kernel table rescan:

    KRT: Received route 192.168.42.0/24 with strange next-hop 192.168.0.4
    KRT: Error sending route 192.168.42.0/24 to kernel: File exists

I did have a look why this is happening: bird discards the route, because the gateway does not belong to any of its interface's addresses (the peer address is configured as a static host route and not on the interface). At the same time it forgets that it was bird itself who added the route (because the route is discarded on kernel table rescan, it thinks it is absent in the kernel table but present in bird's internal routing table -> "add" logic triggers) and tries to add the route again (instead of replacing/keeping the route). This explains the cascade of error messages above.

While the Linux netlink code allows to handle onlink routes, this functionality is absent in the BSD code path. Unfortunately, on BSD there is no dedicated flag to mark routes as being directly connected onlink routes. On Linux one could add an onlink route like so:

    ip route add 192.168.0.4 via 192.168.0.4 dev wg0 onlink

The command looks quite odd, but it is working. The via clause is necessary in conjunction with onlink. Afterwards a `birdc dump neighbors` verifies that the peer is recognized:

    000055db793035c0 192.168.0.4 wg0 wg0 kernel1 000000000000000000000000 scope univ ONLINK

(Also the "strange next-hop" message will go away.)

I created a patch that emulates the behaviour on FreeBSD. In this patch the onlink setting is heuristically determined by inspecting the RTF_GATEWAY flag (which must be absent) and checking for a /32 or /128 prefix length. I do not see how to handle cases where the destination is a subnet. The patch was tested on FreeBSD 13.0-RELEASE and 14.0-CURRENT. During rescan of the kernel table the above route is now not dropped because the gateway is recognized to be onlink.

While in the Linux route the gateway is still set, on FreeBSD it is unset (~RTF_GATEWAY). So I actually use the destination address of the route, because that's the address of the peer.

While I tested the functionality on my machines and also checked the output of `birdc dump neighbors`, the patch might be faulty. Please be kind with me and treat it more as a proof of concept. As I am not experienced with bird's code base I do not fully understand how the neighbor cache is working in detail, so the suggested patch might have unexpected side effects. I would appreciate feedback.

In case you think the whole idea is not sensible, I would also love to hear suggestions how to get the above setup working.

Kind regards,
Stefan

[1]: https://git.zx2c4.com/wireguard-freebsd/commit/?id=8801509656e955c27ebf4b9b3...
 sysdep/bsd/krt-sock.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/sysdep/bsd/krt-sock.c b/sysdep/bsd/krt-sock.c
index ea0cc4d9..4a0f3a7a 100644
--- a/sysdep/bsd/krt-sock.c
+++ b/sysdep/bsd/krt-sock.c
@@ -547,6 +547,12 @@ krt_read_route(struct ks_msg *msg, struct krt_proto *p, int scan)
             net->n.addr, a.nh.gw);
         return;
       }
+  } else if ((!ipv6 && pxlen == 32) || (ipv6 && pxlen == 128)) {
+    /* Treat non gateway /32 and /128 routes as onlink routes and inject this
+       information into neighbor cache. */
+    if (!neigh_find(&p->p, idst, a.nh.iface, NEF_ONLINK))
+      log(L_ERR, "KRT: Failed to inject onlink neighbor %I for interface %s",
+          idst, a.nh.iface->name);
   }
 
  done:
-- 
2.31.1
On Sun, Apr 18, 2021 at 08:55:12PM +0200, Stefan Haller wrote:
Hi everyone,
In a recent change in the FreeBSD Wireguard kernel module the POINTTOPOINT interface flag is dropped for wg interfaces[1]. This behaviour will probably not change, because Wireguard is not peer-to-peer or a broadcast domain, but peer-to-multipoint. Hence my previous configuration of setting peer addresses on the interface is not working anymore:
ifconfig wg0 inet 192.168.0.10/32 192.168.0.4
Hi

Just to be sure, Wireguard is really PtMP (some peers on the iface may not be able to communicate between themselves directly) and not NBMA? In that case this network setup makes sense.
Note that this actually models the real world better, because Wireguard is not point-to-point but point-to-multipoint. So multiple peers can be done in the following way:
    ifconfig wg0 192.168.0.10/32
    route add 192.168.0.4 -iface wg0
    route add 192.168.0.8 -iface wg0
However, this does not work with bird 2.0.8: bird will not recognize the p2p peer addresses as gateways and it will log on kernel table rescan:
Yes, the main issue is that (sans onlink flag) BIRD validates next-hops against interface ranges and not against direct (non-gateway) routes. In most cases it does not matter but with PtMP it would require manually configuring multiple PtP address pairs for an iface.
    KRT: Received route 192.168.42.0/24 with strange next-hop 192.168.0.4
    KRT: Error sending route 192.168.42.0/24 to kernel: File exists
I did have a look why this is happening: bird discards the route, because the gateway does not belong to any of its interface's addresses (the peer address is configured as a static host route and not on the interface). At the same time it forgets that it was bird itself who added the route (because the route is discarded on kernel table rescan, it thinks it is absent in the kernel table but present in bird's internal routing table -> "add" logic triggers) and tries to add the route again (instead of replacing/keeping the route). This explains the cascade of error messages above.
True
While the Linux netlink code allows to handle onlink routes, this functionality is absent in the BSD code path. Unfortunately, on BSD there is no dedicated flag to mark routes as being directly connected onlink routes. On Linux one could add an onlink route like so:
ip route add 192.168.0.4 via 192.168.0.4 dev wg0 onlink
This does not make sense to me. You can do this just by:

    ip route add 192.168.0.4 dev wg0

The onlink flag is useful if you have some network behind 192.168.0.4, then you can do:

    ip route add 192.168.42.0/24 via 192.168.0.4 dev wg0 onlink
The command looks quite odd, but it is working. The via clause is necessary in conjunction with onlink. Afterwards a `birdc dump neighbors` verifies that the peer is recognized:
000055db793035c0 192.168.0.4 wg0 wg0 kernel1 000000000000000000000000 scope univ ONLINK
So the strange ip-route command (192.168.0.4 via 192.168.0.4) is here just to inject such neighbor entry to neighbor cache? That seems more like a bug in neighbor cache (where two requests with different flags influence themselves so they produce different results than when run independently), which unexpectedly helped in your case. The proper solution (on Linux) is that the second route (for 192.168.42.0/24) also has onlink flag, so it does not depend on existence of route for 192.168.0.4/32. Babel in BIRD generates routes with onlink flag.
I created a patch that emulates the behaviour on FreeBSD. In this patch the onlink setting is heuristically determined by inspecting the RTF_GATEWAY flag (which must be absent) and checking for a /32 or /128 prefix length. I do not see how to handle cases where the destination is a subnet. The patch was tested on FreeBSD 13.0-RELEASE and 14.0-CURRENT. During rescan of the kernel table the above route is now not dropped because the gateway is recognized to be onlink.
While in the Linux route the gateway is still set, on FreeBSD it is unset (~RTF_GATEWAY). So I actually use the destination address of the route, because that's the address of the peer.
While I tested the functionality on my machines and also checked the output of `birdc dump neighbors`, the patch might be faulty. Please be kind with me and treat it more as a proof of concept. As I am not experienced with bird's code base I do not fully understand how the neighbor cache is working in detail, so the suggested patch might have unexpected side effects. I would appreciate feedback.
In case you think the whole idea is not sensible, I would also love to hear suggestions how to get the above setup working.
As I wrote above, I think this behavior of the neighbor cache is a bug, not a feature :-), so we should not depend on it.

I see the problem as BIRD internally supports the onlink flag, but the BSD kernel does not support that flag, so onlink routes exported to the BSD kernel are not read back properly. Seems to me that there is a simple workaround:

When I read a route (from kernel on BSD) that has a gateway on an iface which has only /32 or /128 IP address(es), so no proper iface range, then I would assume the onlink flag for the route (its nexthops).

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
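[Editorial sketch] The next-hop validation described in this reply can be illustrated with a small standalone C model. It is not BIRD code: `struct addr4`, `gw_in_iface_range()` and `nexthop_valid()` are hypothetical names, and IPv4 addresses are plain host-order integers. It only shows why a lone /32 on wg0 never validates 192.168.0.4 unless the next hop carries an onlink flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 interface address (host byte order). */
struct addr4 {
  uint32_t ip;     /* local address */
  uint8_t pxlen;   /* prefix length, 0..32 */
};

#define IP4(a, b, c, d) \
  (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Netmask for a prefix length; a shift by 32 is avoided for pxlen == 0. */
static uint32_t mask4(uint8_t pxlen)
{
  return pxlen == 0 ? 0 : ~(uint32_t)0 << (32 - pxlen);
}

/* True if gw lies inside the range of at least one interface address.
   A /32 address covers only itself, so it cannot validate another peer. */
static bool gw_in_iface_range(const struct addr4 *addrs, size_t n, uint32_t gw)
{
  for (size_t i = 0; i < n; i++) {
    uint32_t m = mask4(addrs[i].pxlen);
    if ((addrs[i].ip & m) == (gw & m))
      return true;
  }
  return false;
}

/* A next hop is accepted if it is flagged onlink or falls into an iface range. */
static bool nexthop_valid(const struct addr4 *addrs, size_t n, uint32_t gw, bool onlink)
{
  return onlink || gw_in_iface_range(addrs, n, gw);
}
```

With wg0 holding only 192.168.0.10/32, `nexthop_valid()` rejects 192.168.0.4 without the onlink flag and accepts it with the flag; with 192.168.0.10/24 the range check alone is enough.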
Hi,

first of all thanks for your reply. It became clear that there were some misunderstandings on my side.

On Mon, Apr 19, 2021 at 09:41:33PM +0200, Ondrej Zajicek wrote:
On Sun, Apr 18, 2021 at 08:55:12PM +0200, Stefan Haller wrote:
In a recent change in the FreeBSD Wireguard kernel module the POINTTOPOINT interface flag is dropped for wg interfaces[1]. This behaviour will probably not change, because Wireguard is not peer-to-peer or a broadcast domain, but peer-to-multipoint. Hence my previous configuration of setting peer addresses on the interface is not working anymore:
ifconfig wg0 inet 192.168.0.10/32 192.168.0.4
Hi
Just to be sure, Wireguard is really PtMP (some peers on the iface may not be able to communicate between themselves directly) and not NBMA? In that case this network setup makes sense.
That is true, yes. For example, one could build something like this here:

    A ----- B
    |     / |
    |   /   |
    | /     |
    C ----- D

A and D can not communicate directly. This happens in my case, for example, because A and D sit at different dormitory home connections and can not talk to each other directly (easily).
Note that this actually models the real world better, because Wireguard is not point-to-point but point-to-multipoint. So multiple peers can be done in the following way:
    ifconfig wg0 192.168.0.10/32
    route add 192.168.0.4 -iface wg0
    route add 192.168.0.8 -iface wg0
However, this does not work with bird 2.0.8: bird will not recognize the p2p peer addresses as gateways and it will log on kernel table rescan:
Yes, the main issue is that (sans onlink flag) BIRD validates next-hops against interface ranges and not against direct (non-gateway) routes. In most cases it does not matter but with PtMP it would require manually configuring multiple PtP address pairs for an iface.
I see. I think I cleared up my confusion. So on Linux, if I have wg0 with 192.168.0.10/32 configured and I issue:

    ip route add 192.168.0.4 dev wg0

The equivalent of calling `neigh_find(..., "192.168.0.4", "wg0", ...)` will _not_ find the neighbor.

I misread the netlink code and thought the goal of it was to enable this neighbor to be found. Thanks for your explanation. That's why I wanted to replicate it on BSD.
[...]
The proper solution (on Linux) is that the second route (for 192.168.42.0/24) also has onlink flag, so it does not depend on existence of route for 192.168.0.4/32. Babel in BIRD generates routes with onlink flag.
This configuration is working now with Babel + BIRD 2.0.8 on Linux (I was still on 2.0.7 when testing it on Linux previously). Looks like that it would not work with e.g. OSPF.

What I still don't get exactly is the following mismatch:

(i) If the route is read from the kernel, BIRD checks if the next-hop is reachable by any interface network (= stricter check than kernel).

(ii) However, if BIRD sends the route to the kernel it will not check if the gateway is reachable. If BIRD thinks the gateway is unreachable and the route still gets installed (because it in fact is), BIRD will never be able to correctly read the route back in.

Shouldn't there be a check in (ii) too?
[...]
I see the problem as BIRD internally supports the onlink flag, but the BSD kernel does not support that flag, so onlink routes exported to the BSD kernel are not read back properly. Seems to me that there is a simple workaround:
When I read a route (from kernel on BSD) that has a gateway on an iface which has only /32 or /128 IP address(es), so no proper iface range, then I would assume the onlink flag for the route (its nexthops).
I will continue working in this direction.

Kind regards,
Stefan
On Fri, Apr 23, 2021 at 05:06:11PM +0200, Stefan Haller wrote:
Hi,
first of all thanks for your reply. It became clear that there were some misunderstandings on my side.
Hi

Glad to help you.
Yes, the main issue is that (sans onlink flag) BIRD validates next-hops against interface ranges and not against direct (non-gateway) routes. In most cases it does not matter but with PtMP it would require manually configuring multiple PtP address pairs for an iface.
I see. I think I cleared up my confusion. So on Linux, if I have wg0 with 192.168.0.10/32 configured and I issue:
ip route add 192.168.0.4 dev wg0
The equivalent of calling `neigh_find(..., "192.168.0.4", "wg0", ...)` will _not_ find the neighbor.
Yes.
[...]
The proper solution (on Linux) is that the second route (for 192.168.42.0/24) also has onlink flag, so it does not depend on existence of route for 192.168.0.4/32. Babel in BIRD generates routes with onlink flag.
This configuration is working now with Babel + BIRD 2.0.8 on Linux (I was still on 2.0.7 when testing it on Linux previously). Looks like that it would not work with e.g. OSPF.
What I still don't get exactly is the following mismatch:
(i) If the route is read from the kernel, BIRD checks if the next-hop is reachable by any interface network (= stricter check than kernel).
(ii) However, if BIRD sends the route to the kernel it will not check if the gateway is reachable. If BIRD thinks the gateway is unreachable and the route still gets installed (because it in fact is), BIRD will never be able to correctly read the route back in.
Shouldn't there be a check in (ii) too?
Routes (and their gateways) are checked when imported from protocols to (BIRD) tables. Routes in tables are assumed to be valid.

BIRD (like the Linux kernel) uses an optional 'onlink' flag on gateways. This has recently been used by Babel. A gateway with this flag is considered valid regardless of network range. When a route is sent to the kernel on Linux, this flag is translated to the Linux 'onlink' flag, so it is ok when it is read back. But on BSD such a kernel flag does not exist, so it is just forgotten when the route is sent to the kernel, and the check fails when the route is read back.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
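[Editorial sketch] The round trip explained in this reply can be modelled in a few lines of standalone C. All names here (`struct route`, `struct kroute`, `export_route()`, `read_back_ok()`, `rescan()`) are hypothetical and do not come from BIRD; the sketch assumes a kernel either keeps or drops the onlink flag on export, and that a rejected read-back leads to a retried add that the kernel refuses because the route already exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route as held in the daemon's own table. */
struct route {
  uint32_t gw;
  bool onlink;
};

/* Hypothetical kernel-side copy of the same route. */
struct kroute {
  uint32_t gw;
  bool onlink;   /* stays false if the kernel has no such flag */
};

enum sync_result { SYNC_KEEP, SYNC_ADD_FAILS_EEXIST };

/* Export: the onlink flag survives only if the kernel can store it. */
static struct kroute export_route(struct route r, bool kernel_has_onlink)
{
  struct kroute k = { r.gw, kernel_has_onlink && r.onlink };
  return k;
}

/* Read-back check: accept onlink gateways, otherwise require that the
   gateway falls into an interface range (passed in as gw_in_range). */
static bool read_back_ok(struct kroute k, bool gw_in_range)
{
  return k.onlink || gw_in_range;
}

/* One export followed by one kernel table rescan. */
static enum sync_result rescan(struct route r, bool kernel_has_onlink, bool gw_in_range)
{
  struct kroute k = export_route(r, kernel_has_onlink);
  if (read_back_ok(k, gw_in_range))
    return SYNC_KEEP;
  /* The route was dropped on read-back, so it looks missing and the add
     is retried, although the kernel still holds it: "File exists". */
  return SYNC_ADD_FAILS_EEXIST;
}
```

In this model an onlink route whose gateway is outside every interface range is kept when the kernel stores the flag and ends in the repeated "File exists" case when it does not.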
Hi,

I am replying to this older email thread, because the context might be appreciated. Hope that this is acceptable.

Thread in archive: <https://bird.network.cz/pipermail/bird-users/2021-April/015419.html>

Short summary: On FreeBSD bird will export onlink routes, but the kernel does not support an onlink flag. So when bird is reading back the routes from the kernel table there is some confusion, the route is dismissed and bird tries to re-export the route again and again.

In my case I have a Wireguard interface with a local /32 address and two peers:

    ifconfig wg0 192.168.0.10/32
    route add 192.168.0.4 -iface wg0
    route add 192.168.0.8 -iface wg0

If bird receives a route via Babel that looks for example like this:

    route 192.168.42.0/24 via 192.168.0.4%wg0 onlink

Then bird's log will be full of the following lines:
    KRT: Received route 192.168.42.0/24 with strange next-hop 192.168.0.4
    KRT: Error sending route 192.168.42.0/24 to kernel: File exists
bird installs the route, but when reading back the route from the kernel, it will be dismissed, as the next-hop 192.168.0.4 is not part of any interface subnet. Now bird wants to add the missing route. This fails, because the route is in fact already present.

For some time now I am using a local patch (on top of bird 2.0.8) that essentially implements the suggestion that was emerging from the discussion:

On Mon, Apr 19, 2021 at 09:41:33PM +0200, Ondrej Zajicek wrote:
I see the problem as BIRD internally supports the onlink flag, but the BSD kernel does not support that flag, so onlink routes exported to the BSD kernel are not read back properly. Seems to me that there is a simple workaround:
When I read a route (from kernel on BSD) that has a gateway on an iface which has only /32 or /128 IP address(es), so no proper iface range, then I would assume the onlink flag for the route (its nexthops).
My proposed patch is attached to end of the mail. What it does is that for any route received by the kernel it checks if the iface only has /32 or /128 addresses configured. If this is the case, the RNF_ONLINK flag will be set for this route. If any other address is found, the behaviour is not changed. The neigh_find call was adapted in a way to mimic the call in sysdep/linux/netlink.c.

The patch is working fine on my Wireguard/Babel mesh on FreeBSD.

I tried to keep the patch as non-intrusive as possible. The disadvantage is that for each route the interface addresses are enumerated. However, for normal use cases it will most likely return after the first address is inspected (either it's not a /32 or /128 address, or there will be only a single /32 or /128 address per interface in normal setups).

Another way would be to store an 'assume_onlink_routes' flag per interface on interface discovery. Would probably touch more places in the code base. One could also move the iface-addr check after the neigh_find check, so it will only fire in rare corner cases before bailing out.

For symmetry I included the same logic for IPv6. Due to availability of scoped link-local addresses I can't think of a real use case though.

I am looking forward to feedback or suggestions.

Best regards,
Stefan Haller

diff --git a/sysdep/bsd/krt-sock.c b/sysdep/bsd/krt-sock.c
index 5c905bc9..e9f1a82b 100644
--- a/sysdep/bsd/krt-sock.c
+++ b/sysdep/bsd/krt-sock.c
@@ -366,6 +366,27 @@ krt_replace_rte(struct krt_proto *p, net *n, rte *new, rte *old)
   }
 }
 
+/**
+ * assume_onlink_for_iface - check if routes on interface are considered onlink
+ * @ipv6: Switch to only consider IPv6 or IPv4 addresses.
+ *
+ * The BSD kernel does not support an onlink flag. If the interface only has
+ * /32 or /128 addresses configured, all routes should be considered as onlink and
+ * the function returns 1.
+ */
+static int
+assume_onlink_for_iface(struct iface *iface, int ipv6)
+{
+  struct ifa *ifa;
+  const u8 type = ipv6 ? NET_IP6 : NET_IP4;
+  WALK_LIST(ifa, iface->addrs)
+  {
+    if (ifa->prefix.type == type && ifa->prefix.pxlen != net_max_prefix_length[type])
+      return 0;
+  }
+  return 1;
+}
+
 #define SKIP(ARG...) do { DBG("KRT: Ignoring route - " ARG); return; } while(0)
 
 static void
@@ -535,7 +556,17 @@ krt_read_route(struct ks_msg *msg, struct krt_proto *p, int scan)
     if (ipa_is_link_local(a.nh.gw))
       _I0(a.nh.gw) = 0xfe800000;
 
-    ng = neigh_find(&p->p, a.nh.gw, a.nh.iface, 0);
+    /* The BSD kernel does not support an onlink flag. We heuristically
+       set the onlink flag, if the iface only has /32 or /128 addresses
+       configured. */
+    if (assume_onlink_for_iface(a.nh.iface, ipv6))
+    {
+      a.nh.flags |= RNF_ONLINK;
+      goto done;
+    }
+
+    ng = neigh_find(&p->p, a.nh.gw, a.nh.iface,
+                    (a.nh.flags & RNF_ONLINK) ? NEF_ONLINK : 0);
     if (!ng || (ng->scope == SCOPE_HOST))
       {
         /* Ignore routes with next-hop 127.0.0.1, host routes with such
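[Editorial sketch] For readers without the BIRD tree at hand, the address walk in the patch above can be restated as a standalone C function. `enum family`, `struct if_addr` and `assume_onlink()` are hypothetical stand-ins for BIRD's own types and for `WALK_LIST`; the logic mirrors the patch as posted, including the property that an interface with no addresses of the requested family also yields "assume onlink".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BIRD's address family and iface address types. */
enum family { FAM_IP4, FAM_IP6 };

struct if_addr {
  enum family fam;
  unsigned pxlen;   /* prefix length of the configured address */
};

/* Longest possible prefix: 32 for IPv4, 128 for IPv6. */
static unsigned max_pxlen(enum family fam)
{
  return fam == FAM_IP6 ? 128 : 32;
}

/* Returns true if every address of the given family on the interface is a
   full-length (/32 or /128) address. Addresses of the other family are
   ignored. An empty list, or one without addresses of that family, also
   returns true, exactly as the loop in the patch does. */
static bool assume_onlink(const struct if_addr *addrs, size_t n, enum family fam)
{
  for (size_t i = 0; i < n; i++)
    if (addrs[i].fam == fam && addrs[i].pxlen != max_pxlen(fam))
      return false;
  return true;
}
```

For the wg0 setup from this thread (a single 192.168.0.10/32) the function returns true; adding any shorter IPv4 prefix to the interface turns the heuristic off for IPv4 routes.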
On Wed, Dec 15, 2021 at 10:01:57PM +0100, Stefan Haller wrote:
Hi,
I am replying to this older email thread, because the context might be appreciated. Hope that this is acceptable.
Thread in archive: <https://bird.network.cz/pipermail/bird-users/2021-April/015419.html>
Short summary: On FreeBSD bird will export onlink routes, but the kernel does not support an onlink flag. So when bird is reading back the routes from the kernel table there is some confusion, the route is dismissed and bird tries to re-export the route again and again. ... For some time now I am using a local patch (on top of bird 2.0.8) that essentially implements the suggestion that was emerging from the discussion:
On Mon, Apr 19, 2021 at 09:41:33PM +0200, Ondrej Zajicek wrote:
I see the problem as BIRD internally supports the onlink flag, but the BSD kernel does not support that flag, so onlink routes exported to the BSD kernel are not read back properly. Seems to me that there is a simple workaround:
When I read a route (from kernel on BSD) that has a gateway on an iface which has only /32 or /128 IP address(es), so no proper iface range, then I would assume the onlink flag for the route (its nexthops).
My proposed patch is attached to end of the mail. What it does is that for any route received by the kernel it checks if the iface only has /32 or /128 addresses configured. If this is the case, the RNF_ONLINK flag will be set for this route. If any other address is found, the behaviour is not changed. The neigh_find call was adapted in a way to mimic the call in sysdep/linux/netlink.c.
Hi

Thanks for the patch. I merged it with some modifications [*]. Mainly replacing the check for address length with a check for the host address flag (so it does not apply on ifaces with regular ptp addresses, which are /32 but not host-only due to peer range). Also, your patch just skipped the whole neigh_find() in case of a host-only iface instead of just assuming the onlink flag.

[*] https://gitlab.nic.cz/labs/bird/-/commit/a39cd2cc0b0c64235457c07e2b618318bbd...

It would be great if you could test it for your case, but I tested it on BSD with some simple setups and it seems to work correctly.
Another way would be store an 'assume_onlink_routes' flag per interface on interface discovery. Would probably touch more places in the code base. One could also move the iface-addr check after the neigh_find check, so it will only fire in rare corner cases before bailing out.
Thought about that, but there is plenty of list walking in krt_read_route() (e.g. in if_find_by_index()), so likely it does not matter.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
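[Editorial sketch] The distinction drawn in this reply, between "the address is /32" and "the address is host-only", can be illustrated as follows. This is not taken from the merged commit: `struct if_addr4`, its `has_peer` field and both functions are hypothetical, and treating an interface without addresses as not host-only is a choice of this sketch alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IPv4 interface address. has_peer marks a classic
   point-to-point configuration (local /32 plus a peer address). */
struct if_addr4 {
  unsigned pxlen;
  bool has_peer;
};

/* A /32 with a peer still has a usable range (the peer), so only a /32
   without a peer counts as a host-only address. */
static bool addr_is_host_only(const struct if_addr4 *a)
{
  return a->pxlen == 32 && !a->has_peer;
}

/* True if the interface has at least one address and all of them are
   host-only. The "at least one" requirement is an assumption of this sketch. */
static bool iface_is_host_only(const struct if_addr4 *addrs, size_t n)
{
  if (n == 0)
    return false;
  for (size_t i = 0; i < n; i++)
    if (!addr_is_host_only(&addrs[i]))
      return false;
  return true;
}
```

A pure prefix-length test would treat both a wg0-style 192.168.0.10/32 and a point-to-point "192.168.0.10/32 192.168.0.4" address as candidates for assumed onlink; the flag-style test above accepts only the former.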