BFD not recovering after link recovers
We are setting up a container-based proof of concept, IPv6 only, using ECMP, anycast, and BGP with BFD on Bird2. I've hit a problem where BFD doesn't recover after a local link goes down and comes back up, and we are seeking advice on whether this is expected behaviour or a bug.

We have installed Bird2 in two LXC containers, both with BFD and BGP enabled, talking over IPv6 using ULA addresses. Each container is on a different subnet (fd00::1/112 and fd01::1/112) and is connected to a different bridge (lxdbr0 and lxdbr1); the host has routes to link them together. This works fine!

We are now testing what happens when the link is broken in various ways.

1. Deleting a host route: works fine, the BFD session recovers when the route is re-added.
2. Link down on the host-side veth that is attached to the bridge: BFD fails to recover when the link is brought back up.

The problem seems to be "bfd1: Socket error: bind: Cannot assign requested address".

If we then run "birdc restart bfd1" on the container, the link recovers.

"CONFIG"
log syslog all;

define BGP_ID = 10.9.8.7;
define ROUTE_IP = fd00::1;
define HOST_ASN = 64512;
define STATIC_IP = fd00:b00b:1e00::/64;
define NEIGH_IP = fd01::1;
define NEIGH_ASN = 64513;
define NEIGH_PASS = "test";

router id BGP_ID;

protocol device {
    scan time 10;                   # Set scanning time. Default is disabled.
}

protocol direct {
    disabled;                       # Enabled by Default.
}

filter noroutes {
    if net = STATIC_IP then accept;
    else reject;                    # Disabled by Default.
}

filter allroutes {
    accept;                         # Disabled by Default.
}

protocol kernel {
    ipv6 {                          # Connect protocol to IPv6 table by channel.
        import none;                # Import to table. Default is import all.
        export none;                # Export to protocol. Default is export none.
    };
}

protocol static {
    ipv6;
    route STATIC_IP via ROUTE_IP;
}

protocol bgp infra_v6 {             # Name of the BGP interface.
    description "BGP uplink 0";     # Description of the BGP interface.
    local ROUTE_IP as HOST_ASN;
    neighbor NEIGH_IP as NEIGH_ASN;
    hold time 600;                  # Hold time, time in seconds to wait for the keepalive to initiate.
    password NEIGH_PASS;            # Password, type whatever password you wish.
    multihop;                       # Multi-hop enabled for more servers in a cluster.
    bfd on;                         # Switch to turn on BFD
    ipv6 {
        import filter noroutes;     # Importing previous IPv6 filters.
        export filter allroutes;    # Exporting all routed filters shown above (Line 30-33).
    };
}

protocol bfd {
    multihop {
        interval 2s;
    };
    neighbor NEIGH_IP;
}
"END OF CONFIG"

Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Lost carrier
Jan 27 15:01:17 c2 systemd-networkd[7304]: NDISC: Stopping IPv6 Router Solicitation client
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing address fd01::1
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing address fd42:1af0:83cf:4510:216:3eff:fe98:2650
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing route: dst: fd00::1/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing route: dst: n/a, src: n/a, gw: fe80::216:3eff:fe41:d39f, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing route: dst: fd42:1af0:83cf:4510::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: State is configured, dropping config
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Removing route: dst: fd00::/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=26 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting address: fd01::1/112 (valid forever)
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: fd01::1/128, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: local, proto: kernel, type: local
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: fd01::/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: kernel, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting address: fd42:1af0:83cf:4510:216:3eff:fe98:2650/64 (valid for 59min 34s)
Jan 27 15:01:17 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_310 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=27 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:17 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=28 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: fd42:1af0:83cf:4510:216:3eff:fe98:2650/128, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: local, proto: kernel, type: local
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: fd00::/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: n/a, src: n/a, gw: fe80::216:3eff:fe41:d39f, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:17 c2 systemd-networkd[7304]: eth0: Forgetting route: dst: fd42:1af0:83cf:4510::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:18 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:20 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:20 c2 systemd-networkd[7304]: rtnl: received non-static neighbor, ignoring.
Jan 27 15:01:21 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:23 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:25 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:26 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:26 c2 bird[6234]: bfd1: Socket error: Network is unreachable
Jan 27 15:01:26 c2 bird[6234]: bfd1: Socket error: bind: Cannot assign requested address
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Flags change: +LOWER_UP +RUNNING
Jan 27 15:01:32 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_310 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=29 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Gained carrier
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Discovering IPv6 routers
Jan 27 15:01:32 c2 systemd-networkd[7304]: NDISC: Started IPv6 Router Solicitation client
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: State changed: configured -> configuring
Jan 27 15:01:32 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_310 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=30 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Setting addresses
Jan 27 15:01:32 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=31 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Remembering route: dst: fd01::/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: kernel, type: unicast
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Remembering updated address: fd01::1/112 (valid forever)
Jan 27 15:01:32 c2 systemd-networkd[7304]: eth0: Addresses set
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Remembering updated address: fd01::1/112 (valid forever)
Jan 27 15:01:33 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_310 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=32 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Configuring route: dst: fd00::1/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Setting routes
Jan 27 15:01:33 c2 systemd-networkd[7304]: NDISC: Sent Router Solicitation, next solicitation in 3s
Jan 27 15:01:33 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=33 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Remembering route: dst: fd01::1/128, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: local, proto: kernel, type: local
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Remembering route: dst: fd00::/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: NDISC: Received Router Advertisement: flags OTHER preference medium lifetime 1800 sec
Jan 27 15:01:33 c2 systemd-networkd[7304]: NDISC: Invoking callback for 'router' event.
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Acquiring DHCPv6 lease on NDisc request
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Configuring route: dst: n/a, src: n/a, gw: fe80::216:3eff:fe41:d39f, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Configuring route: dst: fd42:1af0:83cf:4510::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Routes set
Jan 27 15:01:33 c2 systemd-networkd[7304]: rtnl: received non-static neighbor, ignoring.
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Received remembered route: dst: n/a, src: n/a, gw: fe80::216:3eff:fe41:d39f, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Received remembered route: dst: fd42:1af0:83cf:4510::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Remembering updated address: fd42:1af0:83cf:4510:216:3eff:fe98:2650/64 (valid for 1h)
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Configuring route: dst: fd00::1/112, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main, proto: static, type: unicast
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Setting routes
Jan 27 15:01:33 c2 systemd-networkd[7304]: eth0: Routes set
Jan 27 15:01:33 c2 systemd-networkd[7304]: rtnl: received non-static neighbor, ignoring.
Jan 27 15:01:34 c2 systemd-networkd[7304]: eth0: Remembering updated address: fd42:1af0:83cf:4510:216:3eff:fe98:2650/64 (valid for 59min 59s)
Jan 27 15:01:34 c2 systemd-networkd[7304]: eth0: State changed: configuring -> configured
Jan 27 15:01:34 c2 systemd-networkd[7304]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_310 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=34 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Jan 27 15:01:34 c2 systemd-networkd[7304]: eth0: Remembering route: dst: fd42:1af0:83cf:4510:216:3eff:fe98:2650/128, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: local, proto: kernel, type: local
Jan 27 15:01:39 c2 systemd-networkd[7304]: rtnl: received non-static neighbor, ignoring.
Jan 27 15:01:39 c2 systemd-networkd[7304]: rtnl: received non-static neighbor, ignoring.
Jan 27 15:01:44 c2 systemd-networkd[7304]: NDISC: No RA received before link confirmation timeout
Jan 27 15:01:44 c2 systemd-networkd[7304]: NDISC: Invoking callback for 'timeout' event.
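
As a stopgap, the manual recovery described above could be automated. What follows is only a minimal sketch, not a tested tool: it scans syslog lines for the bind failure shown in the log and restarts the affected protocol. Only the log message and the "birdc restart" command come from this report; the function names and the injectable runner are hypothetical and exist purely for illustration.

```python
import re
import subprocess

# Matches the failure seen in the syslog excerpt, for example:
#   "... bird[6234]: bfd1: Socket error: bind: Cannot assign requested address"
BIND_FAILURE = re.compile(
    r"bird\[\d+\]: (?P<proto>\S+): Socket error: bind: "
    r"Cannot assign requested address"
)


def failed_bfd_protocols(log_lines):
    """Return protocol names that logged the bind failure, in order, no duplicates."""
    seen = []
    for line in log_lines:
        match = BIND_FAILURE.search(line)
        if match and match.group("proto") not in seen:
            seen.append(match.group("proto"))
    return seen


def restart_protocol(name, runner=subprocess.run):
    """Run `birdc restart <name>`.

    `runner` is a hypothetical injection point so the call can be exercised
    without a running BIRD daemon.
    """
    return runner(["birdc", "restart", name], check=True)
```

A wrapper would feed recent journal lines to failed_bfd_protocols() and call restart_protocol() for each name returned. It papers over the symptom only and leaves a window in which BFD stays down.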
On Wed, Jan 27, 2021 at 03:23:25PM +0000, 0.bgp@elloe.vision wrote:
We are setting up a container-based proof of concept, IPv6 only, using ECMP, anycast, and BGP with BFD on Bird2. I've hit a problem where BFD doesn't recover after a local link goes down and comes back up, and we are seeking advice on whether this is expected behaviour or a bug.
Hi,

Based on a quick evaluation, it seems to me that this is a combination of a race condition in BIRD and systemd-networkd behavior. There is a condition in BIRD where, if a BFD session is added while an IP address on that link is being added or removed, it sometimes fails to add the socket.

The issue seems to be exacerbated by systemd-networkd: normally this happens only during admin-up/down events, but from the logs it seems that systemd-networkd adds/removes the IP address in reaction to link-up/down.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
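
For readers unfamiliar with the error: "Cannot assign requested address" is the message for errno EADDRNOTAVAIL, which bind() returns when the requested local address is not configured on any interface at that moment. The sketch below illustrates a retry-on-EADDRNOTAVAIL pattern in general terms. It is not BIRD's code and not a proposed patch; the function name, parameters, and the injectable socket factory and sleep are all hypothetical.

```python
import errno
import socket
import time


def bind_with_retry(address, port, attempts=5, delay=1.0,
                    make_socket=None, sleep=time.sleep):
    """Bind a UDP/IPv6 socket, retrying while the local address is missing.

    `make_socket` and `sleep` are hypothetical injection points so the
    retry logic can be exercised without touching real interfaces.
    """
    if make_socket is None:
        def make_socket():
            return socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)

    for attempt in range(1, attempts + 1):
        sock = make_socket()
        try:
            sock.bind((address, port))
            return sock
        except OSError as exc:
            sock.close()
            # EADDRNOTAVAIL is the errno behind
            # "Cannot assign requested address".
            # Any other error, or the last attempt, is re-raised.
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            sleep(delay)
```

With an address that reappears a second or two after carrier returns, as in the log, a handful of attempts would be enough. Whether a retry or a reaction to the address-change notification is the better approach inside a routing daemon is a separate design question.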
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Wed, Jan 27, 2021 at 03:23:25PM +0000, 0.bgp@elloe.vision wrote:
We are setting up a container-based proof of concept, IPv6 only, using ECMP, anycast, and BGP with BFD on Bird2. I've hit a problem where BFD doesn't recover after a local link goes down and comes back up, and we are seeking advice on whether this is expected behaviour or a bug.
Based on a quick evaluation, it seems to me that this is a combination of a race condition in BIRD and systemd-networkd behavior. There is a condition in BIRD where, if a BFD session is added while an IP address on that link is being added or removed, it sometimes fails to add the socket.
We see this as well. It seems to only affect IPv6.

We have the additional complication that the router ends up learning the linknet address via OSPF on another path, which prevents BFD from ever recovering. We worked around it with check scripts that look for linknet addresses that aren't in the routing table and add them, but it's ugly and still causes outages.

--
Alasdair Muckart
Network Engineer
Catalyst IT - Expert Open Source Solutions
Mobile: +64 22 638 5141 | DDI: +64 4 897 7794 | www.catalyst.net.nz
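
The check described above might look roughly like the sketch below. This is a guess at the idea, not the actual scripts: it compares the link networks of the configured interface addresses with the prefixes present as connected routes and reports the ones that are missing. The function name is hypothetical, and it assumes the caller passes in only connected (kernel) routes, so that a copy of the prefix learned via OSPF does not hide the gap.

```python
import ipaddress


def missing_linknets(interface_addresses, connected_prefixes):
    """Return link networks of configured addresses that have no connected route.

    interface_addresses: strings such as "fd01::1/112"
    connected_prefixes:  strings such as "fd01::/112" (connected routes only)
    """
    routed = {ipaddress.ip_network(p) for p in connected_prefixes}
    missing = []
    for addr in interface_addresses:
        # ip_interface() keeps the host bits; .network gives the link prefix.
        network = ipaddress.ip_interface(addr).network
        if network not in routed and network not in missing:
            missing.append(network)
    return missing
```

Each prefix returned would then be re-added by whatever means the site uses, for example an "ip -6 route add" on the relevant interface. As noted, this still leaves an outage between the failure and the next run of the check.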
participants (3)
- 0.bgp@elloe.vision
- Alasdair Muckart
- Ondrej Zajicek