We added in backtrace()/backtrace_symbols() output from various error logging sources to understand the call stack. This shows that the cause of the final error is returned via bfd_open_tx_sk() from:

sysdep/unix/io.c::sk_open():

    sockaddr_fill(&sa, s->af, bind_addr, s->iface, bind_port);
    if (bind(fd, &sa.sa, SA_LEN(sa)) < 0)
      ERR2("bind");
    }

However, we're not clear on why, when the link comes back up, bird is unable to return to Up status. In theory once systemd-networkd reassigns IP addresses to the interface and adds back routes bird should be able to recover.

We do not understand the code enough at present to determine what should be happening, or where to look to figure out a solution. Pointers to where we should be looking and what we should expect are welcome.

bfd1: Socket error: bind: Cannot assign requested address
Stack trace 13 frames
/usr/sbin/bird(print_trace_io+0x34) [0x564255163b64]
/usr/sbin/bird(bfd_open_tx_sk+0x12c) [0x5642551243ac]
/usr/sbin/bird(+0x5e9c3) [0x5642551219c3]
/usr/sbin/bird(+0x5ec98) [0x564255121c98]
/usr/sbin/bird(bfd_request_session+0x80) [0x564255121ee0]
/usr/sbin/bird(+0x6d77e) [0x56425513077e]
/usr/sbin/bird(+0x71575) [0x564255134575]
/usr/sbin/bird(+0x4e75c) [0x56425511175c]
/usr/sbin/bird(ev_run_list+0xa1) [0x564255102441]
/usr/sbin/bird(io_loop+0x5c) [0x564255166f1c]
/usr/sbin/bird(main+0x786) [0x5642550e08c6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb4e2c310b3]
/usr/sbin/bird(_start+0x2e) [0x5642550e0a8e]
On Wed, Jan 27, 2021 at 03:23:25PM +0000, 0.bgp@elloe.vision wrote:
We are setting up a container-based proof of concept, IPv6 only, using ECMP, ANYCAST, BGP with BFD and Bird2. We've hit a problem where BFD doesn't recover after a local link goes down and comes back up, and are seeking some advice as to whether this is expected behaviour or a bug.

Hi
Based on a quick evaluation, it seems to me that this is a combination of a race condition in BIRD and systemd-networkd behavior. When a BFD session is added while an IP address on that link is being added or removed, BIRD sometimes fails to open the socket.
It seems this issue is exacerbated by systemd-networkd: normally addresses are added/removed only on admin-up/down events, but from the logs it seems that systemd-networkd also adds/removes the IP address in reaction to link-up/down.
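
To illustrate the bind() failure from the log outside of BIRD, here is a minimal sketch in C. It is not BIRD code: try_bind_udp6() is a hypothetical helper written for this mail, and it only shows that bind() on a UDP/IPv6 socket reports an error when the requested source address is not usable on the host at that moment.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of BIRD): try to bind a UDP/IPv6 socket
 * to the given textual address and port.
 * Returns 0 on success, otherwise the errno value of the failing step.
 */
int try_bind_udp6(const char *addr_text, unsigned short port)
{
  struct sockaddr_in6 sa;
  memset(&sa, 0, sizeof(sa));
  sa.sin6_family = AF_INET6;
  sa.sin6_port = htons(port);

  /* Parse the address first, so bad input is reported as EINVAL */
  if (inet_pton(AF_INET6, addr_text, &sa.sin6_addr) != 1)
    return EINVAL;

  int fd = socket(AF_INET6, SOCK_DGRAM, 0);
  if (fd < 0)
    return errno;

  /* Same kind of call that fails in sk_open(); save errno before close() */
  int err = 0;
  if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0)
    err = errno;

  close(fd);
  return err;
}
```

On a Linux host where 2001:db8::1 (a documentation address, used here only as an example) is not assigned, I would expect try_bind_udp6("2001:db8::1", 0) to return EADDRNOTAVAIL, which strerror() renders as "Cannot assign requested address", i.e. the same message as in the bfd1 log line.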