BFD not recovering after link recovers

0.bgp at elloe.vision
Thu Jan 28 11:35:46 CET 2021


We added backtrace()/backtrace_symbols() output to several of the
error-logging paths so that we could see the call stack.

The traces show that the final error originates in the bind() call
below and is returned up through bfd_open_tx_sk():

sysdep/unix/io.c::sk_open():

    sockaddr_fill(&sa, s->af, bind_addr, s->iface, bind_port);
    if (bind(fd, &sa.sa, SA_LEN(sa)) < 0)
      ERR2("bind");
  }


However, it is not clear to us why bird cannot return to Up status once
the link comes back up. In theory, once systemd-networkd reassigns the
IP addresses to the interface and re-adds the routes, bird should be
able to recover.
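
For illustration only, here is a minimal sketch of what "recovering"
from this kind of bind() failure could look like in isolation. It is
not BIRD code and not a proposed patch: udp_bind_retry() and
bind_error_is_transient() are hypothetical helpers, the sketch uses
IPv4 for brevity although our setup is IPv6 only, and it assumes that
EADDRNOTAVAIL ("Cannot assign requested address") is the only error
worth retrying. On Linux an IPv6 address that is still tentative after
being re-added can also produce this error, which may or may not be
relevant here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: treat "address not (yet) assigned" as retryable. */
static int bind_error_is_transient(int err)
{
  return err == EADDRNOTAVAIL;
}

/*
 * Hypothetical helper, not taken from BIRD: open a UDP socket and bind it
 * to addr:port, retrying up to `attempts` times with `delay_ms` between
 * tries while bind() fails with a transient error.
 * Returns the bound fd, or -1 with errno set.
 */
static int udp_bind_retry(const char *addr, unsigned short port,
                          int attempts, long delay_ms)
{
  struct sockaddr_in sa;

  assert(attempts > 0);

  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons(port);
  if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1) {
    errno = EINVAL;               /* not a valid IPv4 literal */
    return -1;
  }

  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;

  for (int i = 0; i < attempts; i++) {
    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) == 0)
      return fd;                  /* address is usable now */

    if (!bind_error_is_transient(errno))
      break;                      /* e.g. EACCES, EADDRINUSE: give up */

    if (i + 1 < attempts) {
      struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
      nanosleep(&ts, NULL);       /* wait for the address to reappear */
    }
  }

  int saved = errno;              /* close() may overwrite errno */
  close(fd);
  errno = saved;
  return -1;
}
```

A call such as udp_bind_retry("127.0.0.1", 0, 3, 100) should return a
bound descriptor at once, while an address that is not configured on any
interface should fail with EADDRNOTAVAIL after the last attempt. An
event-driven daemon would presumably reschedule the attempt from a timer
or an interface notification rather than sleep as this sketch does.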

We do not yet understand the code well enough to determine what should
be happening, or where to look for a solution.

Pointers to where we should be looking and what we should expect are
welcome.


bfd1: Socket error: bind: Cannot assign requested address
Stack trace 13 frames
/usr/sbin/bird(print_trace_io+0x34) [0x564255163b64]
/usr/sbin/bird(bfd_open_tx_sk+0x12c) [0x5642551243ac]
/usr/sbin/bird(+0x5e9c3) [0x5642551219c3]
/usr/sbin/bird(+0x5ec98) [0x564255121c98]
/usr/sbin/bird(bfd_request_session+0x80) [0x564255121ee0]
/usr/sbin/bird(+0x6d77e) [0x56425513077e]
/usr/sbin/bird(+0x71575) [0x564255134575]
/usr/sbin/bird(+0x4e75c) [0x56425511175c]
/usr/sbin/bird(ev_run_list+0xa1) [0x564255102441]
/usr/sbin/bird(io_loop+0x5c) [0x564255166f1c]
/usr/sbin/bird(main+0x786) [0x5642550e08c6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb4e2c310b3]
/usr/sbin/bird(_start+0x2e) [0x5642550e0a8e]



> On Wed, Jan 27, 2021 at 03:23:25PM +0000, 0.bgp at elloe.vision wrote:
>> We are setting up a container-based proof of concept, IPv6 only,
>> using ECMP, anycast, and BGP with BFD on Bird2. We have hit a problem
>> where BFD doesn't recover after a local link goes down and comes back
>> up, and we are seeking advice as to whether this is expected
>> behaviour or a bug.
> Hi
>
> Based on a quick evaluation, it seems to me that this is a combination
> of a race condition in BIRD and systemd-networkd behavior. If a BFD
> session is added in BIRD while the IP address on that link is being
> added or removed, BIRD sometimes fails to open the socket for the
> session.
>
> It seems that this issue is exacerbated by systemd-networkd: normally
> it happens only during admin-up/down events, but from the logs it
> seems that systemd-networkd also adds/removes the IP address in
> reaction to link-up/down events.


