Bird stops responding to BFD messages
santiago at crfreenet.org
Wed Nov 2 12:14:29 CET 2016
On Tue, Nov 01, 2016 at 11:03:15PM +0100, Pavlos Parissis wrote:
> We have 1.4.5 running on ~50 CentOS 7 servers and we have observed that the Bird
> daemon stops responding to BFD messages, which causes the BGP peering to be stopped
> and started again.
> Some details on our setup.
> Servers have 2 interfaces (north and south) and advertise /32 prefixes to the
> north and south for IPs assigned to loopback interface.
> Bird receives a 'Received: Other configuration change' message over BGP from both
> peers, which are 2 different Arista switches, at the same time. Tracing on the
> switches shows that Bird didn't respond to 3 BFD messages and Arista informed Bird
> about it. It is very unlikely that switches or cables are the problem here.
I have no explanation ready, but one thing seems strange to me: there are
these lines in the log:
> Nov 01 16:23:00 bird: bfd1: Bad packet from 22.214.171.124 - unknown session id
This means that the Arista switches send BFD packets with a (presumably old)
BIRD session ID, although if BFD on the Arista had detected the session going
down, it should have reset the old session ID and started again from zero.
I see three possibilities:
1) The issue was not caused by the BFD session going down on Arista.
2) Arista did not correctly reset its remote session ID state when the session went down.
3) BFD packets from BIRD to Arista and the BGP shutdown from Arista to BIRD
were processed simultaneously, so that after the BFD/BGP session drop
Arista relearned the old BIRD session ID from a BFD packet that was sent
before BIRD noticed the session had gone down.
It would be useful to see BFD state change logs from Arista.
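
For reference, the receive-side rule that leads to an 'unknown session id'
rejection can be sketched roughly as below. This is only an illustration of
the demultiplexing described in RFC 5880, section 6.8.6, not BIRD's actual
code. The class and function names, the peer address and the discriminator
values are all made up, and restart_session() assumes that a re-created
session gets a fresh local ID.

```python
# Hypothetical sketch of BFD packet demultiplexing (RFC 5880, section 6.8.6).
# NOT BIRD's implementation; every name and value here is illustrative.
from dataclasses import dataclass, field
from itertools import count

STATE_ADMIN_DOWN, STATE_DOWN, STATE_INIT, STATE_UP = range(4)


@dataclass
class Session:
    peer: str
    local_discr: int          # our session ID, sent as My Discriminator
    remote_discr: int = 0     # peer's session ID, 0 until learned
    state: int = STATE_DOWN


@dataclass
class BfdDemux:
    by_discr: dict = field(default_factory=dict)
    by_peer: dict = field(default_factory=dict)
    _next_discr: count = field(default_factory=lambda: count(1))

    def add_session(self, peer):
        s = Session(peer, next(self._next_discr))
        self.by_discr[s.local_discr] = s
        self.by_peer[peer] = s
        return s

    def restart_session(self, peer):
        # Assumption: the old session is freed and a new one is created
        # with a new local discriminator (e.g. after the BGP session drops).
        old = self.by_peer.pop(peer)
        del self.by_discr[old.local_discr]
        return self.add_session(peer)

    def receive(self, src, my_discr, your_discr, state):
        if my_discr == 0:
            return "discard: zero My Discriminator"
        if your_discr != 0:
            # A nonzero Your Discriminator must select the session.
            s = self.by_discr.get(your_discr)
            if s is None:
                return "discard: unknown session id"
        else:
            # A zero Your Discriminator is valid only in Down/AdminDown;
            # the session is then found by other means (here: source address).
            if state not in (STATE_DOWN, STATE_ADMIN_DOWN):
                return "discard: zero Your Discriminator in Init/Up"
            s = self.by_peer.get(src)
            if s is None:
                return "discard: no session for peer"
        s.remote_discr = my_discr
        return "accepted"


demux = BfdDemux()
old = demux.add_session("192.0.2.1")
print(demux.receive("192.0.2.1", 77, old.local_discr, STATE_UP))  # accepted
demux.restart_session("192.0.2.1")
# The peer still echoes the old ID, as in possibilities 2) and 3):
print(demux.receive("192.0.2.1", 77, old.local_discr, STATE_UP))  # discard: unknown session id
# A peer that reset its state starts again from zero:
print(demux.receive("192.0.2.1", 77, 0, STATE_DOWN))              # accepted
```

In this model a peer that keeps (or relearns) the old ID is rejected until
it resets its remote session ID to zero, which is why the state change logs
from the switches matter.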
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."