I just tried downgrading from 1.4.5 to 1.4.4, using the 1.4.4-1~bpo70+1 Debian package from http://bird.network.cz/?download&tdir=debian/
The result is the same: bird6 also freezes periodically with version 1.4.4.
By the way, I think I have ruled out the possibility that a particular BGP peer is sending garbage: the issue still arises when only one BGP session is left active, whichever one it is.
Is there anything else I can do to help troubleshoot the root cause of this issue?
On Thu, Jan 29, 2015 at 08:03:07PM +0100, Baptiste Jonglez wrote:
Hi,
We are experiencing regular "freezes" of bird6 on a BGP router. When this happens, bird6 maxes out a CPU for several minutes. If a command is run in birdc6 during such a freeze, the command hangs, and the result is only returned once bird6 has stopped using the CPU. Note that this also applies to "cheap" commands like "show protocols", which usually complete instantly (both with bird, and with bird6 in non-freeze conditions).
Sometimes (but not always), the non-responsiveness of bird6 causes all BGP sessions to drop, which is really annoying on a full-view BGP router.
The freezes happen at random, but seem to happen more frequently when the router is under load (typically, at peak time, each CPU spends ~20% of its time forwarding packets, on a 4-core box).
The BGP setup is made of multiple transits and peerings, on multiple VLANs (some BGP neighbours share the same VLAN). The setup is pretty similar on bird and bird6, but only bird6 exhibits these freezes; bird works just fine.
The box is running Debian wheezy on amd64, with bird from backports: 1.4.5-1~bpo70+1
Attached is the configuration, and two extracts of the logs when all BGP sessions dropped (with debug { states, interfaces, events }). All files are anonymised, but should be consistent.
What do you think? It looks like bird6 gets stuck on some very expensive operation, which prevents it from doing anything else (including keeping BGP sessions alive).
Thanks, Baptiste