On Sun, Jul 02, 2017 at 12:27:18PM +0300, Andrew wrote:
Hi all.
Today I saw a strange BIRD hang, which resulted in OSPF failure in the areas the router belongs to. BIRD runs OSPF + BGP here. Everything returned to a working state after bird was killed with SIGKILL and started again.
Here are the strange log records:
Jul 2 11:33:46 gw2 bird: I/O loop cycle took 7016 ms for 35 events
Jul 2 11:33:57 gw2 bird: I/O loop cycle took 7208 ms for 58 events
Jul 2 11:34:06 gw2 bird: I/O loop cycle took 5829 ms for 31 events
This seems related to the problem. Do you have these messages in the log during normal operation? Especially since at 11:36:03 it locked up for 38 s, which is enough to time out OSPF neighbors:
Jul 2 11:36:03 gw2 bird: I/O loop cycle took 37685 ms for 22 events
Was BIRD at least partially interactive during the hang? Did it run at 100 % CPU load?
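To check whether such long cycles also show up during normal operation, the log can be scanned for these messages. The following is only a minimal sketch in Python: the function name `long_cycles` and the 30000 ms threshold are assumptions made for illustration, and the threshold should be replaced by the OSPF dead interval actually configured on the router.

```python
import re

# Matches the message format seen in the log excerpt in this thread.
CYCLE_RE = re.compile(r"I/O loop cycle took (\d+) ms for (\d+) events")

def long_cycles(lines, threshold_ms):
    """Return (duration_ms, events, line) for every cycle lasting at least threshold_ms."""
    hits = []
    for line in lines:
        m = CYCLE_RE.search(line)
        if m and int(m.group(1)) >= threshold_ms:
            hits.append((int(m.group(1)), int(m.group(2)), line.strip()))
    return hits

# Two lines copied from the log excerpt in this thread.
sample = [
    "Jul 2 11:33:46 gw2 bird: I/O loop cycle took 7016 ms for 35 events",
    "Jul 2 11:36:03 gw2 bird: I/O loop cycle took 37685 ms for 22 events",
]

# 30000 ms is an assumed threshold, not a BIRD default.
for ms, events, line in long_cycles(sample, 30000):
    print(f"{ms} ms for {events} events: {line}")
```

Passing the lines of the real log file instead of `sample` would list every stall long enough to matter, which helps to tell a one-off hang from a recurring problem.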
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: ...
Here's the filter config where the error message occurs:
function rt_client(int asn; prefix set nets)
{
    return (net ~ nets &&
            (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) &&
            bgp_path.last = asn);
}
You could avoid that 'AS path expected' error msg by checking whether the attribute is defined:

    return (net ~ nets && defined(bgp_path) &&
            (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) &&
            bgp_path.last = asn);
After the restart everything seems to be OK, except for periodic 'Kernel dropped some netlink messages, will resync on next scan.' messages in the log.
What happened? Bad BGP packets? Or a bug in bird?
--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."