On 04.07.2017 15:59, Ondrej Zajicek wrote:
On Sun, Jul 02, 2017 at 12:27:18PM +0300, Andrew wrote:
Hi all.
Today I saw a strange BIRD hang, which resulted in an OSPF failure in the areas this router belongs to. BIRD runs OSPF + BGP here. Everything returned to a working state after bird was killed with SIGKILL and started again.
Here are the strange log records:
Jul 2 11:33:46 gw2 bird: I/O loop cycle took 7016 ms for 35 events
Jul 2 11:33:57 gw2 bird: I/O loop cycle took 7208 ms for 58 events
Jul 2 11:34:06 gw2 bird: I/O loop cycle took 5829 ms for 31 events

This seems to be related to the problem. Do you have these messages in the log during normal operation? Esp. since at 11:36:03 it locked up for 38 s, enough to time out OSPF neighbors:

No, I grepped the logs - there are such errors only when the trouble happened, for ~30 mins until the daemon restart.

Jul 2 11:36:03 gw2 bird: I/O loop cycle took 37685 ms for 22 events

Was BIRD at least partially interactive during the hang? Did it run with 100 % CPU load?
I didn't look at top, but the load average on the server rose when the trouble happened. I also didn't check interactivity with birdc, but it seems that BGP sessions with this host (according to logs on other servers) flapped several times until bird was restarted.
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul 2 11:41:49 gw2 bird: ...
Here's the filter config where the error message occurs:

function rt_client(int asn; prefix set nets)
{
    return (net ~ nets && (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) && bgp_path.last = asn);
}

You could avoid that 'AS path expected' error msg by checking whether the attribute is defined:
return (net ~ nets && defined(bgp_path) && (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) && bgp_path.last = asn);
Ok thanks.
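
For reference, the suggestion above can be folded back into the original function roughly as follows. This is only a sketch: it combines the rt_client function and the defined(bgp_path) check exactly as quoted in this thread, and it assumes my_as is a constant defined elsewhere in the config, as in the original. The message presumably comes from routes that carry no BGP path attribute; with the check in place the expression is false for such routes before any bgp_path operator is applied.

```
# Sketch only: my_as is assumed to be defined elsewhere in the config.
function rt_client(int asn; prefix set nets)
{
    # defined(bgp_path) is tested before bgp_path.first, bgp_path.last
    # and the path mask, so routes without an AS path fall through to false.
    return (net ~ nets && defined(bgp_path) && (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) && bgp_path.last = asn);
}
```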