Bird 1.6.3 strange hangup

Andrew nitr0 at seti.kr.ua
Sun Jul 2 11:27:18 CEST 2017


Hi all.

Today I saw strange bird hangup, which results in OSPF failure in areas 
where router is. BIRD works as OSPF + BGP. All returns to work state 
after bird was killed by SIGKILL and started again.

Here's strange logs records:

Jul  2 11:33:46 gw2 bird: I/O loop cycle took 7016 ms for 35 events
Jul  2 11:33:57 gw2 bird: I/O loop cycle took 7208 ms for 58 events
Jul  2 11:34:06 gw2 bird: I/O loop cycle took 5829 ms for 31 events
Jul  2 11:34:16 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:34:19 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:19 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:19 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:19 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:19 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:19 gw2 bird: ...
Jul  2 11:34:25 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:25 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:25 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:25 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:25 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:25 gw2 bird: ...
Jul  2 11:34:28 gw2 bird: I/O loop cycle took 9297 ms for 30 events
Jul  2 11:34:28 gw2 bird: Kernel dropped some netlink messages, will 
resync on next scan.
Jul  2 11:34:32 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:32 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:32 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:32 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:32 gw2 bird: filters, line 47: AS path expected
Jul  2 11:34:32 gw2 bird: ...
Jul  2 11:34:33 gw2 bird: Kernel dropped some netlink messages, will 
resync on next scan.
Jul  2 11:34:35 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:34:41 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:34:52 gw2 bird: bgp_peer_dg_w: Error: Hold timer expired
Jul  2 11:34:58 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:34:58 gw2 bird: Kernel dropped some netlink messages, will 
resync on next scan.
Jul  2 11:35:08 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:35:08 gw2 bird: bgp_peer_dg_ua: Error: Hold timer expired
Jul  2 11:35:08 gw2 bird: I/O loop cycle took 9312 ms for 23 events
Jul  2 11:35:17 gw2 bird: I/O loop cycle took 8410 ms for 14 events
Jul  2 11:35:17 gw2 bird: Next hop address 77.88.200.157 resolvable 
through recursive route for 77.88.x.x/30
Jul  2 11:35:24 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:35:24 gw2 bird: I/O loop cycle took 6134 ms for 12 events
Jul  2 11:35:46 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:36:03 gw2 bird: bgp_peer_nas2: Error: Hold timer expired
Jul  2 11:36:03 gw2 bird: bgp_peer_nas4: Error: Hold timer expired
Jul  2 11:36:03 gw2 bird: bgp_peer_nas4_w: Error: Hold timer expired
Jul  2 11:36:03 gw2 bird: I/O loop cycle took 37685 ms for 22 events
Jul  2 11:36:07 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:36:33 gw2 snmpd[8061]: Connection from UDP: 
[192.168.x.x]:44096->[10.255.x.x]:161
Jul  2 11:36:33 gw2 snmpd[8061]: send response: Failure in sendto (parse 
header: message overflow: 10 len + 2 delta > 4 len)
Jul  2 11:36:33 gw2 snmpd[8061]:     -- IF-MIB::ifOutUcastPkts.11
Jul  2 11:36:39 gw2 bird: bgp_peer_g1: Error: Hold timer expired
Jul  2 11:36:39 gw2 bird: I/O loop cycle took 34913 ms for 17 events
Jul  2 11:36:39 gw2 bird: Next hop address x.x.x.x resolvable through 
recursive route for x.x.x.x/30
Jul  2 11:36:46 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:36:53 gw2 monit[8133]: 'localhost' loadavg(5min) of 7.4 
matches resource limit [loadavg(5min)>7.0]
Jul  2 11:36:53 gw2 monit[8133]: 'localhost' loadavg(1min) of 15.3 
matches resource limit [loadavg(1min)>10.0]
Jul  2 11:36:57 gw2 bird: bgp_peer_nas2_w: Error: Hold timer expired
Jul  2 11:37:03 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:37:02 gw2 bird: bgp_peer_nas1_w: Error: Hold timer expired
Jul  2 11:37:02 gw2 bird: bgp_peer_nas3: Error: Hold timer expired
Jul  2 11:37:02 gw2 bird: bgp_peer_g1_w: Error: Hold timer expired
Jul  2 11:37:02 gw2 bird: I/O loop cycle took 22598 ms for 21 events
Jul  2 11:37:15 gw2 bird: I/O loop cycle took 11770 ms for 15 events
Jul  2 11:37:52 gw2 bird: bgp_peer_r2_w: Error: Hold timer expired
Jul  2 11:37:52 gw2 bird: bgp_peer_r2: Error: Hold timer expired
Jul  2 11:37:52 gw2 bird: I/O loop cycle took 35283 ms for 16 events
Jul  2 11:37:56 gw2 monit[8133]: 'localhost' loadavg(5min) of 8.8 
matches resource limit [loadavg(5min)>7.0]
Jul  2 11:37:56 gw2 monit[8133]: 'localhost' loadavg(1min) of 15.2 
matches resource limit [loadavg(1min)>10.0]
Jul  2 11:38:19 gw2 bird: I/O loop cycle took 24751 ms for 22 events
Jul  2 11:38:32 gw2 upsd[8901]: UPS [ups] data is no longer stale
Jul  2 11:38:43 gw2 bird: I/O loop cycle took 23370 ms for 11 events
Jul  2 11:38:48 gw2 upsd[8901]: Data for UPS [ups] is stale - check driver
Jul  2 11:38:54 gw2 bird: I/O loop cycle took 10864 ms for 11 events
Jul  2 11:38:56 gw2 monit[8133]: 'localhost' loadavg(5min) of 9.8 
matches resource limit [loadavg(5min)>7.0]
Jul  2 11:38:56 gw2 monit[8133]: 'localhost' loadavg(1min) of 14.3 
matches resource limit [loadavg(1min)>10.0]
...
Jul  2 11:39:56 gw2 monit[8133]: 'localhost' cpu system usage of 85.2% 
matches resource limit [cpu system usage>60.0%]
Jul  2 11:39:56 gw2 monit[8133]: 'localhost' loadavg(5min) of 10.0 
matches resource limit [loadavg(5min)>7.0]
Jul  2 11:39:56 gw2 monit[8133]: 'localhost' loadavg(1min) of 12.0 
matches resource limit [loadavg(1min)>10.0]
Jul  2 11:39:57 gw2 snmpd[8061]: Connection from UDP: 
[192.168.x.x]:59167->[10.255.x.x]:161
Jul  2 11:39:57 gw2 snmpd[8061]: send response: Failure in sendto
Jul  2 11:39:57 gw2 snmpd[8061]:     -- IF-MIB::ifOutUcastPkts.10
Jul  2 11:39:58 gw2 bird: I/O loop cycle took 13685 ms for 15 events
Jul  2 11:40:01 gw2 snmpd[8061]: Connection from UDP: 
[192.168.x.x]:59167->[10.255.x.x]:161
Jul  2 11:40:01 gw2 snmpd[8061]: send response: Failure in sendto
Jul  2 11:40:01 gw2 snmpd[8061]:     -- IF-MIB::ifOutUcastPkts.10
Jul  2 11:40:56 gw2 monit[8133]: 'localhost' cpu system usage check 
succeeded [current cpu system usage=30.7%]
Jul  2 11:40:56 gw2 monit[8133]: 'localhost' loadavg(5min) of 8.6 
matches resource limit [loadavg(5min)>7.0]
Jul  2 11:40:56 gw2 monit[8133]: 'localhost' loadavg(1min) check 
succeeded [current loadavg(1min)=5.9]
Jul  2 11:41:06 gw2 snmpd[8061]: Connection from UDP: 
[192.168.x.x]:42144->[10.255.x.x]:161
Jul  2 11:41:06 gw2 snmpd[8061]: send response: Failure in sendto
Jul  2 11:41:06 gw2 snmpd[8061]:     -- LM-SENSORS-MIB::lmSensors
Jul  2 11:41:11 gw2 snmpd[8061]: Connection from UDP: 
[192.168.x.x]:42144->[10.255.x.x]:161
Jul  2 11:41:11 gw2 snmpd[8061]: send response: Failure in sendto
Jul  2 11:41:11 gw2 snmpd[8061]:     -- LM-SENSORS-MIB::lmSensors
Jul  2 11:41:35 gw2 bird: Kernel dropped some netlink messages, will 
resync on next scan.
Jul  2 11:41:36 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:36 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:36 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:36 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:36 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:36 gw2 bird: ...
Jul  2 11:41:43 gw2 bird: bgp_peer_nas1: Invalid NEXT_HOP attribute in 
route yy.yy.yy.yy/30
Jul  2 11:41:43 gw2 bird: I/O loop cycle took 6922 ms for 41 events
Jul  2 11:41:43 gw2 bird: Kernel dropped some netlink messages, will 
resync on next scan.
Jul  2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:49 gw2 bird: filters, line 47: AS path expected
Jul  2 11:41:49 gw2 bird: ...


There's filter config where error message happens:

function rt_client(int asn; prefix set nets)
{
         return (net ~ nets &&
                 (bgp_path.first = asn || bgp_path ~ [= my_as asn * =]) &&
                 bgp_path.last = asn);
}

How it used:

function net_clients()
{
     return rt_client(xxxxx, [91.yyy.yyy.0/24]);
}


After restart - all seems to be OK, except periodical messages 'Kernel 
dropped some netlink messages, will resync on next scan.' in log.


What was happened? Bad BGP packets? Or some bug in bird?



More information about the Bird-users mailing list