Long io loop / missing netlink messages

Nico Schottelius nico.schottelius at ungleich.ch
Sat Dec 21 13:07:25 CET 2019


Good morning,

on a fresh router carrying the full routing table (Alpine, Linux
4.19.80-0-vanilla, bird-2.0.7), I see a lot of these messages:

Dec 21 12:37:15 router1 daemon.warn bird: Kernel dropped some netlink messages, will resync on next scan.
Dec 21 12:37:34 router1 daemon.warn bird: I/O loop cycle took 5000 ms for 1 events
Dec 21 12:38:14 router1 daemon.warn bird: Kernel dropped some netlink messages, will resync on next scan.
Dec 21 12:38:35 router1 daemon.warn bird: I/O loop cycle took 5328 ms for 1 events
Dec 21 12:38:54 router1 daemon.warn bird: I/O loop cycle took 5013 ms for 1 events
Dec 21 12:39:07 router1 daemon.warn bird: Kernel dropped some netlink messages, will resync on next scan.
Dec 21 12:39:14 router1 daemon.warn bird: I/O loop cycle took 5053 ms for 1 events
Dec 21 12:39:34 router1 daemon.warn bird: Kernel dropped some netlink messages, will resync on next scan.
Dec 21 12:40:14 router1 daemon.warn bird: I/O loop cycle took 5041 ms for 1 events

[12:49] router1.place6:~# ip -6 r | wc -l; ip r | wc -l
78212
779342

With "debug latency;" I get the following additional messages:

Dec 21 12:54:31 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4449 ms
Dec 21 12:54:52 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 5608 ms

The system is idle overall, with bird spiking to 50-100% CPU usage every
couple of seconds. I first thought these were only logged after starting
bird (where it might make sense), but the events continue to be logged
around every 30s:

Dec 21 13:04:11 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4596 ms
Dec 21 13:04:32 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 5096 ms
Dec 21 13:04:52 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 5102 ms
Dec 21 13:05:11 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4676 ms
Dec 21 13:05:31 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4645 ms

This might loosely correlate with the scan time of "20" that is set up
for the device and kernel protocols.
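
One way to check that correlation is to measure the gaps between the
logged events. The following is only a minimal sketch: it takes the five
"debug latency" lines quoted above and prints the spacing between them
and how long each event took. The helper names (parse_time, gaps_seconds,
durations_ms) are made up for this example, and the year 2019 is assumed
from the date of this mail, since syslog timestamps do not carry one. For
the excerpt shown, the gaps come out close to 20 s.

```python
from datetime import datetime

# The five "debug latency" lines quoted in this mail.
LOG_LINES = [
    "Dec 21 13:04:11 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4596 ms",
    "Dec 21 13:04:32 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 5096 ms",
    "Dec 21 13:04:52 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 5102 ms",
    "Dec 21 13:05:11 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4676 ms",
    "Dec 21 13:05:31 router1 daemon.warn bird: Event 0x000055a21afb8144 0x0000000000000000 took 4645 ms",
]


def parse_time(line):
    """Parse the leading syslog timestamp ("Dec 21 13:04:11").

    Assumptions: the year is 2019 (syslog omits it) and month names are
    English, since %b depends on the locale.
    """
    return datetime.strptime("2019 " + line[:15], "%Y %b %d %H:%M:%S")


def gaps_seconds(lines):
    """Seconds elapsed between each pair of consecutive log lines."""
    times = [parse_time(line) for line in lines]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def durations_ms(lines):
    """The "took N ms" value reported on each line."""
    return [int(line.split(" took ")[1].split()[0]) for line in lines]


gaps = gaps_seconds(LOG_LINES)
print("gaps between events (s):", gaps)
print("mean gap (s):", sum(gaps) / len(gaps))
print("event durations (ms):", durations_ms(LOG_LINES))
```

Running the same calculation over a longer stretch of the log, and again
after changing the scan time, would show whether the spacing really
follows the scan interval.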

How do I best debug this issue?

Best,

Nico



--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

