Constant delete / add route after upgrade to 1.6.3
Hi,

We recently upgraded to 1.6.3 (we had been running 1.2.5 for about 8 years!) and since the upgrade we are seeing some odd behavior.

When running ip monitor route on the machine, it looks like BIRD is constantly removing and adding routes. That could be considered normal, but it deletes a route and then adds the same route again (these routes are received via BGP).

Here's an example:

103.72.2.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 103.72.1.0/24 via 91.126.95.153 dev vlan111 proto bird
103.72.1.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 177.75.72.0/24 via 91.126.95.153 dev vlan111 proto bird
177.75.72.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 185.165.122.0/24 via 91.126.95.153 dev vlan111 proto bird
185.165.122.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 196.46.242.0/24 via 91.126.95.153 dev vlan111 proto bird
196.46.242.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 64.70.30.0/24 via 91.126.95.153 dev vlan111 proto bird
64.70.30.0/24 via 212.80.191.185 dev vlan2 proto bird

At the same time we are seeing the following messages in the BIRD log:

May 8 13:56:39 icewall-01 bird: Kernel dropped some netlink messages, will resync on next scan.
May 8 13:57:02 icewall-01 bird: Kernel dropped some netlink messages, will resync on next scan.
May 8 13:57:27 icewall-01 bird: Kernel dropped some netlink messages, will resync on next scan.
May 8 13:57:32 icewall-01 bird: Kernel dropped some netlink messages, will resync on next scan.
May 8 13:57:59 icewall-01 bird: Kernel dropped some netlink messages, will resync on next scan.

Any ideas why this might be happening?

Thanks!

Kind regards,
Xavier Trilla P.
Clouding.io<https://clouding.io/>

A redundant Cloud Server with SSDs, available in less than 30 seconds? Try it now at Clouding.io<https://clouding.io/>!
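To separate genuine next-hop changes from routes that are deleted and re-added unchanged, the ip monitor route output can be filtered mechanically. The following is a minimal sketch, not a BIRD or iproute2 tool: the function name find_redundant_readds and the SAMPLE text are hypothetical, and it assumes the one-event-per-line format shown in the excerpt above.

```python
def find_redundant_readds(lines):
    """Return routes that were deleted and then re-added with identical text."""
    pending = {}    # prefix -> full route text of the most recent "Deleted" event
    redundant = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Deleted "):
            route = line[len("Deleted "):]
            pending[route.split()[0]] = route
        else:
            prefix = line.split()[0]
            # Redundant only if the re-added route matches the deleted one exactly.
            if pending.pop(prefix, None) == line:
                redundant.append(line)
    return redundant


# Hypothetical sample, copied from the output quoted in the message above.
SAMPLE = """\
Deleted 103.72.1.0/24 via 91.126.95.153 dev vlan111 proto bird
103.72.1.0/24 via 91.126.95.153 dev vlan111 proto bird
Deleted 64.70.30.0/24 via 91.126.95.153 dev vlan111 proto bird
64.70.30.0/24 via 212.80.191.185 dev vlan2 proto bird
"""

if __name__ == "__main__":
    for route in find_redundant_readds(SAMPLE.splitlines()):
        print("redundant re-add:", route)
```

In the sample, only 103.72.1.0/24 is reported; 64.70.30.0/24 comes back with a different gateway and interface, so it counts as a real change. Passing sys.stdin instead of SAMPLE.splitlines() would let the same function consume live monitor output.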
Hi,

Some extra info: I've changed the kernel protocol's scan time from 20 to 60, and the CPU load on the machine has dropped, but I still see too many route changes.

Thanks!

Kind regards,
Xavier Trilla P.
Hello,

have you tried switching on BGP protocol tracing? Could the routes really be arriving this way? Have you seen some protocol going up and down repeatedly, or anything like that? The point is to find out whether the routes are flapping on the input side or in BIRD's kernel binding.

The error message is just a warning that the buffer between the kernel and BIRD has overflowed, which may be due to high CPU load, but that load must itself have a cause.

Could you please also send your config, so we can check whether the cause is visible there directly?

Thanks!
Maria
Hi Maria,

From what I'm seeing, it looks like every time BIRD gets a route update via BGP, the route is replaced. I enabled debug in the kernel protocol and I'm seeing this:

2018-05-08 22:20:32 <TRACE> icewall_01_BGP > added [best] 80.84.147.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> icewall_01_BGP < rejected by protocol 80.84.147.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> kernel1 < replaced 80.84.147.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> icewall_01_BGP > added [best] 78.40.178.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> icewall_01_BGP < rejected by protocol 78.40.178.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> kernel1 < replaced 78.40.178.0/24 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> icewall_01_BGP > added [best] 64.17.246.0/23 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> icewall_01_BGP < rejected by protocol 64.17.246.0/23 via 172.17.0.1 on vlan10
2018-05-08 22:20:32 <TRACE> kernel1 < replaced 64.17.246.0/23 via 172.17.0.1 on vlan10

But does it make sense? I mean, deleting and re-adding the route to replace it when a route update is received doesn't seem to make much sense, considering the gateway does not change.

Also, I guess removing and adding the route again could lead to some packet loss.

Maybe BIRD needs to keep some information in the route in order to stay in sync, and that's why it needs to replace it? Or is there anything I'm missing?

Cheers!

Kind regards,
Xavier Trilla P.
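To check whether each BGP update really corresponds to one kernel "replaced" event, the trace lines can be tallied per protocol and action. This is a rough sketch that assumes only the IPv4 line layout visible in the excerpt above; the regular expression and the name count_events are illustrative and are not part of BIRD.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-05-08 22:20:32 <TRACE> kernel1 < replaced 80.84.147.0/24 via 172.17.0.1 on vlan10
TRACE_RE = re.compile(
    r"^\S+ \S+ <TRACE> (?P<proto>\S+) (?P<dir>[<>]) (?P<action>.+?) "
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+) via (?P<gw>\S+) on (?P<iface>\S+)$"
)


def count_events(lines):
    """Count trace events keyed by (protocol, direction, action)."""
    counts = Counter()
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if m:  # lines in any other format are silently skipped
            counts[(m["proto"], m["dir"], m["action"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2018-05-08 22:20:32 <TRACE> icewall_01_BGP > added [best] 80.84.147.0/24 via 172.17.0.1 on vlan10",
        "2018-05-08 22:20:32 <TRACE> icewall_01_BGP < rejected by protocol 80.84.147.0/24 via 172.17.0.1 on vlan10",
        "2018-05-08 22:20:32 <TRACE> kernel1 < replaced 80.84.147.0/24 via 172.17.0.1 on vlan10",
    ]
    for key, n in count_events(sample).items():
        print(key, n)
```

If the count for the BGP protocol's "added [best]" events tracks the kernel protocol's "replaced" count closely, that is consistent with the churn being driven by incoming updates rather than by the kernel scan.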
On Tue, May 08, 2018 at 08:29:06PM +0000, Xavier Trilla wrote:
> But does it make sense? I mean, deleting and re-adding the route to replace it when a route update is received doesn't seem to make much sense, considering the gateway does not change.

Hi,

It is true that we send a route to the kernel whenever there is any change in the route, even if the change is irrelevant to the kernel itself. There may be export filters in the kernel protocol that depend on any route attribute, so it is hard to predict whether a change is relevant or not.

> Also, I guess removing and adding the route again could lead to some packet loss.

We could use the netlink replace operation instead of remove/add, but that has other problems (such as insufficient protection against overwriting alien routes).

> Maybe BIRD needs to keep some information in the route in order to stay in sync, and that's why it needs to replace it? Or is there anything I'm missing?

We could keep that information, but that essentially means doubling the memory footprint for simple setups.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
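The trade-off described above can be illustrated with a toy model. This is not how BIRD is implemented: Route, kernel_view and KernelSync are hypothetical names, and the sketch assumes that only gateway and interface matter to the kernel. It keeps a per-prefix copy of what was last pushed and skips updates that would not change it; that extra copy is the kind of additional memory cost mentioned in the reply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    gateway: str
    iface: str
    as_path: tuple = ()  # BGP attribute; assumed irrelevant to the kernel here


def kernel_view(route):
    """The part of a route that this toy model treats as kernel-relevant."""
    return (route.gateway, route.iface)


class KernelSync:
    def __init__(self):
        self.installed = {}  # prefix -> kernel_view: the extra per-route copy
        self.pushes = 0      # number of updates actually sent to the "kernel"

    def update(self, route):
        """Push the route only if its kernel-relevant part changed."""
        view = kernel_view(route)
        if self.installed.get(route.prefix) == view:
            return False     # same gateway and interface: nothing to do
        self.installed[route.prefix] = view
        self.pushes += 1
        return True


if __name__ == "__main__":
    sync = KernelSync()
    a = Route("80.84.147.0/24", "172.17.0.1", "vlan10", as_path=(64500, 64501))
    b = Route("80.84.147.0/24", "172.17.0.1", "vlan10", as_path=(64500, 64502))
    print(sync.update(a), sync.update(b), sync.pushes)
```

The model deliberately ignores export filters. As noted in the reply, a filter may depend on any attribute, so deciding relevance from gateway and interface alone is a simplification.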
Hi Ondrej,

Thanks for your reply, now I see why it's done this way. But I'm wondering: could that lead to packet loss, or is the remove/add fast enough? I mean, we only push the best paths to the kernel, so in this scenario a destination could be without a route for a brief period of time. Or is the update fast enough not to affect even a high-traffic connection?

The alternative I see is publishing all routes to the kernel with weights, but that's something I would like to avoid (I don't want to publish several million routes to the kernel :/ ).

Thanks!

Kind regards,
Xavier Trilla P.
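The difference between remove/add and replace that this question turns on can be shown with a toy routing table. It is a sketch under stated assumptions, not real netlink code: ToyTable and states_during are hypothetical, add is assumed to fail when the prefix already exists, and replace is assumed to overwrite unconditionally. The model says nothing about how long the gap lasts on a real system, which is the part the thread leaves open.

```python
class ToyTable:
    """Toy model of a routing table keyed by prefix."""

    def __init__(self):
        self.routes = {}  # prefix -> (gateway, owner)

    def add(self, prefix, gateway, owner):
        # Exclusive create: refuses to touch an existing entry.
        if prefix in self.routes:
            raise FileExistsError(prefix)
        self.routes[prefix] = (gateway, owner)

    def delete(self, prefix):
        del self.routes[prefix]

    def replace(self, prefix, gateway, owner):
        # Overwrites whatever is there, regardless of who installed it.
        self.routes[prefix] = (gateway, owner)


def states_during(table, prefix, ops):
    """Run each operation and record the lookup result for prefix after it."""
    seen = []
    for op in ops:
        op()
        seen.append(table.routes.get(prefix))
    return seen


if __name__ == "__main__":
    p = "80.84.147.0/24"

    t = ToyTable()
    t.add(p, "172.17.0.1", "bird")
    # delete + add: the first recorded state is None, i.e. no route at all.
    print(states_during(t, p, [lambda: t.delete(p),
                               lambda: t.add(p, "172.17.0.1", "bird")]))

    t2 = ToyTable()
    t2.add(p, "10.0.0.1", "other")  # an "alien" route from another source
    # replace: no gap, but the alien route is silently overwritten.
    print(states_during(t2, p, [lambda: t2.replace(p, "172.17.0.1", "bird")]))
```

In the first case the table briefly has no entry for the prefix. In the second there is never a gap, but the route installed by the other owner is lost, which matches the concern about overwriting alien routes raised earlier in the thread.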
Participants (3): Maria Jan Matějka, Ondrej Zajicek, Xavier Trilla