Dynamic routes from OSPF disappear
Hi.
We have two routers and 7 machines with BIRD installed (7 machines <-> router <- Internet -> router <-> 8th machine) that work without problems. When I added an 8th machine with Ubuntu 14.04 and BIRD 1.4.3, everything worked for about an hour; after that, all routes announced over OSPF disappeared, and the machine's own routes were no longer announced either.
With debugging enabled (debug ospf1 all in birdc), there are no messages that would indicate an error (just the normal hello exchange and OSPF LSA announcements). After restarting the ospf1 protocol, the OSPF routes are exchanged again and it works for another hour or so.
Has anybody experienced the same behavior?
Best regards, Gregor Kališnik
Hi.
After some more investigation, I've figured out that an LSA of type 2 (Network-LSA) gets removed. Once it is removed, all routes from OSPF are removed. Setting the link to ptp should fix it (the link was broadcast before).
The interface on the router has IP address 10.16.0.1/12, and on the server it has 10.16.8.1/12. The server has a bridge (for an LXC container) with 10.16.8.1/22.
RouterOS' OSPF daemon runs on the 10.16.0.1/12 interface, while BIRD runs on 10.16.8.1/12. As far as I know, this configuration should work as broadcast.
Best regards, Gregor Kališnik
On Wednesday 18 of June 2014 16:47:13 Gregor Kališnik wrote:
Hi.
We have two routers and 7 machines with BIRD installed (7 machines <-> router <- Internet -> router <-> 8th machine) that work without problems. When I added an 8th machine with Ubuntu 14.04 and BIRD 1.4.3, everything worked for about an hour; after that, all routes announced over OSPF disappeared, and the machine's own routes were no longer announced either.
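To illustrate why losing the type 2 LSA takes every OSPF route with it on a broadcast link, here is a minimal Python sketch of SPF reachability. It is heavily simplified (no link costs, no back-link check, no summary or external LSAs), and the names RouterLSA, NetworkLSA and learned_prefixes are hypothetical. The example data uses addresses from this thread; the RouterOS stub 10.16.2.1/32 is taken from the debug log posted later. On a broadcast segment, routers reach each other only through the transit network described by the DR's Network-LSA, so once that LSA is gone, nothing behind the other router is reachable.

```python
from dataclasses import dataclass, field


@dataclass
class RouterLSA:
    router_id: str
    transit: list = field(default_factory=list)  # IDs of Network-LSAs (DR interface addresses)
    stubs: list = field(default_factory=list)    # stub prefixes announced by this router


@dataclass
class NetworkLSA:
    lsa_id: str     # interface address of the DR on the segment
    prefix: str     # prefix of the broadcast segment itself
    attached: list  # router IDs attached to the segment


def learned_prefixes(root, routers, networks):
    """Heavily simplified SPF reachability (no costs, no back-link checks).

    Returns prefixes learned from other routers. A transit link can only be
    crossed if the corresponding Network-LSA is in the database."""
    seen = {root}
    stack = [root]
    learned = set()
    while stack:
        rid = stack.pop()
        lsa = routers.get(rid)
        if lsa is None:
            continue
        if rid != root:
            learned.update(lsa.stubs)
        for net_id in lsa.transit:
            net = networks.get(net_id)
            if net is None:
                continue  # Network-LSA flushed: this transit link is unusable
            learned.add(net.prefix)
            for other in net.attached:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return learned


routers = {
    "10.16.8.1": RouterLSA("10.16.8.1", ["10.16.0.1"], ["10.16.8.0/22"]),
    "10.16.0.1": RouterLSA("10.16.0.1", ["10.16.0.1"], ["10.16.2.1/32"]),
}
networks = {"10.16.0.1": NetworkLSA("10.16.0.1", "10.16.0.0/12", ["10.16.0.1", "10.16.8.1"])}

print(sorted(learned_prefixes("10.16.8.1", routers, networks)))  # ['10.16.0.0/12', '10.16.2.1/32']
print(sorted(learned_prefixes("10.16.8.1", routers, {})))        # []
```

On a ptp link, the router-LSAs describe the neighbor directly and no DR or Network-LSA is involved, which is why switching the link type is expected to sidestep this dependency.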
On Wed, Jun 25, 2014 at 04:40:52PM +0200, Gregor Kališnik wrote:
RouterOS' OSPF daemon runs on the 10.16.0.1/12 interface, while BIRD runs on 10.16.8.1/12. As far as I know, this configuration should work as broadcast.
Hi,
You could check whether both sides have the same idea of the DR (designated router) and Backup DR. I have some reports about Mikrotik RouterOS vs. BIRD compatibility problems w.r.t. DR election, where both sides think the other side is the DR. A workaround is to change the network to PtP (if possible) or to set different priorities on the two sides.
BTW, RouterOS being the other OSPF router is the key information I missed in your original report.
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
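The DR election referred to here can be sketched as a simplified Python illustration of the RFC 2328 rules: the eligible router with the highest priority wins, ties go to the highest router ID, and priority 0 makes a router ineligible. The full procedure (BDR elected first, an existing DR is not pre-empted) is left out, and elect_dr is a hypothetical helper. The priority 2 for 10.16.0.1 comes from the show ospf neighbors output in this thread; the values used for BIRD are assumptions.

```python
import ipaddress


def elect_dr(priorities):
    """Simplified DR choice for one broadcast segment.

    priorities maps router ID (dotted quad) -> OSPF router priority.
    Highest priority wins, ties go to the highest router ID, and
    priority 0 means the router is not eligible. The full RFC 2328
    procedure (BDR elected first, no pre-emption of an existing DR)
    is deliberately left out."""
    candidates = [
        (prio, ipaddress.IPv4Address(rid), rid)
        for rid, prio in priorities.items()
        if prio > 0
    ]
    if not candidates:
        return None
    return max(candidates)[2]


# RouterOS priority 2 is from `show ospf neighbors`; BIRD's values are hypothetical.
print(elect_dr({"10.16.0.1": 2, "10.16.8.1": 1}))   # 10.16.0.1
print(elect_dr({"10.16.0.1": 2, "10.16.8.1": 10}))  # 10.16.8.1
print(elect_dr({"10.16.0.1": 1, "10.16.8.1": 1}))   # 10.16.8.1 (tie, higher router ID)
```

Raising one side's priority, as is done later in the thread to make BIRD the DR, changes which router this rule selects.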
Hi.
I've checked, and both agree on the same DR (in BIRD I checked with show ospf interface). The type 2 LSA is removed, and all networks go with it:

    bird> show ospf state
    area 10.16.0.0
        router 10.16.8.1
            distance 0
            network [10.16.0.1] metric 10
            stubnet 10.16.8.0/22 metric 10
            stubnet 10.16.12.0/22 metric 10
            stubnet 10.16.8.1/32 metric 10
            stubnet 10.2.126.1/32 metric 10

    bird> show ospf topology
    area 10.16.0.0
        router 10.16.8.1
            distance 0
            network [10.16.0.1] metric 10

    bird> show ospf neighbors
    ospf1:
    Router ID   Pri  State    DTime  Interface  Router IP
    10.16.0.1     2  full/dr  00:31  eth0       10.16.0.1

    bird> show ospf lsadb
    Area 10.16.0.0
     Type  LS ID      Router     Age   Sequence  Checksum
     0003  0.0.0.0    10.16.0.1  1134  80000034  4cc3
     0001  10.16.0.1  10.16.0.1   306  80000062  ac99
     0001  10.16.8.1  10.16.8.1  1523  800000cf  257a

As you can see in show ospf topology, there is no sign of the 10.16.0.1 router :). I'll enable debugging; there should be a message that triggers the LSA removal.
Best regards, Gregor Kališnik
On Wednesday 25 of June 2014 17:41:19 Ondrej Zajicek wrote:
You could check whether both sides have the same idea of the DR (designated router) and Backup DR. I have some reports about Mikrotik RouterOS vs. BIRD compatibility problems w.r.t. DR election, where both sides think the other side is the DR. A workaround is to change the network to PtP (if possible) or to set different priorities on the two sides. BTW, RouterOS being the other OSPF router is the key information I missed in your original report.
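The show ospf lsadb output in the message above contains only type 1 (Router) and type 3 (Summary) LSAs. A quick way to confirm that a Network-LSA (type 2) is missing is to parse the LSA rows and list the types present. The sketch below is a hypothetical helper, parse_lsadb, written against the layout posted in this thread; the column spacing is reconstructed, so it simply splits on whitespace.

```python
def parse_lsadb(text):
    """Return (type, ls_id, adv_router, age) for each LSA row of
    `show ospf lsadb` output in the layout posted in this thread."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # LSA rows have six columns and start with a 4-digit hex type.
        if len(fields) != 6 or len(fields[0]) != 4:
            continue
        try:
            lsa_type = int(fields[0], 16)
        except ValueError:
            continue
        rows.append((lsa_type, fields[1], fields[2], int(fields[3])))
    return rows


LSADB = """\
Area 10.16.0.0
 Type  LS ID      Router     Age   Sequence  Checksum
 0003  0.0.0.0    10.16.0.1  1134  80000034  4cc3
 0001  10.16.0.1  10.16.0.1   306  80000062  ac99
 0001  10.16.8.1  10.16.8.1  1523  800000cf  257a
"""

rows = parse_lsadb(LSADB)
present = sorted({lsa_type for lsa_type, *_ in rows})
print(present)  # [1, 3]
if 2 not in present:
    print("no Network-LSA (type 2) in the database")
```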
On Wed, Jun 25, 2014 at 07:47:24PM +0200, Gregor Kališnik wrote:
Hi.
I've checked, and both agree on the same DR (in BIRD I checked with show ospf interface).
Are you sure? According to the BIRD output you sent, BIRD thinks that RouterOS (10.16.0.1) is the DR, while RouterOS flushes its Net-LSA and does not originate another one (which the DR should do).
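This reasoning can be written down as a simple consistency check: on a broadcast segment, the DR originates a Network-LSA whose Link State ID is the DR's interface address and whose advertising router is the DR's router ID (RFC 2328). The minimal sketch below uses a hypothetical helper, dr_network_lsa_missing, and applies that rule to the LSAs from show ospf lsadb, taking the DR from show ospf neighbors (router ID and interface address both 10.16.0.1).

```python
def dr_network_lsa_missing(dr_router_id, dr_iface_addr, lsas):
    """True if the DR's Network-LSA is absent from the database.

    lsas: iterable of (type, ls_id, adv_router) tuples."""
    return not any(
        lsa_type == 2 and ls_id == dr_iface_addr and adv == dr_router_id
        for lsa_type, ls_id, adv in lsas
    )


# LSAs from `show ospf lsadb` in this thread.
lsas = [
    (3, "0.0.0.0", "10.16.0.1"),
    (1, "10.16.0.1", "10.16.0.1"),
    (1, "10.16.8.1", "10.16.8.1"),
]
# BIRD sees 10.16.0.1 as DR (state full/dr, router IP 10.16.0.1).
print(dr_network_lsa_missing("10.16.0.1", "10.16.0.1", lsas))  # True
```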
On Thursday 26 of June 2014 11:27:08 Ondrej Zajicek wrote:
Are you sure? According to the BIRD output you sent, BIRD thinks that RouterOS (10.16.0.1) is the DR, while RouterOS flushes its Net-LSA and does not originate another one (which the DR should do).
Yes. The router said the DR is 10.16.0.1 (itself). I'll check what happens if BIRD is the DR.
If I look at the logs, the router does send a type 2 LSA (inside an LSUPD packet), but BIRD removes it. It does have its age set to 3600, though.
Best regards, Gregor Kališnik
On Thu, Jun 26, 2014 at 11:04:19AM +0200, Gregor Kališnik wrote:
Yes. The router said the DR is 10.16.0.1 (itself). I'll check what happens if BIRD is the DR.
If I look at the logs, the router does send a type 2 LSA (inside an LSUPD packet), but BIRD removes it. It does have its age set to 3600, though.
Sending an LSA with age 3600 (MaxAge) is how an LSA is flushed from the OSPF domain. So it is RouterOS that flushed it, for some reason.
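The flushing mechanism described here (premature aging in RFC 2328) can be sketched in a few lines of Python. This is a deliberately simplified model with a hypothetical receive_lsa helper: it ignores checksums, MaxAgeDiff, acknowledgements and retransmission lists, and only shows that an instance arriving with age MaxAge (3600 s) and a sequence number that is not older removes the LSA from the database. The LSA values mirror the debug log in this thread; the pre-existing database entry is an assumption.

```python
MAX_AGE = 3600  # MaxAge from RFC 2328, in seconds


def signed32(seq):
    """Interpret an OSPF LS sequence number as a signed 32-bit integer."""
    return seq - (1 << 32) if seq & 0x80000000 else seq


def receive_lsa(lsdb, lsa_type, ls_id, adv_router, age, seq):
    """Apply one received LSA instance to a toy LSDB keyed by
    (type, ls_id, adv_router). Returns what happened."""
    key = (lsa_type, ls_id, adv_router)
    current = lsdb.get(key)
    if current is not None and signed32(seq) < signed32(current["seq"]):
        return "ignored"        # database already holds a newer instance
    if age >= MAX_AGE:
        lsdb.pop(key, None)     # premature aging: the LSA is flushed
        return "flushed"
    lsdb[key] = {"age": age, "seq": seq}
    return "installed"


# Hypothetical starting state: the DR's Network-LSA is present.
lsdb = {(2, "10.16.0.1", "10.16.0.1"): {"age": 120, "seq": 0x80000001}}
# Instances from the LSUPD packet in the debug log.
print(receive_lsa(lsdb, 2, "10.16.0.1", "10.16.0.1", 3600, 0x80000001))  # flushed
print(receive_lsa(lsdb, 1, "10.16.0.1", "10.16.0.1", 1, 0x80000065))     # installed
print(sorted(lsdb))  # [(1, '10.16.0.1', '10.16.0.1')]
```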
Hi.
Interesting: I've made BIRD the DR, and everything has been working as intended the whole day.
Best regards, Gregor Kališnik
On Thursday 26 of June 2014 12:31:25 Ondrej Zajicek wrote:
Sending an LSA with age 3600 (MaxAge) is how an LSA is flushed from the OSPF domain. So it is RouterOS that flushed it, for some reason.
Hi.
Got it, here it is:

    Jun 25 20:17:46 drone05 bird: ospf1: LSUPD packet received from 10.16.0.1 via eth0
    Jun 25 20:17:46 drone05 bird: ospf1: length 108
    Jun 25 20:17:46 drone05 bird: ospf1: router 10.16.0.1
    Jun 25 20:17:46 drone05 bird: ospf1: LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, Seq: 80000001, Sum: ab4b
    Jun 25 20:17:46 drone05 bird: ospf1: LSA Type: 0001, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 1, Seq: 80000065, Sum: a69c
    Jun 25 20:17:46 drone05 bird: ospf1: Scheduling routing table calculation
    Jun 25 20:17:46 drone05 bird: ospf1: Going to remove LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, Seqno: 0x80000001
    Jun 25 20:17:46 drone05 bird: ospf1: Starting routing table calculation
    Jun 25 20:17:46 drone05 bird: ospf1: Starting routing table calculation for area 10.16.0.0
    Jun 25 20:17:46 drone05 bird: ospf1: Starting routing table calculation for inter-area (area 10.16.0.0)
    Jun 25 20:17:46 drone05 bird: ospf1: Starting routing table calculation for ext routes
    Jun 25 20:17:46 drone05 bird: ospf1: Starting routing table synchronisation
    Jun 25 20:17:46 drone05 bird: ospf1 > removed [replaced] 0.0.0.0/0 via 10.16.0.1 on eth0
    Jun 25 20:17:46 drone05 bird: Netlink: No such process
    Jun 25 20:17:46 drone05 bird: ospf1 < rejected by protocol 0.0.0.0/0 via 10.16.0.1 on eth0
    Jun 25 20:17:46 drone05 bird: ospf1 > removed [sole] 10.16.0.0/12 dev eth0
    Jun 25 20:17:46 drone05 bird: ospf1 > removed [sole] 10.16.2.1/32 via 10.16.0.1 on eth0
    Jun 25 20:17:46 drone05 bird: ospf1: HELLO packet sent via eth0
    Jun 25 20:17:47 drone05 bird: ospf1: LSACK packet sent via eth0
    Jun 25 20:17:47 drone05 bird: ospf1: length 64
    Jun 25 20:17:47 drone05 bird: ospf1: router 10.16.8.1
    Jun 25 20:17:47 drone05 bird: ospf1: LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, Seq: 80000001, Sum: ab4b
    Jun 25 20:17:47 drone05 bird: ospf1: LSA Type: 0001, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 1, Seq: 80000065, Sum: a69c
    Jun 25 20:17:50 drone05 bird: ospf1: HELLO packet received from 10.16.0.1 via eth0
    Jun 25 20:17:51 drone05 bird: ospf1: Refreshing my LSA: Type: 1, Id: 10.16.8.1, Rt: 10.16.8.1
    Jun 25 20:17:51 drone05 bird: ospf1: LSUPD packet flooded via eth0
    Jun 25 20:17:51 drone05 bird: ospf1: length 112
    Jun 25 20:17:51 drone05 bird: ospf1: router 10.16.8.1
    Jun 25 20:17:51 drone05 bird: ospf1: LSA Type: 0001, Id: 10.16.8.1, Rt: 10.16.8.1, Age: 1, Seq: 800000d2, Sum: 7c20
    Jun 25 20:17:52 drone05 bird: ospf1: LSACK packet received from 10.16.0.1 via eth0
    Jun 25 20:17:52 drone05 bird: ospf1: length 44
    Jun 25 20:17:52 drone05 bird: ospf1: router 10.16.0.1
    Jun 25 20:17:52 drone05 bird: ospf1: LSA Type: 0001, Id: 10.16.8.1, Rt: 10.16.8.1, Age: 1, Seq: 800000d2, Sum: 7c20

Best regards, Gregor Kališnik
On Wednesday 25 of June 2014 17:41:19 Ondrej Zajicek wrote:
You could check whether both sides have the same idea of the DR (designated router) and Backup DR. I have some reports about Mikrotik RouterOS vs. BIRD compatibility problems w.r.t. DR election, where both sides think the other side is the DR. A workaround is to change the network to PtP (if possible) or to set different priorities on the two sides. BTW, RouterOS being the other OSPF router is the key information I missed in your original report.
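When hunting for this kind of problem in a long debug log, it can help to pull out every LSA that appears with MaxAge. The sketch below is a hypothetical helper, maxage_lsas, whose regular expression is written only against the BIRD 1.4 debug lines posted in this thread (both the packet dumps and the "Going to remove LSA" line); it may need adjusting for other BIRD versions or log formats.

```python
import re

# Matches "LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, ..."
# as printed in the debug log posted in this thread. "Refreshing my LSA: Type: 1"
# lines do not match because of the colon after "LSA".
LSA_RE = re.compile(
    r"LSA Type: (?P<type>[0-9a-fA-F]+), Id: (?P<id>[0-9.]+), "
    r"Rt: (?P<rt>[0-9.]+), Age: (?P<age>\d+)"
)


def maxage_lsas(lines, max_age=3600):
    """Return the set of (type, id, adv_router) seen with age >= max_age."""
    found = set()
    for line in lines:
        m = LSA_RE.search(line)
        if m and int(m["age"]) >= max_age:
            found.add((int(m["type"], 16), m["id"], m["rt"]))
    return found


LOG = [
    "Jun 25 20:17:46 drone05 bird: ospf1: LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, Seq: 80000001, Sum: ab4b",
    "Jun 25 20:17:46 drone05 bird: ospf1: LSA Type: 0001, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 1, Seq: 80000065, Sum: a69c",
    "Jun 25 20:17:46 drone05 bird: ospf1: Going to remove LSA Type: 0002, Id: 10.16.0.1, Rt: 10.16.0.1, Age: 3600, Seqno: 0x80000001",
    "Jun 25 20:17:51 drone05 bird: ospf1: Refreshing my LSA: Type: 1, Id: 10.16.8.1, Rt: 10.16.8.1",
]
print(maxage_lsas(LOG))  # {(2, '10.16.0.1', '10.16.0.1')}
```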
participants (2)
- Gregor Kališnik
- Ondrej Zajicek