neighbor ospf bird crash after local configure
hello. I've managed to crash two remote bird instances on two remote routers by running configure on a local bird instance. they are in the same ospf area, but on different interfaces. third remote router, which is connected to those two interfaces, survived.
this is the interesting part in the log.
31-10-2009 19:40:08 <TRACE> ospf1: Scheduling RT calculation.
31-10-2009 19:40:08 <TRACE> ospf1: Starting routing table calculation
31-10-2009 19:40:08 <TRACE> ospf1: Starting routing table calculation for area 0.0.0.0
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.59, PAR=212.71.177.41)
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
31-10-2009 19:40:08 <BUG> Did not find next hop interface for INSPF lsa!
does anyone know what might have happened?
thanks
mk
On Sat, Oct 31, 2009 at 08:04:30PM +0100, Martin Kraus wrote:
hello. I've managed to crash two remote bird instances on two remote routers by running configure on a local bird instance. they are in the same ospf area, but on different interfaces. third remote router, which is connected to those two interfaces, survived.
..
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.59, PAR=212.71.177.41)
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
31-10-2009 19:40:08 <BUG> Did not find next hop interface for INSPF lsa!
does anyone know what might have happened?
Hmm, interesting. Could you describe what change was made in the config file, what the topology of the affected part of the network is, and which router IDs are used by which routers, especially which router uses router ID 212.71.177.41?
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On Sun, Nov 01, 2009 at 02:50:32PM +0100, Ondrej Zajicek wrote:
Hmm, interesting. Could you describe what change was made in the config file, what the topology of the affected part of the network is, and which router IDs are used by which routers, especially which router uses router ID 212.71.177.41?
I had to create a diagram which is (hopefully) attached to this mail as an svg. I don't think I could describe it clearly in plaintext.
There are five routers with respective IDs written inside the boxes. All are in ospf area 0.0.0.0. There is the rest of the network behind routers with IDs 212.71.177.44 and 212.71.152.222, which I didn't put into the diagram. There are some additional complexities, such as a separate ospf instance from routers with IDs 212.71.177.41 and 212.71.177.42, but that wasn't affected so I didn't include that either.
ID: 212.71.177.41 is a primary router running bgp to the provider and exporting default route as an E2 route with metric2 10000. All ospf costs are set to 10 except for the cost to router ID: 212.71.177.44, which is 15.
ID: 212.71.177.42 is a backup router running bgp to the provider and exporting default route as an E2 route with metric2 11000. All ospf interface costs are set to 30.
ID: 212.71.177.58 is connected to ID: 212.71.177.41 and ID: 212.71.177.42. All ospf costs are set to 10.
ID: 212.71.177.44 is connected to ID: 212.71.177.41 and ID: 212.71.177.42 on one interface and ID: 212.71.152.222 on a second interface. There are additional routers connected to a third interface. All ospf costs are set to 15.
ID: 212.71.152.222 is connected to ID: 212.71.177.41 and ID: 212.71.177.42 on one interface and ID: 212.71.177.44 on a second interface. There are additional routers on a third interface. All ospf costs are set to 10 except for the cost to router ID: 212.71.177.44, which is 15.
There is an asymmetry in interface costs with anything connected to router ID: 212.71.177.42, because it has often been selected as a gateway between those parts of the network that are separated by this router and the router ID: 212.71.177.41. I prefer using router ID: 212.71.177.41, and this solved my problem by making paths through ID: 212.71.177.42 more expensive.
The problem occurred while reconfiguring router ID: 212.71.177.41, where I changed the interface cost to router ID: 212.71.177.44 (and of course ID: 212.71.177.42) from 10 to 15. After running configure on router ID: 212.71.177.41, bird daemons crashed on routers
ID: 212.71.177.44
ID: 212.71.152.222
ID: 212.71.177.58
leaving all the exported routes in the kernel routing table. The bird daemon on router ID: 212.71.177.42 didn't crash. After I started bird again, everything went back to normal.
mk
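The cost asymmetry described above can be checked with a small shortest-path sketch. It is only an illustration and not how bird computes routes: it assumes every adjacency is a point-to-point link whose cost is the sending router's interface cost, it ignores the shared segments and network LSAs, and the adjacencies are guessed from the description because the svg diagram is not reproduced here. `COSTS` and `shortest_paths` are hypothetical names.

```python
import heapq

R41, R42, R58, R44, R222 = (
    "212.71.177.41", "212.71.177.42", "212.71.177.58",
    "212.71.177.44", "212.71.152.222",
)

# COSTS[a][b] is the cost of a's interface towards b (assumed adjacencies,
# costs taken from the description after the change from 10 to 15).
COSTS = {
    R41: {R42: 10, R58: 10, R222: 10, R44: 15},
    R42: {R41: 30, R58: 30, R44: 30, R222: 30},
    R58: {R41: 10, R42: 10},
    R44: {R41: 15, R42: 15, R222: 15},
    R222: {R41: 10, R42: 10, R44: 15},
}


def shortest_paths(costs, source):
    """Plain Dijkstra over directed costs; returns (distance, parent) maps."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neigh, cost in costs.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                parent[neigh] = node
                heapq.heappush(heap, (nd, neigh))
    return dist, parent


if __name__ == "__main__":
    dist, parent = shortest_paths(COSTS, R58)
    for router in sorted(dist):
        print(router, dist[router], "via", parent[router])
```

Under these assumptions, the path from 212.71.177.58 to 212.71.177.44 goes through 212.71.177.41 at a total cost of 25, compared with 40 through 212.71.177.42.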
On Sun, Nov 01, 2009 at 05:08:16PM +0100, Martin Kraus wrote:
I had to create a diagram which is (hopefully) attached to this mail as an svg. I don't think I could describe it clearly in plaintext.
There are five routers with respective IDs written inside the boxes. All are in ospf area 0.0.0.0. There is the rest of the network behind routers with IDs 212.71.177.44 and 212.71.152.222, which I didn't put into the diagram. There are some additional complexities, such as a separate ospf instance from routers with IDs 212.71.177.41 and 212.71.177.42, but that wasn't affected so I didn't include that either. ...
Thank you for the thorough answer. Just one more question: from which router were the log messages you sent earlier:
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.59, PAR=212.71.177.41)
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
On Mon, Nov 02, 2009 at 03:52:36PM +0100, Ondrej Zajicek wrote:
Thank you for the thorough answer. Just one more question: from which router were the log messages you sent earlier:
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.59, PAR=212.71.177.41)
31-10-2009 19:40:08 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
on the router ID: 212.71.152.222
Btw, I've found that there are more similar entries in the log from today, although this time no crash.
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.129.171, PAR=212.71.177.41)
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.177.58, PAR=212.71.177.41)
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
mk
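To find further entries like these across a longer log, a short script can pull out the EN and PAR router IDs and count how often each parent shows up. This is a minimal sketch that assumes the log lines look exactly like the excerpts quoted in this thread. The names `LINE_RE`, `parse_no_next_hop` and `count_by_parent` are hypothetical and are not part of bird.

```python
import re
from collections import Counter

# Pattern derived from the log excerpts in this thread (an assumption about
# the general format, not taken from bird's documentation).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) <ERR> "
    r"Router's parent has no next hop\. "
    r"\(EN=(?P<en>[\d.]+), PAR=(?P<par>[\d.]+)\)\s*$"
)


def parse_no_next_hop(lines):
    """Return (timestamp, EN, PAR) tuples for every matching log line."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match["ts"], match["en"], match["par"]))
    return entries


def count_by_parent(entries):
    """Count how many entries name each PAR router ID."""
    return Counter(par for _, _, par in entries)


SAMPLE = """\
31-10-2009 19:40:08 <TRACE> ospf1: Scheduling RT calculation.
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.129.171, PAR=212.71.177.41)
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.177.58, PAR=212.71.177.41)
02-11-2009 02:36:51 <ERR> Router's parent has no next hop. (EN=212.71.177.42, PAR=212.71.177.41)
""".splitlines()

if __name__ == "__main__":
    for ts, en, par in parse_no_next_hop(SAMPLE):
        print(ts, "EN", en, "PAR", par)
    print(count_by_parent(parse_no_next_hop(SAMPLE)))
```

For a real log file, the same functions can be fed with `open(path)` in place of `SAMPLE`, where `path` is wherever the local bird log is written.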
participants (2): Martin Kraus, Ondrej Zajicek