Fwd: Re: ospf loading for ever...
I am resending this because I forgot to complete my subscription first, so I am sorry if this is a duplicate message. But to the problem:
I have a similar issue, but I don't have multiple IP addresses or an MTU problem. It looks like in some cases the router on one side gets stuck in the loading state while the other side is in the full state. At that point I must restart the side that says it is in the full state. So it looks like the router that says "loading" is somehow waiting for something from the router that says "full", but because that one is "full" it does not send it any more... Or something :) The weird thing is that even stopping and starting the router that is in the loading state does not help. I must stop and start the router that is in the full state.
This was the reason I just upgraded BIRD on my routers to 1.3.1, but it looks like that does not help with this problem. The problem is random, but as my network grows it appears more often. I have ~170 routes from OSPF and I use hello 1 and dead 10.
P.S. What does BIRD do if loading the routes takes more time than the hello timer?
On Thursday 24 of March 2011, Ondrej Zajicek wrote:
On Thu, Mar 24, 2011 at 08:57:23AM +0100, Arkadiusz Miskiewicz wrote:
You can always check it with tcpdump or wireshark on the OSPF LAN.
tcpdump -vv -nn -i<interface> host 224.0.0.5
I don't see bad IPs used. A dump taken using tcpdump -vv -nn -i eth0 ether host mac-addr and proto 89 is at http://carme.pld-linux.org/~arekm/p/ospf.log.gz
Don't you have asymmetric MTU?
It seems that there is MTU 9000 on x.x.5.135 and MTU 1500 on x.x.5.159. The MTU has to be the same, otherwise x.x.5.135 would send big packets and x.x.5.159 would drop them.
That seems to be it, thanks!
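An MTU mismatch like the one diagnosed above can be confirmed from either router by sending a ping that is not allowed to fragment. This is only a sketch, assuming Linux routers with iputils ping; the helper names max_ping_payload and iface_mtu, the interface eth0 and the NEIGHBOR address are placeholders and do not come from the thread.

```shell
# Largest ICMP echo payload that fits into one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# Configured MTU of a local interface, read from Linux sysfs.
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Example, run on each router (eth0 and NEIGHBOR are placeholders):
#   ping -M do -c 3 -s "$(max_ping_payload "$(iface_mtu eth0)")" NEIGHBOR
# -M do forbids fragmentation, so a full-size packet that the path or the
# other end cannot take shows up as an error or as lost replies.
```

With MTU 9000 on one side and 1500 on the other, the full-size ping from the 9000 side would be expected to fail while the one from the 1500 side succeeds.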
-- All amounts stated in this message are exclusive of VAT, unless otherwise stated alongside the amount in question. -- F-Solutions Oy Tapio Haapala PL 7, 90571 Oulu GSM 040-0998371 Skype burner- IRC Burner@ircnet
On Wed, Jun 22, 2011 at 07:56:40PM +0300, Tapio Haapala wrote:
I am resending this because I forgot to complete my subscription first, so I am sorry if this is a duplicate message. But to the problem:
I have a similar issue, but I don't have multiple IP addresses or an MTU problem. It looks like in some cases the router on one side gets stuck in the loading state while the other side is in the full state. At that point I must restart the side that says it is in the full state. So it looks like the router that says "loading" is somehow waiting for something from the router that says "full", but because that one is "full" it does not send it any more... Or something :) The weird thing is that even stopping and starting the router that is in the loading state does not help. I must stop and start the router that is in the full state.
Such random problems were common in really old versions. I hoped that we had already fixed all of them, as on my network (~120 routes, ~40 routers) I haven't noticed that problem for a year. But maybe there are some remaining ones. If you encounter that problem, could you make a tcpdump log (tcpdump -i IFACE -s 0 -w FILE proto 89) of that interaction and look for suspicious messages in the BIRD log?
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
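Once a capture like the one requested above exists, a per-type packet count is a quick first check for the kind of request/ack loop that a stuck loading state would produce. The following is a rough sketch; the function name count_ospf_types is made up for illustration, and it assumes tcpdump's usual text output, where each OSPF packet line contains "OSPFv2, <type>, length N".

```shell
# Count OSPF packets per type from tcpdump text output read on stdin.
# Prints one "<type> <count>" line per packet type, sorted by type name.
count_ospf_types() {
    awk 'match($0, /OSPFv2, [^,]+/) {
             # Skip the 8 characters of "OSPFv2, " to keep only the type.
             t = substr($0, RSTART + 8, RLENGTH - 8)
             n[t]++
         }
         END { for (t in n) print t, n[t] }' | sort
}

# Example (FILE is the capture written by the tcpdump command above):
#   tcpdump -nn -r FILE | count_ospf_types
```

A capture dominated by LS-Request and LS-Ack lines, with few or no LS-Update packets, would point at the two routers disagreeing about which LSAs are still outstanding.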
Now I have a pcap. I cut it to 1000 packets. It looks like when it is in this state it can flood packets: this 1000-packet sample covers only 300 ms. Last time I did not notice so much flooding.
Anyway, it all started when BIRD on my router A crashed:
Jun 29 15:25:21 rnrt kernel: [5459210.573604] bird[331]: segfault at 40 ip 00a28fb9 sp bfc0b4a0 error 4 in bird[a23000+4a000]
Any ideas how I can get more info about that?
Then I restarted it and got into this state, where my neighbor tables look like this:
On router B:
Router ID        Pri  State       DTime  Interface   Router IP
A ip             1    loading/dr  00:10  eth0        10.231.113.1
On router A:
Router ID        Pri  State       DTime  Interface   Router IP
10.123.123.113   1    full/dr     00:09  eth0        XXXXXXXXXXXX
10.231.113.113   1    full/bdr    00:10  eth1.1938   10.231.113.113
10.231.101.101   1    loading/dr  00:10  eth1.105    10.231.101.101
10.231.138.138   1    full/dr     00:10  eth1.1255   10.231.138.138
Sorry, I censored the public IPs in these lists. If some developer wants to see the pcap, please mail me so I can send it. Here is a paste of it:
reading from file eth0.ospf.cut.pcap, link-type EN10MB (Ethernet)
17:08:01.424481 IP (tos 0xc0, ttl 1, id 11162, offset 0, flags [none], proto OSPF (89), length 84)
    10.231.113.113 > 224.0.0.5: OSPFv2, LS-Ack, length 64
        Router-ID 10.231.113.113, Backbone Area, Authentication Type: none (0)
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.127
            Options: [none]
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.130
            Options: [none]
17:08:01.424518 IP (tos 0xc0, ttl 64, id 32570, offset 0, flags [none], proto OSPF (89), length 68)
    10.231.113.113 > 10.231.113.1: OSPFv2, LS-Request, length 48
        Router-ID 10.231.113.113, Backbone Area, Authentication Type: none (0)
          Advertising Router: 10.231.113.113, External LSA (5), LSA-ID: XXX.XXX.XXX..127
          Advertising Router: 10.231.113.113, External LSA (5), LSA-ID: XXX.XXX.XXX..130
17:08:01.424612 IP (tos 0xc0, ttl 1, id 11163, offset 0, flags [none], proto OSPF (89), length 84)
    10.231.113.113 > 224.0.0.5: OSPFv2, LS-Ack, length 64
        Router-ID 10.231.113.113, Backbone Area, Authentication Type: none (0)
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.127
            Options: [none]
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.130
            Options: [none]
17:08:01.424629 IP (tos 0xc0, ttl 64, id 32571, offset 0, flags [none], proto OSPF (89), length 68)
    10.231.113.113 > 10.231.113.1: OSPFv2, LS-Request, length 48
        Router-ID 10.231.113.113, Backbone Area, Authentication Type: none (0)
          Advertising Router: 10.231.113.113, External LSA (5), LSA-ID: XXX.XXX.XXX..127
          Advertising Router: 10.231.113.113, External LSA (5), LSA-ID: XXX.XXX.XXX..130
17:08:01.424674 IP (tos 0xc0, ttl 1, id 11164, offset 0, flags [none], proto OSPF (89), length 84)
    10.231.113.113 > 224.0.0.5: OSPFv2, LS-Ack, length 64
        Router-ID 10.231.113.113, Backbone Area, Authentication Type: none (0)
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.127
            Options: [none]
          Advertising Router 10.231.113.113, seq 0x7fffffff, age 3600s, length 16
            External LSA (5), LSA-ID: XXX.XXX.XXX.130
            Options: [none]
At this point I restarted both ends and all neighbours of that router, and the problem went away... but I think that if I restart one end it can come back.
On 25.6.2011 11:27, Ondrej Zajicek wrote:
On Wed, Jun 22, 2011 at 07:56:40PM +0300, Tapio Haapala wrote:
I am resending this because I forgot to complete my subscription first, so I am sorry if this is a duplicate message. But to the problem:
I have a similar issue, but I don't have multiple IP addresses or an MTU problem. It looks like in some cases the router on one side gets stuck in the loading state while the other side is in the full state. At that point I must restart the side that says it is in the full state. So it looks like the router that says "loading" is somehow waiting for something from the router that says "full", but because that one is "full" it does not send it any more... Or something :) The weird thing is that even stopping and starting the router that is in the loading state does not help. I must stop and start the router that is in the full state.
Such random problems were common in really old versions. I hoped that we had already fixed all of them, as on my network (~120 routes, ~40 routers) I haven't noticed that problem for a year. But maybe there are some remaining ones. If you encounter that problem, could you make a tcpdump log (tcpdump -i IFACE -s 0 -w FILE proto 89) of that interaction and look for suspicious messages in the BIRD log?
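On the question of getting more information out of the kernel segfault line reported in this message: the line gives the instruction pointer (ip 00a28fb9) and the base of the mapping it fell into (bird[a23000+4a000]), so the offset of the faulting instruction inside that mapping can be computed. This is a sketch only. The helper name fault_offset and the binary path are placeholders, and resolving the offset with addr2line assumes a position-independent binary with debug symbols available and a mapping that starts at file offset 0.

```shell
# Offset of the faulting instruction inside the mapped binary:
# instruction pointer minus mapping base, both given as hex without "0x".
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# Values taken from the kernel log line in this thread.
fault_offset 00a28fb9 a23000    # prints 0x5fb9

# With symbols available, the offset may resolve to a function and line
# (the path below is a placeholder for wherever bird is installed):
#   addr2line -f -e /usr/sbin/bird 0x5fb9
# To get a core file from the next crash, raise the core size limit in
# the shell that starts the daemon before launching it:
#   ulimit -c unlimited
```

A backtrace from such a core file, loaded into gdb together with the same binary, is usually the most useful thing to attach to a crash report.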