Bug in OSPF

Ondrej Zajicek santiago at crfreenet.org
Thu Aug 21 21:16:08 CEST 2008


Hello

After some time, I looked again at the OSPF bug that I ran into before:

"Sometimes the link between two routers hangs. Bird reports, for
example, ptp/exstart on one side and ptp/exchange on the other side, or
full/ptp on one side and nothing on the other side."

I observed three cases:


1) The first node is stuck in init/ptp, and the second node does not even
see the first node as a neighbour. According to tcpdump, the first node
does not send hello packets.


2) The first node (192.168.36.130) is stuck in loading/ptp.
tcpdump:

...
11:54:38.126821 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:38.127048 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48

11:54:39.126493 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:39.128045 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48

11:54:38.127613 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:38.129563 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:38.131034 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44

11:54:39.126067 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:39.129339 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:39.130246 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44

11:54:40.125492 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:40.126112 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48

11:54:40.126615 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:40.127747 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:40.129674 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44

11:54:41.126490 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:41.128826 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48

11:54:41.126068 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:41.129284 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:41.130056 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
...


3) The first node (192.168.36.130) oscillates between loading/ptp and exstart/ptp.
As the dump shows, the first node also does not send hello packets.
tcpdump:
 
14:24:07.822578 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:07.823062 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:07.823588 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:08.824388 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:08.825092 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:08.825660 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:08.826215 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:08.827254 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:08.829550 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:09.822583 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:09.822828 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:09.823636 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:10.820982 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:10.821474 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:10.821816 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:10.822213 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:10.823820 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:10.824445 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:11.822592 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:11.822836 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:11.823629 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:12.821025 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32



I suspect the problem is in the timer handling code, which breaks when
the system time changes (for example, because of an aggressive NTP
daemon). When I manually fiddled with the system time, I often triggered
the problem.


Here are two patches to fix this problem. The first patch is probably
better but does not work on Linux 2.4.x; the second patch should work there.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monotonic_clock1.patch
Type: text/x-diff
Size: 1428 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20080821/b4c03f2b/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monotonic_clock2.patch
Type: text/x-diff
Size: 1535 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20080821/b4c03f2b/attachment-0001.patch>


More information about the Bird-users mailing list