Hello,

After some time I looked again at a bug in OSPF that I had met before:

"Sometimes the link between two routers hangs. Bird reports for example ptp/exstart at one side and ptp/exchange at the other side. Or full/ptp at one side and nothing at the other side."

I observed three cases:

1) The first node is stuck in init/ptp, and the second node doesn't even see the first node as a neighbour. According to tcpdump, the first node doesn't send hello packets.

2) The first node (192.168.36.130) is stuck in loading/ptp.

tcpdump:
...
11:54:38.126821 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:38.127048 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:39.126493 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:39.128045 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:38.127613 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:38.129563 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:38.131034 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:39.126067 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:39.129339 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:39.130246 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:40.125492 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:40.126112 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:40.126615 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:40.127747 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:40.129674 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:41.126490 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:41.128826 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:41.126068 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:41.129284 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:41.130056 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
...

3) The first node (192.168.36.130) is oscillating between loading/ptp and exstart/ptp. We can see that the first node also doesn't send hello packets.

tcpdump:
14:24:07.822578 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:07.823062 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:07.823588 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:08.824388 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:08.825092 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:08.825660 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:08.826215 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:08.827254 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:08.829550 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:09.822583 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:09.822828 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:09.823636 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:10.820982 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:10.821474 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:10.821816 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:10.822213 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:10.823820 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:10.824445 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:11.822592 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:11.822836 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:11.823629 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:12.821025 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32

I assume that the problem is in the timer handling code, which breaks during a system time change (for example by an aggressive NTP daemon). When I manually fiddled with the system time, I often triggered the problem.

Here are two patches to fix this problem. The first patch is probably better but doesn't work on Linux 2.4.x. The second patch should work on it.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

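The timer hypothesis in the message above can be illustrated with a minimal sketch. This is not BIRD's timer code: the `wall_timer` structure and its helpers are hypothetical names, and the only assumption is that expiry times are stored as absolute `time()` values. Under that assumption, stepping the clock back by an hour postpones a one-second hello timer by an hour, which would be consistent with a node that stops sending hellos.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical one-shot timer keyed to an absolute wall-clock second.
 * This is an illustration only, not BIRD's actual data structure. */
struct wall_timer {
    time_t expires;   /* absolute expiry, same time base as time() */
};

/* Arm the timer to fire `delay` seconds after the clock reading `now`. */
static void wall_timer_start(struct wall_timer *t, time_t now, time_t delay)
{
    assert(delay >= 0);
    t->expires = now + delay;
}

/* True once the clock reading `now` has reached the stored expiry. */
static bool wall_timer_due(const struct wall_timer *t, time_t now)
{
    return now >= t->expires;
}

/* Seconds the timer still has to wait, as seen from clock reading `now`. */
static long wall_timer_remaining(const struct wall_timer *t, time_t now)
{
    return wall_timer_due(t, now) ? 0 : (long)(t->expires - now);
}
```

Arming the timer at clock reading 100000 with a delay of 1 gives an expiry of 100001. If the clock is then stepped back by 3600 seconds, the reading at the moment the timer should have fired is 96401, and `wall_timer_remaining` reports 3600 seconds still to wait.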
Ondrej Zajicek wrote:
Hello
Hello!
After some time I looked again at a bug in OSPF that I had met before:
[...]
I assume that the problem is in the timer handling code, which breaks during a system time change (for example by an aggressive NTP daemon). When I manually fiddled with the system time, I often triggered the problem.
Here are two patches to fix this problem. The first patch is probably better but doesn't work on Linux 2.4.x. The second patch should work on it.
Thank you! I am afraid neither will work on BSD, but it is very easy to adapt them, so I will do it. Thank you very much; this will be in the next release.

Ondrej

Hello!
diff -uprN bird-1.0.11/sysdep/unix/io.c bird-1.0.11n/sysdep/unix/io.c
--- bird-1.0.11/sysdep/unix/io.c 2008-08-21 17:28:49.000000000 +0200
+++ bird-1.0.11n/sysdep/unix/io.c 2008-08-21 17:28:04.000000000 +0200
@@ -11,6 +11,7 @@
 #include <stdlib.h>
 #include <time.h>
 #include <sys/time.h>
+#include <sys/times.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/fcntl.h>
@@ -18,6 +19,8 @@
 #include <unistd.h>
 #include <errno.h>
 
+#include <asm/param.h>
+
 #include "nest/bird.h"
 #include "lib/lists.h"
 #include "lib/resource.h"

I would very much like to avoid including anything in asm/... and especially using HZ. The proper way to get HZ is to call sysconf(_SC_CLK_TCK). This should be portable to all POSIX systems.

Anyway, is there any problem with using time() as before, but detecting timewarps?

Have a nice fortnight
--
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Don't take life too seriously -- you'll never get out of it alive.

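The two suggestions in the reply above can be sketched roughly as follows. This is one possible reading, not code from either patch: `ticks_per_second`, `tick_clock_seconds`, `timewarp_offset`, `shift_expiry` and the `MAX_FORWARD_STEP` threshold are hypothetical names chosen for illustration. The first half reads the tick rate through `sysconf(_SC_CLK_TCK)` and uses the return value of `times()` as a clock that does not follow system time changes; the second half keeps `time()` but treats a backward step, or an implausibly large forward step, as a timewarp and shifts stored expiries by the same amount.

```c
#include <assert.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* Clock ticks per second as reported by POSIX; stands in for the HZ macro.
 * Returns -1 if the value cannot be determined. */
static long ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Whole seconds elapsed since an arbitrary fixed point, derived from the
 * return value of times(). Returns -1 on error. The tick counter can wrap
 * on systems with a narrow clock_t; this sketch does not handle that. */
static long tick_clock_seconds(void)
{
    struct tms unused;              /* POSIX requires a valid buffer */
    clock_t ticks = times(&unused);
    long hz = ticks_per_second();

    if (ticks == (clock_t)-1 || hz <= 0)
        return -1;
    return (long)(ticks / hz);
}

/* Hypothetical threshold: a forward step larger than this many seconds
 * between two consecutive polls is treated as a timewarp. */
#define MAX_FORWARD_STEP 60

/* Compare two consecutive time() readings. Returns the jump in seconds
 * (negative for a backward step) when it looks like a timewarp, or 0 when
 * time advanced plausibly. */
static long timewarp_offset(time_t prev, time_t now)
{
    long delta = (long)(now - prev);

    if (delta < 0 || delta > MAX_FORWARD_STEP)
        return delta;
    return 0;
}

/* Shift an absolute expiry by the detected warp. This assumes that no real
 * time passed during the jump, so the remaining delay is preserved. */
static time_t shift_expiry(time_t expires, long warp)
{
    return expires + warp;
}

/* Usage sketch: a timer due at 100001 survives a one-hour backward step. */
static void timewarp_selfcheck(void)
{
    long warp = timewarp_offset(100000, 100000 - 3600);

    assert(warp == -3600);
    assert(shift_expiry(100001, warp) == 96401);
}
```

The threshold is a trade-off: too small, and an ordinary scheduling delay is mistaken for a warp; too large, and small clock steps go uncorrected. The tick-based clock avoids that choice but depends on how `times()` behaves on each platform.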
Participants (3):
- Martin Mares
- Ondrej Filip
- Ondrej Zajicek