Hi,
bird needs something like this: when it daemonizes, it keeps the file
descriptors of the controlling tty open, so logging out of that
terminal hangs.
I have already opened a Debian bug, see:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=427772
Flo
diff -Nur bird-1.0.11/sysdep/unix/main.c bird-1.0.11.flo/sysdep/unix/main.c
--- bird-1.0.11/sysdep/unix/main.c 2007-06-20 09:13:21.000000000 +0200
+++ bird-1.0.11.flo/sysdep/unix/main.c 2007-06-20 09:21:00.000000000 +0200
@@ -434,7 +434,18 @@
die("fork: %m");
if (pid)
return 0;
+
setsid();
+
+ /* Close all FDs and thus detach from controlling tty */
+ for (i=getdtablesize();i>=0;--i)
+ close(i);
+
+ /* create harmless stdin/out/err */
+ i=open("/dev/null",O_RDWR);
+ dup(i);
+ dup(i);
+
}
signal_init();
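
For comparison, here is a minimal sketch of the same detach step written
as standalone helpers. The names close_fds_from, redirect_fd_to_devnull
and redirect_stdio_to_devnull are hypothetical and not part of bird. The
sketch uses dup2() rather than relying on open()/dup() handing out
descriptors 0, 1 and 2 in order, and it checks return values. The upper
bound on the close loop is an assumption made only to keep the sketch
fast on systems with a very large descriptor limit.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not in bird): close every descriptor >= lowfd. */
void close_fds_from(int lowfd)
{
    long max = sysconf(_SC_OPEN_MAX);   /* limit on open descriptors */
    if (max < 0 || max > 65536)
        max = 65536;                    /* assumed cap, for this sketch only */
    for (int fd = lowfd; fd < max; fd++)
        close(fd);                      /* EBADF on unused slots is harmless */
}

/* Hypothetical helper: make `target` refer to /dev/null.
 * Returns 0 on success, -1 on error. */
int redirect_fd_to_devnull(int target)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    if (fd == target)
        return 0;                       /* open() already landed on target */
    int rc = dup2(fd, target);          /* closes the old target first */
    close(fd);
    return rc < 0 ? -1 : 0;
}

/* Give the daemon harmless stdin, stdout and stderr. */
int redirect_stdio_to_devnull(void)
{
    for (int fd = 0; fd <= 2; fd++)
        if (redirect_fd_to_devnull(fd) < 0)
            return -1;
    return 0;
}
```

After fork() and setsid(), a daemon would call
redirect_stdio_to_devnull() and then close_fds_from(3). Note that this
also closes any descriptor opened before that point, so it has to run
before log files or sockets are set up.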
--
Florian Lohoff flo(a)rfc822.org +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin
Hi,
I debugged further: I built a version with debugging symbols and
put an assert into the code where we detect the out-of-sequence
netlink message. My understanding is that we run krt_if_scan on a
regular basis. While that scan is still in progress, we go through the
notification chain and end up sending another netlink message before
krt_if_scan has pulled all of its netlink messages. nl_get_reply then
receives an out-of-sequence netlink message and drops it, although
nl_get_scan would need it as the end-of-messages marker. So when we
return from the notification chain, nl_get_scan ends up polling for the
next message, which has already been removed via nl_send_route ->
nl_exchange -> nl_get_reply as out-of-sequence. As the netlink FD is
blocking, we deadlock here.
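
The sequence described above can be reproduced with a small in-memory
model. This is only a sketch of my understanding, not bird code: the
queue stands in for the netlink socket, and every name in it (sim_queue,
sim_push, sim_recv, sim_get_reply, sim_scan, sim_run) is made up for the
illustration. A NULL from sim_recv marks the point where a blocking
socket would hang.

```c
#include <stddef.h>

enum msg_type { MSG_LINK, MSG_ACK, MSG_DONE };
struct msg { unsigned seq; enum msg_type type; };

#define QCAP 16
struct sim_queue { struct msg msgs[QCAP]; size_t len, pos; };

/* Append a message, as the kernel would queue it on the socket. */
void sim_push(struct sim_queue *q, unsigned seq, enum msg_type type)
{
    if (q->len < QCAP) {
        q->msgs[q->len].seq = seq;
        q->msgs[q->len].type = type;
        q->len++;
    }
}

/* Next queued message, or NULL where a blocking read would hang. */
const struct msg *sim_recv(struct sim_queue *q)
{
    return q->pos < q->len ? &q->msgs[q->pos++] : NULL;
}

/* Models the reply wait: anything with another sequence number is dropped. */
int sim_get_reply(struct sim_queue *q, unsigned want_seq)
{
    const struct msg *m;
    while ((m = sim_recv(q)) != NULL)
        if (m->seq == want_seq)
            return 1;
    return 0;
}

/* Models the scan loop. With nested != 0, handling a link message sends
 * a request of its own and waits for its reply in the middle of the scan.
 * Returns 1 if the end marker was seen, 0 if the loop would block. */
int sim_scan(struct sim_queue *q, unsigned scan_seq, int nested)
{
    const struct msg *m;
    unsigned next_seq = scan_seq + 1;
    while ((m = sim_recv(q)) != NULL) {
        if (m->seq != scan_seq)
            continue;
        if (m->type == MSG_DONE)
            return 1;
        if (nested) {
            sim_push(q, next_seq, MSG_ACK);  /* kernel answers the request */
            sim_get_reply(q, next_seq);      /* eats the rest of the dump */
            next_seq++;
        }
    }
    return 0;
}

/* Dump of two links plus end marker, scanned with or without nesting. */
int sim_run(int nested)
{
    struct sim_queue q = { .len = 0, .pos = 0 };
    sim_push(&q, 1, MSG_LINK);
    sim_push(&q, 1, MSG_LINK);
    sim_push(&q, 1, MSG_DONE);
    return sim_scan(&q, 1, nested);
}
```

sim_run(0) sees the end marker, while sim_run(1) never does, because the
nested reply wait has already discarded it.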
One solution I see would be to convert all nl_get_scan users to first
read ALL messages before starting to process them. That could mean a lot
of memory usage, especially with a full BGP routing table, where we
would have to buffer all routes first.
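
A rough sketch of this first idea, again on an in-memory model rather
than on real netlink code. All names (sim_queue, sim_push, sim_recv,
sim_get_reply, sim_scan_buffered, sim_run_buffered) and the fixed
MAX_BUFFERED limit are assumptions for the illustration. The scan first
copies everything up to the end marker into a buffer and only then
processes it, so a nested request can no longer eat scan messages.

```c
#include <stddef.h>

enum msg_type { MSG_LINK, MSG_ACK, MSG_DONE };
struct msg { unsigned seq; enum msg_type type; };

#define QCAP 16
#define MAX_BUFFERED 8
struct sim_queue { struct msg msgs[QCAP]; size_t len, pos; };

/* Append a message, as the kernel would queue it on the socket. */
void sim_push(struct sim_queue *q, unsigned seq, enum msg_type type)
{
    if (q->len < QCAP) {
        q->msgs[q->len].seq = seq;
        q->msgs[q->len].type = type;
        q->len++;
    }
}

/* Next queued message, or NULL where a blocking read would hang. */
const struct msg *sim_recv(struct sim_queue *q)
{
    return q->pos < q->len ? &q->msgs[q->pos++] : NULL;
}

/* Reply wait that drops everything with another sequence number. */
int sim_get_reply(struct sim_queue *q, unsigned want_seq)
{
    const struct msg *m;
    while ((m = sim_recv(q)) != NULL)
        if (m->seq == want_seq)
            return 1;
    return 0;
}

/* Two-phase scan. Returns the number of processed entries, or -1 if the
 * end marker never arrived, the buffer overflowed, or a reply went missing. */
int sim_scan_buffered(struct sim_queue *q, unsigned scan_seq)
{
    struct msg buf[MAX_BUFFERED];
    size_t n = 0;
    int done = 0;
    const struct msg *m;

    /* Phase 1: drain the dump up to the end marker without processing. */
    while (!done && (m = sim_recv(q)) != NULL) {
        if (m->seq != scan_seq)
            continue;
        if (m->type == MSG_DONE)
            done = 1;
        else if (n < MAX_BUFFERED)
            buf[n++] = *m;
        else
            return -1;
    }
    if (!done)
        return -1;

    /* Phase 2: process; each entry triggers a request of its own. */
    unsigned next_seq = scan_seq + 1;
    for (size_t i = 0; i < n; i++, next_seq++) {
        sim_push(q, next_seq, MSG_ACK);      /* kernel answers the request */
        if (!sim_get_reply(q, next_seq))
            return -1;
    }
    return (int)n;
}

/* Dump of two links plus end marker. */
int sim_run_buffered(void)
{
    struct sim_queue q = { .len = 0, .pos = 0 };
    sim_push(&q, 1, MSG_LINK);
    sim_push(&q, 1, MSG_LINK);
    sim_push(&q, 1, MSG_DONE);
    return sim_scan_buffered(&q, 1);
}
```

The buffer in phase 1 is where the memory cost shows up: it has to hold
the whole dump before anything is processed.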
Another solution would be a generic polling/callback-based
approach, or possibly replacing netlink.c with an API wrapper around
libnl (http://people.suug.ch/~tgr/libnl/).
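
To make the callback idea a bit more concrete, here is a minimal sketch
of one way it could look. It is not based on the libnl API and not on
existing bird code; the dispatcher type and all function names are
hypothetical. A single receive path hands every message to the handler
registered for its sequence number, so a reply to one request is never
thrown away by the code waiting for another.

```c
#include <stddef.h>

struct msg { unsigned seq; int payload; };
typedef void (*msg_handler)(const struct msg *m, void *ctx);

#define MAX_PENDING 8
struct dispatcher {
    struct { unsigned seq; msg_handler fn; void *ctx; } slot[MAX_PENDING];
    size_t n;
};

/* Register a handler for all messages carrying sequence number seq. */
int dispatcher_register(struct dispatcher *d, unsigned seq,
                        msg_handler fn, void *ctx)
{
    if (d->n >= MAX_PENDING)
        return -1;
    d->slot[d->n].seq = seq;
    d->slot[d->n].fn = fn;
    d->slot[d->n].ctx = ctx;
    d->n++;
    return 0;
}

/* Route one message to its handler. Returns 1 if delivered, 0 if nobody
 * is waiting for that sequence number. */
int dispatcher_deliver(struct dispatcher *d, const struct msg *m)
{
    for (size_t i = 0; i < d->n; i++) {
        if (d->slot[i].seq == m->seq) {
            d->slot[i].fn(m, d->slot[i].ctx);
            return 1;
        }
    }
    return 0;
}

/* Example handler: count the messages it receives. */
void count_msg(const struct msg *m, void *ctx)
{
    (void)m;
    ++*(int *)ctx;
}

/* Interleaved stream: a scan (seq 1) and a route request (seq 2).
 * Returns the number of messages no handler claimed. */
int dispatch_demo(int *scan_count, int *route_count)
{
    struct dispatcher d = { .n = 0 };
    const struct msg stream[] = { {1, 0}, {2, 0}, {1, 0}, {1, 0} };
    int unclaimed = 0;

    *scan_count = 0;
    *route_count = 0;
    dispatcher_register(&d, 1, count_msg, scan_count);
    dispatcher_register(&d, 2, count_msg, route_count);
    for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++)
        if (!dispatcher_deliver(&d, &stream[i]))
            unclaimed++;
    return unclaimed;
}
```

In the demo the scan handler still receives all three of its messages
although a reply for the other request arrives in between.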
(gdb) bt
#0 0xb7e3e947 in raise () from /lib/tls/libc.so.6
#1 0xb7e400c9 in abort () from /lib/tls/libc.so.6
#2 0xb7e3805f in __assert_fail () from /lib/tls/libc.so.6
#3 0x08078a09 in nl_get_reply () at netlink.c:135
#4 0x08078b0d in nl_exchange (pkt=0xbfe3f2b4) at netlink.c:193
#5 0x08079562 in nl_send_route (p=0x80905a0, e=0x809921c, new=0) at netlink.c:542
#6 0x080795d2 in krt_set_notify (p=0x80905a0, n=0x80981c4, new=0x0, old=0x809921c)
at netlink.c:561
#7 0x0807614e in krt_notify (P=0x80905a0, net=0x80981c4, new=0x0, old=0x809921c,
attrs=0x0) at krt.c:698
#8 0x08049910 in do_rte_announce (a=0x8095e40, net=0x80981c4, new=0x80991cc,
old=0x809921c, tmpa=0x0, class=4100)
at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:227
#9 0x08049521 in rte_announce (tab=0x808f4b0, net=0x80981c4, new=0x80991cc,
old=0x809921c, tmpa=0x0) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:261
#10 0x08049b94 in rte_recalculate (table=0x808f4b0, net=0x80981c4, p=0x80906c8,
new=0x80991cc, tmpa=0x0) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:368
#11 0x08049ff1 in rte_update (table=0x808f4b0, net=0x80981c4, p=0x80906c8,
new=0x80991cc) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:514
#12 0x0804fd41 in dev_ifa_notify (p=0x80906c8, c=1, ad=0x8095c80)
at /home/flo/p/root/bird-1.0.11/./nest/rt-dev.c:69
#13 0x0804e962 in ifa_send_notify (p=0x80906c8, c=1, a=0x8095c80)
at /home/flo/p/root/bird-1.0.11/./nest/iface.c:148
#14 0x0804e87e in ifa_notify_change (c=1, a=0x8095c80)
at /home/flo/p/root/bird-1.0.11/./nest/iface.c:159
#15 0x0804ea7d in if_notify_change (c=1, i=0x8095b38)
at /home/flo/p/root/bird-1.0.11/./nest/iface.c:211
#16 0x0804ed24 in if_update (new=0xbfe3f5ec)
at /home/flo/p/root/bird-1.0.11/./nest/iface.c:280
#17 0x08078f5c in nl_parse_link (h=0x80919b8, scan=1) at netlink.c:334
#18 0x080792a1 in krt_if_scan (p=0x8090778) at netlink.c:445
#19 0x08074fd5 in kif_scan (t=0x8095af8) at krt.c:94
#20 0x08075004 in kif_force_scan () at krt.c:102
#21 0x08076018 in krt_scan (t=0x8095a48) at krt.c:655
#22 0x08073005 in tm_shot () at io.c:298
#23 0x08074886 in io_loop () at io.c:1126
#24 0x080774ef in main (argc=Cannot access memory at address 0x67ec
) at main.c:462
--
Florian Lohoff flo(a)rfc822.org +49-171-2280134
Those who would give up a little freedom to get a little
security shall soon have neither - Benjamin Franklin