SIGSEGV on rt-prune-step, v1.3.7 and git HEAD
I am trying to build a bit more advanced border router here capable of uRPF. For that reason, each eBGP peer gets its own kernel table, and bird is used to feed RTS_BGP, RTS_STATIC and RTS_DEVICE routes to these supplementary tables, instead of messing with the "main" kernel routing table.

Well, the Linux kernel syncer is not very sturdy ATM: it is prone to leaving behind some routes (which are correctly annotated as proto bird) when you set it to "persist 0".

And it also segfaults... It segfaults both in v1.3.7 and git HEAD, so it is not a bug in the new "batched" route flushing code.

Here's gdb output for git HEAD (standard hardened Debian build of bird):

Starting program: /usr/sbin/bird -c /etc/bird/bird.conf -d -D /tmp/debug -u bird -g birdctl
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x000055555555d816 in rt_prune_step (max_feed=<optimized out>, tab=<optimized out>)
    at ../../nest/rt-table.c:958
958     ../../nest/rt-table.c: No such file or directory.
        in ../../nest/rt-table.c

Line 958 of nest/rt-table.c tries to dereference e->sender, which apparently can be NULL (whether it *should* be NULL I don't know):

    rescan:
      for (e=n->routes; e; e=e->next)
        if (e->sender->core_state != FS_HAPPY &&     /* *** SIGSEGV HERE */
            e->sender->core_state != FS_FEEDING)
          {
            if (*max_feed <= 0)
              {
                FIB_ITERATE_PUT(fit, fn);
                return 0;
              }
            rte_discard(tab, e);
            (*max_feed)--;
            goto rescan;
          }

Any ideas?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
On Thu, Apr 05, 2012 at 01:22:11PM -0300, Henrique de Moraes Holschuh wrote:
> I am trying to build a bit more advanced border router here capable of uRPF. For that reason, each eBGP peer gets its own kernel table, and bird is used to feed RTS_BGP, RTS_STATIC and RTS_DEVICE routes to these supplementary tables, instead of messing with the "main" kernel routing table.
>
> Well, the Linux kernel syncer is not very sturdy ATM: it is prone to leaving behind some routes (which are correctly annotated as proto bird) when you set it to "persist 0".
Are the routes left behind exactly the RTS_DEVICE ones? There is an obscure and probably historic condition in krt_flush_routes() that prevents removing them. You could replace this (in sysdep/unix/krt.c, krt_flush_routes()):

    if ((n->n.flags & KRF_INSTALLED) &&
        a->source != RTS_DEVICE && a->source != RTS_INHERIT)

with just this:

    if (n->n.flags & KRF_INSTALLED)

I will probably merge this change soon.
> And it also segfaults... It segfaults both in v1.3.7 and git HEAD, so it is not a bug in the new "batched" route flushing code.
During which operation does it segfault? Is the segfault reproducible in some situation? What binary do you use; any special compile variants? Could you get a 'core' file and send it to me together with the unstripped binary and the config file? Are you sure that the bug appears in plain version 1.3.7 (I saw the "v1.3.7 incorrectly tagged in git repository" e-mail; or perhaps some binaries got switched)?
> Here's gdb output for git HEAD (standard hardened Debian build of bird):
>
> Starting program: /usr/sbin/bird -c /etc/bird/bird.conf -d -D /tmp/debug -u bird -g birdctl
> [Thread debugging using libthread_db enabled]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000055555555d816 in rt_prune_step (max_feed=<optimized out>, tab=<optimized out>)
>     at ../../nest/rt-table.c:958
> 958     ../../nest/rt-table.c: No such file or directory.
>         in ../../nest/rt-table.c
>
> Line 958 of nest/rt-table.c tries to dereference e->sender, which apparently can be NULL (whether it *should* be NULL I don't know):
Definitely shouldn't.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
participants (2)
- Henrique de Moraes Holschuh
- Ondrej Zajicek