bird 2.0.1: SIGSEGV in nexthop_size
We hit this crash while trying to set up BIRD on a new multi-host setup on Arch Linux.
We have two routers with two (diverse) direct paths using a /31 (so please ignore the missing broadcast warnings).
At this point we are trying to get both IPv4 and IPv6 to talk together using both OSPF and BGP.
The daemon seems to stay alive for 15 seconds from the failed assertions until it dies.
In our handful of tries, it is always R1 (and not R2) that dies.

(systemd) Journal entries:

jan 28 14:59:14 R1 systemd[1]: Started BIRD routing daemon.
jan 28 14:59:14 R1 bird[6648]: Missing broadcast address for interface bond0.3670 <<==== due to /31
jan 28 14:59:14 R1 bird[6648]: Missing broadcast address for interface bond0.3671 <<==== due to /31
jan 28 14:59:14 R1 bird[6648]: Started
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:241
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:261
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:241
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:204
jan 28 14:59:29 R1 bird[6648]: Next hop address 185.38.27.64 is a local address of iface bond0.3670
jan 28 14:59:29 R1 systemd[1]: bird.service: Main process exited, code=dumped, status=11/SEGV
jan 28 14:59:29 R1 systemd[1]: bird.service: Failed with result 'core-dump'.

GDB from coredump:

(gdb) bt
#0 nexthop_size (nh=0x48) at ./nest/route.h:601
#1 rta_apply_hostentry (a=0x7ffd3846b5d0, he=0x556b706028c8, mls=0x0) at nest/rt-table.c:1787
#2 0x0000556b6e835f6e in bgp_apply_next_hop (s=0x7ffd3846b6f0, a=0x7ffd3846b5d0, gw=..., ll=...) at proto/bgp/packets.c:767
#3 0x0000556b6e836713 in bgp_decode_nlri (s=s@entry=0x7ffd3846b6f0, afi=afi@entry=65537, nlri=0x556b70603aec "\030\300\250\002\035\n\023:\bP\037\271&\033B\037\271&\033@", '\377' <repeats 16 times>, len=9, ea=ea@entry=0x556b705ff290, nh=<optimized out>, nh_len=4) at proto/bgp/packets.c:2206
#4 0x0000556b6e838dde in bgp_rx_update (conn=conn@entry=0x556b705b6c90, pkt=pkt@entry=0x556b70603ac0 '\377' <repeats 16 times>, len=53) at proto/bgp/packets.c:2306
#5 0x0000556b6e83a51d in bgp_rx_packet (len=<optimized out>, pkt=0x556b70603ac0 '\377' <repeats 16 times>, conn=0x556b705b6c90) at proto/bgp/packets.c:2815
#6 bgp_rx (sk=0x556b70603970, size=<optimized out>) at proto/bgp/packets.c:2860
#7 0x0000556b6e85c44a in call_rx_hook (s=0x556b70603970, size=<optimized out>) at sysdep/unix/io.c:1770
#8 0x0000556b6e85eb57 in sk_read (s=s@entry=0x556b70603970, revents=1) at sysdep/unix/io.c:1858
#9 0x0000556b6e85f5d5 in io_loop () at sysdep/unix/io.c:2318
#10 0x0000556b6e7f493e in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:892

I have the config, the core file, and a binary with debug symbols, all of which I can send.

Svenne
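A note on frame #0 of the backtrace: nh=0x48 is far too small to be a heap or stack address. A value like that is consistent with the caller taking the address of a struct member through a NULL base pointer, so that the "pointer" is really just the member's offset. The C sketch below illustrates that arithmetic. All names (demo_route, demo_nexthop, the 0x48 bytes of padding) are hypothetical and chosen only to reproduce the number from the backtrace; they are not BIRD's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, NOT BIRD's real structures. */
struct demo_nexthop {
    uint32_t gw;                /* gateway address */
};

struct demo_route {
    char pad[0x48];             /* whatever fields precede the next hop */
    struct demo_nexthop nh;     /* embedded next hop, at offset 0x48 here */
};

/* The layout assumption of this sketch, checked at compile time. */
static_assert(offsetof(struct demo_route, nh) == 0x48, "demo layout");

/* The address that &base->nh amounts to on typical flat-memory
 * platforms, computed without dereferencing base. */
uintptr_t demo_member_address(const struct demo_route *base)
{
    return (uintptr_t)base + offsetof(struct demo_route, nh);
}

/* Defensive accessor: yields NULL instead of a bogus small pointer. */
const struct demo_nexthop *demo_get_nexthop(const struct demo_route *base)
{
    return base ? &base->nh : NULL;
}
```

On a typical Linux/x86-64 build, demo_member_address(NULL) evaluates to 0x48, which is the shape of the value seen in frame #0. A check like the one in demo_get_nexthop is one way a caller could avoid handing such a pointer on; whether the actual fix takes that form is not stated in this thread.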
On Sun, Jan 28, 2018 at 03:43:19PM +0100, Svenne Krap wrote:
We hit this crash while trying to set up BIRD on a new multi-host setup on Arch Linux.
We have two routers with two (diverse) direct paths using a /31 (so please ignore the missing broadcast warnings).
At this point we are trying to get both IPv4 and IPv6 to talk together using both OSPF and BGP.
The daemon seems to stay alive for 15 seconds from the failed assertions until it dies.
In our handful of tries, it is always R1 (and not R2) that dies.
...
I have the config and both the core file and a binary with debug symbols, that I can send.
Hi

Thanks, please send me these files. I will check that.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Ok, I send you the files privately.

Svenne

On 28-01-2018 18:09, Ondrej Zajicek wrote:
On Sun, Jan 28, 2018 at 03:43:19PM +0100, Svenne Krap wrote:
We hit this crash while trying to set up BIRD on a new multi-host setup on Arch Linux.
We have two routers with two (diverse) direct paths using a /31 (so please ignore the missing broadcast warnings).
At this point we are trying to get both IPv4 and IPv6 to talk together using both OSPF and BGP.
The daemon seems to stay alive for 15 seconds from the failed assertions until it dies.
In our handful of tries, it is always R1 (and not R2) that dies.
...
I have the config, the core file, and a binary with debug symbols, all of which I can send.
Hi
Thanks, please send me these files. I will check that.
On Sun, Jan 28, 2018 at 06:39:46PM +0100, Svenne Krap wrote:
Ok, I send you the files privately.
Svenne
Hi

The first attached patch (fix-nexthop.patch) should fix the crash.
But it seems like there are some strange factors in your setup that triggered the issue:

jan 28 14:59:29 R1 bird[6648]: Next hop address 185.38.27.64 is a local address of iface bond0.3670

This means that BGP on R1 received a route with bgp_next_hop 185.38.27.64, which is an address of R1 itself.

There are also these messages:

jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:241
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:204

These are a separate issue (likely some mixed-up IPv4 and IPv6). Could you try to build and run BIRD with the second patch (force-assert.patch)? That would cause a crash and core dump on a failed assertion.

-- 
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
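The "is a local address" message above comes down to a simple test: a BGP next hop that equals one of the receiving router's own interface addresses cannot serve as a gateway. The C sketch below shows such a test under stated assumptions. The names demo_iface, DEMO_IP4 and demo_find_local_iface are hypothetical, the interface table is supplied by the caller, and none of this is BIRD's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface record for illustration only. */
struct demo_iface {
    const char *name;   /* interface name, e.g. "bond0.3670" */
    uint32_t addr;      /* IPv4 address in host byte order */
};

/* Build a host-byte-order IPv4 address from its four octets. */
#define DEMO_IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Return the name of the local interface that owns next_hop,
 * or NULL if next_hop is not one of our own addresses. */
const char *demo_find_local_iface(const struct demo_iface *ifaces,
                                  size_t n, uint32_t next_hop)
{
    assert(ifaces != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ifaces[i].addr == next_hop)
            return ifaces[i].name;
    return NULL;
}
```

With a table that lists bond0.3670 as 185.38.27.64 (the address from the log), looking up DEMO_IP4(185, 38, 27, 64) returns "bond0.3670", while the other end of the /31, 185.38.27.65, returns NULL and would be an acceptable next hop.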
Hi,

We downgraded to version 1.6.3 as we are in a hurry to get to production.
We can still simulate the old environment, so I will try the patches...
It will probably not be today or tomorrow.

Svenne

On 29-01-2018 12:55, Ondrej Zajicek wrote:
On Sun, Jan 28, 2018 at 06:39:46PM +0100, Svenne Krap wrote:
Ok, I send you the files privately.
Svenne
Hi
The first attached patch (fix-nexthop.patch) should fix the crash.
But it seems like there are some strange factors in your setup that triggered the issue:
jan 28 14:59:29 R1 bird[6648]: Next hop address 185.38.27.64 is a local address of iface bond0.3670
This means that BGP on R1 received a route with bgp_next_hop 185.38.27.64, which is an address of R1 itself.
There are also these messages:
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:241
jan 28 14:59:14 R1 bird[6648]: Assertion 'f->addr_type == a->type' failed at nest/rt-fib.c:204
These are a separate issue (likely some mixed-up IPv4 and IPv6). Could you try to build and run BIRD with the second patch (force-assert.patch)? That would cause a crash and core dump on a failed assertion.
On Mon, Jan 29, 2018 at 01:38:17PM +0100, Svenne Krap wrote:
Hi,
We downgraded to version 1.6.3 as we are in a hurry to get to production.
We can still simulate the old environment, so I will try the patches...
It will probably not be today or tomorrow.
Hi,

The issue noticed by Toke Hoiland-Jorgensen is most likely the cause of the failed assertion messages in the log, so there is no need to run the test with the second patch. You could use the attached patch to fix the second issue.

-- 
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
Participants (2): Ondrej Zajicek, Svenne Krap