On 15:59 25.11.20, Ondrej Zajicek wrote:
Thanks, merged (with some minor changes). The first issue was also reported earlier by Mikael Magnusson.
Thanks!
I am still hunting another variant of this that crashes BIRD (with --enable-debug) whenever a Babel peer is shutting down..
Is is somehow complicated? IMHO this class of issues shoud die() in add_tail(), and exact location should be clear from stack trace.
It didn't die with that visible in the stack trace. Here is what I did see: (gdb) bt #0 babel_announce_rte (p=p@entry=0x1052790, e=e@entry=0x10bbe80) at proto/babel/babel.c:674 #1 0x0000000000453547 in babel_select_route (p=p@entry=0x1052790, e=0x10bbe80, mod=mod@entry=0x10ff3b8) at proto/babel/babel.c:780 #2 0x0000000000454f4b in babel_retract_route (r=0x10ff3b8, p=0x1052790) at proto/babel/babel.c:178 #3 babel_handle_update (m=<optimized out>, ifa=<optimized out>) at proto/babel/babel.c:1240 #4 0x00000000004575b6 in babel_process_packet (pkt=0x10c4cf0, len=<optimized out>, saddr=..., ifa=0x10bf940) at proto/babel/packets.c:1461 #5 0x0000000000457684 in babel_rx_hook (sk=<optimized out>, len=<optimized out>) at proto/babel/packets.c:1520 #6 0x0000000000490ef1 in sk_read (s=s@entry=0x10c4ba0, revents=<optimized out>) at sysdep/unix/io.c:1910 #7 0x000000000049179c in io_loop () at sysdep/unix/io.c:2342 #8 0x000000000049574e in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:923 Keep in mind that a few of the babel.c line numbers might be +/- a few as I've started adding my v4-via-v6 changes to it. The setup didn't have any of those in the routing table tho so I am pretty sure they aren't the source of the issue. When disabling the debug flag the crashes were gone again. (gdb) info locals a0 = {next = 0x0, pprev = 0x0, uc = 0, hash_key = 0, eattrs = 0x0, src = 0x105a498, hostentry = 0x0, from = {addr = {0, 0, 0, 0}}, igp_metric = 0, source = 13 '\r', scope = 4 '\004', dest = 3 '\003', aflags = 0 '\000', nh = {gw = {addr = {0, 0, 0, 0}}, iface = 0x0, next = 0x0, flags = 0 '\000', weight = 0 '\000', labels_orig = 0 '\000', labels = 0 '\000', label = 0x7ffffbd7391c} } a = <optimized out> rte = <optimized out> r = 0x0 c = 0x10529f0 Going by the line numbers the crash should occur roughly here (+/- a few because memset and those assignments above might have been reordered): babel_announce_rte(): … 660 │ else if (e->valid && (e->router_id != p->router_id)) 661 │ { 662 │ /* Unreachable */ 663 │ rta a0 = { 664 │ .src = p->p.main_source, 665 │ .source = RTS_BABEL, 666 │ .scope = SCOPE_UNIVERSE, 667 │ .dest = RTD_UNREACHABLE, 668 │ }; 669 │ 670 │ rta *a = rta_lookup(&a0); 671 │ rte *rte = rte_get_temp(a); 672 │ memset(&rte->u.babel, 0, sizeof(rte->u.babel)); 673 │ rte->pflags = 0; -> 674 │ rte->pref = 1; 675 │ 676 │ e->unreachable = 1; 677 │ rte_update2(c, e->n.addr, rte, p->p.main_source); The assembly at $rip looks like this: 0x0000000000451f9e <+367>: rep stos QWORD PTR es:[rdi],rax 0x0000000000451fa1 <+370>: mov rax,QWORD PTR [r12+0x58] 0x0000000000451fa6 <+375>: mov QWORD PTR [rsp+0x20],rax 0x0000000000451fab <+380>: mov BYTE PTR [rsp+0x44],0xd 0x0000000000451fb0 <+385>: mov BYTE PTR [rsp+0x45],0x4 0x0000000000451fb5 <+390>: mov BYTE PTR [rsp+0x46],0x3 => 0x0000000000451fba <+395>: mov rax,QWORD PTR [rbp+0x28] 0x0000000000451fbe <+399>: mov rax,QWORD PTR [rax+0x10] 0x0000000000451fc2 <+403>: mov rcx,QWORD PTR [rax+0x18] 0x0000000000451fc6 <+407>: mov rsi,QWORD PTR [rbp+0x40] 0x0000000000451fca <+411>: mov rdx,QWORD PTR [rbp+0x48] $ print $rbp $1 = (void *) 0x0 rbp being 0 is likely what causes the crash here? I haven't invested much more time into this. andi-