[PATCH] more list node initialisation
Hi, In addition to those list node initialisation patches that went into master so far I still have these two commits in my local tree. One of them is about the logging configuration that I described in my initial email on the topic. The other patch is about new memory buckets in the BGP implementation. I am still hunting another variant of this that crashes BIRD (with --enable-debug) whenever a Babel peer is shutting down.. andi-
This is required as they otherwise point to already (invalid) existing lists and add_tail will fail (during a debug build). Re-initializing these should be fine as the list they belong to is being re-initialized on entry to the very same function. This became mandatory as of baac7009063d the next and prev pointers of nodes in a list are checked against NULL in debug builds. --- sysdep/unix/log.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/sysdep/unix/log.c b/sysdep/unix/log.c index e24322c6..14a0e875 100644 --- a/sysdep/unix/log.c +++ b/sysdep/unix/log.c @@ -341,7 +341,8 @@ default_log_list(int initial, const char **syslog_name) #ifdef HAVE_SYSLOG_H if (!dbgf) { - static struct log_config lc_syslog = { .mask = ~0 }; + static struct log_config lc_syslog; + lc_syslog = (struct log_config){ .mask = ~0 }; add_tail(&log_list, &lc_syslog.n); *syslog_name = bird_name; } @@ -349,15 +350,22 @@ default_log_list(int initial, const char **syslog_name) if (dbgf && (dbgf != stderr)) { - static struct log_config lc_debug = { .mask = ~0 }; - lc_debug.fh = dbgf; + static struct log_config lc_debug; + lc_debug = (struct log_config){ + .mask = ~0, + .fh = dbgf + }; add_tail(&log_list, &lc_debug.n); } if (initial || (dbgf == stderr)) { - static struct log_config lc_stderr = { .mask = ~0, .terminal_flag = 1}; - lc_stderr.fh = stderr; + static struct log_config lc_stderr; + lc_stderr = (struct log_config){ + .mask = ~0, + .terminal_flag = 1, + .fh = stderr + }; add_tail(&log_list, &lc_stderr.n); } -- 2.29.2
I did observe crashes when running BIRD as a debug build when memory returned from the allocator was supposedly not being zeroed. This became mandatory as of baac7009063d the next and prev pointers of nodes in a list are checked against NULL in debug builds. --- proto/bgp/attrs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proto/bgp/attrs.c b/proto/bgp/attrs.c index 828bc118..b4001f59 100644 --- a/proto/bgp/attrs.c +++ b/proto/bgp/attrs.c @@ -1476,7 +1476,7 @@ bgp_get_bucket(struct bgp_channel *c, ea_list *new) } /* Create the bucket */ - b = mb_alloc(c->pool, size); + b = mb_allocz(c->pool, size); init_list(&b->prefixes); b->hash = hash; -- 2.29.2
On Tue, Nov 24, 2020 at 09:08:29AM +0100, Andreas Rammhold wrote:
Hi,
In addition to those list node initialisation patches that went into master so far I still have these two commits in my local tree. One of them is about the logging configuration that I described in my initial email on the topic. The other patch is about new memory buckets in the BGP implementation.
Hi Thanks, merged (with some minor changes). The first issue was also reported earlier by Mikael Magnusson.
I am still hunting another variant of this that crashes BIRD (with --enable-debug) whenever a Babel peer is shutting down..
Is is somehow complicated? IMHO this class of issues shoud die() in add_tail(), and exact location should be clear from stack trace. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
On 15:59 25.11.20, Ondrej Zajicek wrote:
Thanks, merged (with some minor changes). The first issue was also reported earlier by Mikael Magnusson.
Thanks!
I am still hunting another variant of this that crashes BIRD (with --enable-debug) whenever a Babel peer is shutting down..
Is is somehow complicated? IMHO this class of issues shoud die() in add_tail(), and exact location should be clear from stack trace.
It didn't die with that visible in the stack trace. Here is what I did see: (gdb) bt #0 babel_announce_rte (p=p@entry=0x1052790, e=e@entry=0x10bbe80) at proto/babel/babel.c:674 #1 0x0000000000453547 in babel_select_route (p=p@entry=0x1052790, e=0x10bbe80, mod=mod@entry=0x10ff3b8) at proto/babel/babel.c:780 #2 0x0000000000454f4b in babel_retract_route (r=0x10ff3b8, p=0x1052790) at proto/babel/babel.c:178 #3 babel_handle_update (m=<optimized out>, ifa=<optimized out>) at proto/babel/babel.c:1240 #4 0x00000000004575b6 in babel_process_packet (pkt=0x10c4cf0, len=<optimized out>, saddr=..., ifa=0x10bf940) at proto/babel/packets.c:1461 #5 0x0000000000457684 in babel_rx_hook (sk=<optimized out>, len=<optimized out>) at proto/babel/packets.c:1520 #6 0x0000000000490ef1 in sk_read (s=s@entry=0x10c4ba0, revents=<optimized out>) at sysdep/unix/io.c:1910 #7 0x000000000049179c in io_loop () at sysdep/unix/io.c:2342 #8 0x000000000049574e in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:923 Keep in mind that a few of the babel.c line numbers might be +/- a few as I've started adding my v4-via-v6 changes to it. The setup didn't have any of those in the routing table tho so I am pretty sure they aren't the source of the issue. When disabling the debug flag the crashes were gone again. (gdb) info locals a0 = {next = 0x0, pprev = 0x0, uc = 0, hash_key = 0, eattrs = 0x0, src = 0x105a498, hostentry = 0x0, from = {addr = {0, 0, 0, 0}}, igp_metric = 0, source = 13 '\r', scope = 4 '\004', dest = 3 '\003', aflags = 0 '\000', nh = {gw = {addr = {0, 0, 0, 0}}, iface = 0x0, next = 0x0, flags = 0 '\000', weight = 0 '\000', labels_orig = 0 '\000', labels = 0 '\000', label = 0x7ffffbd7391c} } a = <optimized out> rte = <optimized out> r = 0x0 c = 0x10529f0 Going by the line numbers the crash should occur roughly here (+/- a few because memset and those assignments above might have been reordered): babel_announce_rte(): … 660 │ else if (e->valid && (e->router_id != p->router_id)) 661 │ { 662 │ /* Unreachable */ 663 │ rta a0 = { 664 │ .src = p->p.main_source, 665 │ .source = RTS_BABEL, 666 │ .scope = SCOPE_UNIVERSE, 667 │ .dest = RTD_UNREACHABLE, 668 │ }; 669 │ 670 │ rta *a = rta_lookup(&a0); 671 │ rte *rte = rte_get_temp(a); 672 │ memset(&rte->u.babel, 0, sizeof(rte->u.babel)); 673 │ rte->pflags = 0; -> 674 │ rte->pref = 1; 675 │ 676 │ e->unreachable = 1; 677 │ rte_update2(c, e->n.addr, rte, p->p.main_source); The assembly at $rip looks like this: 0x0000000000451f9e <+367>: rep stos QWORD PTR es:[rdi],rax 0x0000000000451fa1 <+370>: mov rax,QWORD PTR [r12+0x58] 0x0000000000451fa6 <+375>: mov QWORD PTR [rsp+0x20],rax 0x0000000000451fab <+380>: mov BYTE PTR [rsp+0x44],0xd 0x0000000000451fb0 <+385>: mov BYTE PTR [rsp+0x45],0x4 0x0000000000451fb5 <+390>: mov BYTE PTR [rsp+0x46],0x3 => 0x0000000000451fba <+395>: mov rax,QWORD PTR [rbp+0x28] 0x0000000000451fbe <+399>: mov rax,QWORD PTR [rax+0x10] 0x0000000000451fc2 <+403>: mov rcx,QWORD PTR [rax+0x18] 0x0000000000451fc6 <+407>: mov rsi,QWORD PTR [rbp+0x40] 0x0000000000451fca <+411>: mov rdx,QWORD PTR [rbp+0x48] $ print $rbp $1 = (void *) 0x0 rbp being 0 is likely what causes the crash here? I haven't invested much more time into this. andi-
participants (3)
-
Andreas Rammhold -
andreas@rammhold.de -
Ondrej Zajicek