On November 22, 2020 7:01:52 PM GMT+01:00, "Toke Høiland-Jørgensen" <toke@toke.dk> wrote:
andreas@rammhold.de writes:

On 18:21 22.11.20, Toke Høiland-Jørgensen wrote:
andreas@rammhold.de writes:
I've been playing with the BIRD source code and implementing a new
extension to Babel but that is not what this mail is about.

Still curious - which extension? :)

 The V4 over/via V6 draft. I would link to it but it seems the IETF
 website is currently unreachable (or I am being cloudflared…).

Ah, cool! That was on my list as well, but good to know you're working
on it! I'll let you have at it, then :)

Breakpoint 1, add_tail (n=0x234ac30, l=0x241f620) at ./lib/lists.c:82
#0  add_tail (n=0x234ac30, l=0x241f620) at ./lib/lists.c:82
#1  sl_alloc (s=0x21bd7d0) at lib/slab.c:264

This is that call to add_tail in slab.c, with a few lines of context:

full_partial:
  rem_node(&h->n);
  add_tail(&s->full_heads, &h->n);
  goto redo;

Since add_tail is being called immediately after rem_node(), and they I
both inlines, my guess would be that this comes from the compiler
optimising out the clearing of n->prev and n->next in rem_node() since
they are immediately being reset.

 Interesting. The compiler removes the first assignment becuase it is
 rewritten (across function boundaries/after inlining) but doesn't
 realize that it is being read?

Hmm, yeah, good point, if you have a read in-between that optimisation
really shouldn't happen - but maybe the compiler is rearranging things
in other interesting ways? I guess you'd have to inspect the generated
code to know for sure...

Are these all false positives or should there be more cleanup code on
these list entries?

Speaking for the backtraces you saw in the Babel code, these all come
from variants of the following pattern:

struct babel_msg_node *msgn = sl_alloc(p->msg_slab);
/* initialise msgn->msg */
add_tail(&queue, NODE msgn);

where babel_msg_node is just:

struct babel_msg_node {
  node n;
  union babel_msg msg;
};

Since add_tail sets n->next and n->prev right after those asserts, no
harm is actually done and the memory ends up fully initialised, so I'm
not sure if there's any reason to do anything about it: It's really
only a problem because of the debug asserts.

 That was my hope as well. Maybe there are some attributes / qualifiers
 that could be added to the variables (when the debug flag is set) to
 prohibit this optimisation? Having these checks is probably useful (e.g.
 for CI, tests & development) but only if it doesn't break the entire
 software.

Yeah, it is a bit annoying that it breaks debug builds. My preference
would be to just have the slab code zero out objects, but let's see what
the Bird devs think :)

-Toke