Re: Issues in Babel

19 Oct 2017

Ondrej Zajicek <santiago@crfreenet.org> writes:
...
Hi Toke (and others)
I updated Babel to use microsecond timers [1] and during testing i
noticed several unrelated issues.
Ah, awesome, thanks for doing that! :)
...
1) Hello expiration was not properly implemented, as i understand
RFC, it should reset the timer based on last packet hello interval
and time-out repeatedly until hello map is zero. Fixed in [2].
Yeah, I think you are right. However, your fix doesn't apply
BABEL_HELLO_EXPIRY_FACTOR to the value stored in last_hello_int, which
is probably a bug, right?
...
2) In some setups unicast seqno requests were sent repeatedly and seqno
was increased indefinitely due to last case in babel_handle_update(),
when the RFC specifies it should be done only if unfeasible route would
otherwise become best. Fixed in [3], by this condition handling all
unicast seqno requests:
if (!feasible && (metric != BABEL_INFINITY) &&
      (!best || (r == best) || (metric < best->metric)))
    babel_unicast_seqno_request(e, s, nbr);
It also changes the behavior by sending seqno request when receiving
infeasible update if there is no previous route from that neighbor.
Yup, this seams reasonable as well.
...
3) Seems like the fix of hello expiration caused that when neighbor
disappears, it is no longer removed timely and routes have to wait
for route timeout, which is about a minute for 4 s hello interval.
It seems that by RFC, we should recompute route metrics whanever rxcost
reported by IHU is changed, so IHU expiration should be enough to make
all related routes unreachable. Apparently, this is not done in BIRD.
Hmm, yeah, babel_expire_ihu() should probably do something similar to
babel_flush_neighbor(), i.e., loop through all selected routes from that
neighbor and re-run the route selection on them?
...
4) Seems like we do not trigger updates when a route update with just
increased seqno due to forwarded unicast request is received. That also
slows convergence. IMHO we should do in babel_handle_update() something
like:
if (route_is_selected() && matching_entry_in_seqno_request_cache())
  babel_trigger_update()
Not sure we should use the request cache for this? We might not have
seen the request, but a downstream still needs the updated seqno; so
maybe just do something like:

diff --git a/proto/babel/babel.c b/proto/babel/babel.c
index c5d695c9..06e1d3ad 100644
--- a/proto/babel/babel.c
+++ b/proto/babel/babel.c
@@ -1219,6 +1219,11 @@ babel_handle_update(union babel_msg *m, struct babel_iface *ifa)
   }
   else
   {
+    /* Trigger an update if we get a seqno increase for a selected route, to
+       improve convergence time */
+    if (r->seqno < msg->seqno && r == best)
+      babel_trigger_update(p);
+
     /* Last paragraph above - update the entry */
     r->advert_metric = msg->metric;
     r->metric = metric;
...
Do you think that my interpretation of these issues and the fixes are
correct?
Think so, yeah; nice job. And thanks for taking the time to go over this :)

-Toke

    

Re: Issues in Babel

Toke Høiland-Jørgensen