[PATCH] Add the Babel routing protocol to Bird

Thu Oct 8 12:59:28 CEST 2015

On Mon, Sep 07, 2015 at 11:10:34PM +0200, Toke Høiland-Jørgensen wrote:
> This adds the Babel routing protocol (RFC6126) to Bird. It is a complete
> implementation of the IPv6 subset of RFC6126, but does not implement any
> of the extensions.

Hello

I am sorry i didn't got around to review it sooner, but doing a proper
review for a whole new (and unfamiliar) protocol takes nontrivial amount
of time.

Generally, the code looks good and i am determined to finally merge it,
but it definitely needs more refinement before that.

The review consist of general points followed by comments specific
to lines of the patch. Occasionally i reference the new RIP code,
which is not yet merged to BIRD but is available at
https://gitlab.labs.nic.cz/labs/bird/blob/rip-new/proto/rip/ .

I am glad to see your answers.

First, there is an issue with route selection. Proper route selection in BIRD
should work this way:

1) Babel chooses the best route from received routes, say selected_in
2) If selected_in changes, it is propagated to the core by rte_update()
3) Core compares routes from multiple protocols and one is exported to the Babel
4) This route is either Babel own route or external route
5) In babel_rt_notify(), Babel receives that route, finds appropriate internal
   route (or create a new one) and stores it in say selected_out.
6) Even if the received selected_out is an external route, Babel
   do not stop propagating selected_in to the core
7) selected_out is propagated to the network in periodic updates.
   If selected_out changes, it is an event for triggered updates.

The current code clearly does not work this way:
babel_rt_notify():
  r = (e->selected) ? e->selected : babel_get_route(e, NULL);

Which IMHO means that if there is an internal selected route, the new received
external route will be ignored. E.g. in cases when at first some internal routes
were recived, one of them was selected, and after a while a new external route
appears.

The proper way (for babel_rt_notify()) would be to check whether the received
route (new) is local and based on that use either the selected_in or a route
with NULL neighbor:

  r = (new->attrs->src->proto == p) ? e->selected_in : babel_get_route(e, NULL);

Note that generally babel_rt_notify() should not need to call babel_select_route(),
as changing the selected_out route does not influence the selected_in route.

There are also other places in the code where the behavior is different the
scheme above (like triggering update directly in babel_select_route().

Although it is not necessary to explicitly keep both selected_in and
selected_out as fields in struct babel_entry, but it is simple and
comprehensible.

Second, can we really expect that TLVs in received packets are aligned?
Although all TLVs looks like suited to 32-bit alignment, i did not found
mentioned it in Babel RFC and availability of Pad1, PadN TLVs and per-byte
compressible addresses suggest that they generally are not aligned.

In that case, we have to use get_uXX()/put_uXX() functions instead of directly
accessing babel_tlv_* structure fields. Or we could copy each TLV to a local
buffer before calling ntoh+handle callbacka, as each TLV internally is aligned,
and enforce alignment for outgoing packets. I would generally prefer the first
approach, but the second approach is simpler way to fix it.

Third, although i am OK with accepting IPv6-only implementation, it should
be ready for future expansion. TLVs with AE 1 could be skipped, but there
should not be hidden hardwired IPv6 assumptions in the code.

Mainly, TLV structures should use addr[0] at the end instead of addr[2] and
addr[4]. Address space requirements should be handled explictly based on AE.

Fourth, it seems like triggered updates are not really implemented as they
should. Route change triggers propagation, but the whole routing table is always
propagated instead of just the changed entries. This could be easily fixed by
having timestamp on each babel_entry updated everytime when a significant change
of selected_out route happened. For regular updates all routes are propagated,
for triggered updates only the entries with timestamps >= time when the trigger
update was requested. See want_triggered and tx_changed fields in the new RIP
code.

Fifth, for protocols like RIP or Babel that do not use explicit acknowledgments
it is a very good idea to properly handle TX failures (which usually means that
kernel buffers are full) at least when sending updates. When sk_send_to() fails
during sending updates, the update process should be postponed until TX hook of
the socket is called, then the update process could continue in the TX hook.
Which means FIB_ITERATE with persistent iterator should be used instead of
FIB_WALK. See rip_send_table() and rip_send_response() in the new RIP code.
In the current Babel code this is complicated by the implicit call
to babel_send_queue() inside of babel_add_tlv_size() instead of explicit
call from update process.

Sixth, the code handling the external representation should be in packet.c
Esp. all babel_handle_* functions and most of babel_send_* functions.
In most cases it is just a move, in some cases it makes sense to separate
external-representation-bound parts to internal-structures-bound parts
(e.g. for babel_handle_update() separating the part updating routing table to
a separate function, see rip_receive_response() and rip_update_rte() in the
new RIP code for the example) Also babel_header and TLV structures could be
moved at the beginning of babel.c, Unnecessary function declarations
(babel_hton_*) could be removed from babel.h, Trivial inline functions
babel_add_tlv_* could be just inlined to appropriate babel_send_* function.

Seventh, babel_get_addr_* and babel_put_addr_* functions are strange - IMHO these
should not be named based on TLV, but based on their real functions and explicitly
called from appropriate babel_handle* or babel_send_* functions instead using
dispatch through tlv_data.

Finally, the coding style is much more consistent with the rest of the BIRD than
in the previous patch, just some minor issues: "if(" -> "if (", opening braces
of code blocks should be on separate lines, we don't use aligned field names in
structure definitions. Local var for struct babel_iface * should be generally
named 'ifa' instead of 'bif', for struct babel_neighbor * just 'n' instead of
'bn'.

Rest are comments specific to lines in the patch:

> diff --git a/nest/proto.c b/nest/proto.c
> ..
> +  proto_build(&proto_babel);
> +  bfd_init_all();
> +#endif

Calling bfd_init_all() seems like copy-paste error.

> diff --git a/nest/route.h b/nest/route.h
> ...
> @@ -538,6 +546,7 @@ extern struct protocol *attr_class_to_protocol[EAP_MAX];
>  #define DEF_PREF_RIP		120	/* RIP */
>  #define DEF_PREF_BGP		100	/* BGP */
>  #define DEF_PREF_PIPE		70	/* Routes piped from other tables */
> +#define DEF_PREF_BABEL		42	/* Babel */

Default could be higher, e.g. 130

> diff --git a/proto/babel/babel.h b/proto/babel/babel.h
> ...
> +enum babel_tlv_type_t {

Enum names (like struct names) are not type names, so do not use _t suffix.

> +enum babel_tlv_type_t {
> +  BABEL_TYPE_PAD0             = 0,
> +  BABEL_TYPE_PADN             = 1,

Perhaps BABEL_TLV_* would be better?

> +enum babel_iface_type_t {
> +  BABEL_IFACE_TYPE_WIRED,
> +  BABEL_IFACE_TYPE_WIRELESS,
> +  BABEL_IFACE_TYPE_MAX
> +};

I would prefer to have explicit values here. Perhaps also reserve 0 for
undefined?

> +struct babel_seqno_request_cache {
> +  slab  *slab;
> +  list   entries;  /* Seqno requests in the cache (struct babel_seqno_request) */
> +};

Perhaps this should be merged into babel_proto?

> +struct babel_source {
> +  node n;
> +  struct babel_entry *e;

This ptr seems completely unused.

> +struct babel_config {
> +  struct proto_config c;
> +
> +  list iface_list;              /* Patterns configured -- keep it first; see babel_reconfigure why */
> +  int  port;

Port option should be per-interface

> +#define BABEL_ADD_TLV_SEND(tlv,bif,func,addr) do {                      \

Seems unused

> diff --git a/proto/babel/babel.c b/proto/babel/babel.c
> ...
> +#undef TRACE
> +#define TRACE(level, msg, args...) do { if (p->p.debug & level) { log(L_TRACE "%s: " msg, p->p.name , ## args); } } while(0)

Unnecessary. Just use TRACE() defined in nest/protocol.h

> +/* computes a-b % 65535 for u16 datatypes */
> +static inline u16
> +diff_mod64k(u16 a, u16 b)
> +{
> +  return a >= b ? a-b : 0xffff-b+a;
> +}

Babel uses modulo 2^16 (65536), comment says a-b % 65535
The diff_mod64k() is neither:
    a=0xffff b=0 -> ret=0xffff (ok for modulo 2^16, not for 65535)
    a=0 b=1 -> ret=0xfffe (ok for modulo 65535, not for 2^16)

If it should be just a-b % 2^16, then there is no need for this function
((u16) a-b) should be enough.

> +static inline struct babel_entry *
> +babel_find_entry(struct babel_proto *p, ip_addr prefix, u8 plen)
> +{
> +  return fib_find(&p->rtable, &prefix, plen);
> +}
> +
> +static struct babel_entry *
> +babel_get_entry(struct babel_proto *p, ip_addr prefix, u8 plen)
> +{
> +  struct babel_entry *e = babel_find_entry(p, prefix, plen);
> +  if(e) return e;
> +  e = fib_get(&p->rtable, &prefix, plen);

babel_find_entry() is unnecessary, just use fib_get(), as it uses fib_find()
internally anyways).

> +void
> +babel_flush_entry(struct babel_entry *e)
> +{
> +  struct babel_proto *p = e->proto;
> +  TRACE(D_EVENTS, "Flushing entry %I/%d", e->n.prefix, e->n.pxlen);
> +  rem_node(&e->garbage_node);
> +  if(p) fib_delete(&p->rtable, e);

It is really possible that (p == NULL)? In that case TRACE() would crash.

> +static void
> +babel_expire_routes(struct babel_proto *p)
> +{
> +  struct babel_entry *e;
> +  struct babel_route *r, *rx;
> +  node *n;
> +  FIB_WALK(&p->rtable, n) {
> +    e = (struct babel_entry *)n;
> +    WALK_LIST_DELSAFE(r, rx, e->routes) {
> +      if(r->refresh_time && r->refresh_time <= now) {
> +        refresh_route(r);
> +        r->refresh_time = 0;
> +      }
> +      if(r->expires && r->expires <= now)
> +        expire_route(r);
> +    }
> +    expire_sources(e);
> +  } FIB_WALK_END;
> +  WALK_LIST_FIRST(n, p->garbage) {
> +    e = SKIP_BACK(struct babel_entry, garbage_node, n);
> +    babel_flush_entry(e);
> +  }
> +}

Instead of using p->garbage and garbage_node, you could simply use FIB_ITERATE
instead of FIB_WALK and remove babel_entry inside the cycle:

  FIB_ITERATE_INIT(&fit, &p->rtable);

loop:
  FIB_ITERATE_START(&p->rtable, &fit, node)
  {
    struct babel_entry *e = (void *) node;

    ...

    if (EMPTY_LIST(e->sources) && EMPTY_LIST(e->routes))
    {
      FIB_ITERATE_PUT(&fit, node);
      babel_flush_entry(e)
      goto loop;
    }
  }
  FIB_ITERATE_END(node);

Therefore whole garbage list could be removed.

> +static void
> +babel_expire_neighbors(struct babel_proto *p)
> +{
> +  struct babel_iface *bif;
> +  struct babel_neighbor *bn, *bnx;
> +  WALK_LIST(bif, p->interfaces) {
> +    WALK_LIST_DELSAFE(bn, bnx, bif->neigh_list) {
> +      if(bn->hello_expiry && bn->hello_expiry <= now)
> +        expire_hello(bn);
> +      if(bn->ihu_expiry && bn->ihu_expiry <= now)
> +        expire_ihu(bn);
> +    }

 expire_hello() -> babel_flush_neighbor() -> neighbor is freed
 but on the next line it is accessed.

> +static u16
> +babel_compute_rxcost(struct babel_neighbor *bn)
> +{
> +  struct babel_iface *bif = bn->bif;
> +  struct babel_proto *p = bif->proto;
> +  u8 n, missed;
> +  u16 map=bn->hello_map;
> +
> +  if(!map) return BABEL_INFINITY;
> +  n = __builtin_popcount(map); // number of bits set

This should be defined as our inline function in lib/bitopts.h, then used here.

> +  missed = bn->hello_n-n;
> +
> +  if(bif->cf->type == BABEL_IFACE_TYPE_WIRED) {
> +    /* k-out-of-j selection - Appendix 2.1 in the RFC. */
> +    DBG("Missed %d hellos from %I\n", missed, bn->addr);
> +    /* Link is bad if more than half the expected hellos were lost */
> +    return (missed > 0 && n/missed < 2) ? BABEL_INFINITY : bif->cf->rxcost;

Shouldn't there be bn->hello_n instead of n?

Also the next variant (with bn->hello_n instead of n) is faster and simpler:

  return (missed > n/2) ? BABEL_INFINITY : bif->cf->rxcost;

> +/* Simple additive metric - Appendix 3.1 in the RFC */
> +static u16
> +compute_metric(struct babel_neighbor *bn, u16 metric)
> +{
> +  u16 cost = compute_cost(bn);
> +  return (cost == BABEL_INFINITY) ? cost : cost+metric;
> +}

Well, you could have both metric and cost < BABEL_INFINITY, but
metric+cost > BABEL_INFINITY. Just use:

static u16
compute_metric(struct babel_neighbor *bn, uint metric)
{
  metric += compute_cost(bn);
  return MIN(metric, BABEL_INFINITY);
}

> +static rte *
> +babel_build_rte(struct babel_proto *p, net *n, struct babel_route *r)
> +{
> +  rta *a;
> +  rte *rte;
> +
> +  rta A = {
> +    .src = p->p.main_source,
> +    .source = RTS_BABEL,
> +    .scope = SCOPE_UNIVERSE,
> +    .cast = RTC_UNICAST,
> +    .dest = r->metric == BABEL_INFINITY ? RTD_UNREACHABLE : RTD_ROUTER,
> +    .flags = 0,
> +    .gw = r->next_hop,

Keep .gw zeroed for RTD_UNREACHABLE.

> +  };
> +
> +  if(r->neigh) {
> +    A.from = r->neigh->addr;
> +    A.iface = r->neigh->bif->iface;
> +  }

When r->neigh is NULL? If 'r' is external route (one exported to Babel from
other protocols), then it should not be propagated back to core. See the first
comment in this mail.

> +  a = rta_lookup(&A);
> +  rte = rte_get_temp(a);
> +  rte->u.babel.metric = r->metric;
> +  rte->u.babel.router_id = r->router_id;
> +  rte->net = n;
> +  rte->pflags = 0;
> +  return rte;
> +}

I would suggest to integrate rte_update() here and name the function babel_announce_rte().

> +
> +static void
> +babel_send_seqno_request(struct babel_entry *e)
> +{
> +  struct babel_proto *p = e->proto;
> +  struct babel_route *r = e->selected;
> +  struct babel_source *s = babel_find_source(e, r->router_id);
> +  struct babel_iface *bif;
> +  struct babel_tlv_seqno_request *tlv;
> +
> +  if(s && cache_seqno_request(p, e->n.prefix, e->n.pxlen, r->router_id, s->seqno+1)) {
> +    TRACE(D_EVENTS, "Sending seqno request for %I/%d router_id %0lx",

"%0lx" is incorrect, as long is 32-bit on 32-bit machines, while router_id is u64.
Perhaps bsprintf have to be extended. Or perhabs we just should have function to
print 64-bit router id to local string buffer using some string representation.

I don't know what is the preffered string representation for this 64-bit router
ID in Babel. My suggestion is to use a half of IPv6 (e.g. 1aa9:5ff:fe1a:61e3).

> +static void
> +babel_unicast_seqno_request(struct babel_route *r)
> +{
> +  struct babel_entry *e = r->e;
> +  struct babel_proto *p = e->proto;
> +  struct babel_source *s = babel_find_source(e, r->router_id);
> +  struct babel_iface *bif = r->neigh->bif;
> +  struct babel_tlv_seqno_request *tlv;
> +  if(s && cache_seqno_request(p, e->n.prefix, e->n.pxlen, r->router_id, s->seqno+1)) {
> +    TRACE(D_EVENTS, "Sending seqno request for %I/%d router_id %0lx",

Ditto. There are more such cases in the sources.

> +/**
> + * babel_select_route:
> + * @e: Babel entry to select the best route for.
> + *
> + * Select the best feasible route for a given prefix. This just selects the
> + * feasible route with the lowest metric. If this results in switching upstream
> + * router (identified by router id), the nest is notified of the new route.
> + *
> + * If no feasible route is available for a prefix that previously had a route
> + * selected, a seqno request is sent to try to get a valid route. In the
> + * meantime, the route is marked as infeasible in the nest (to blackhole packets
> + * going to it, as per the RFC).
> + *
> + * If no feasible route is available, and no previous route is selected, the
> + * route is removed from the nest entirely.
> + */
> +static void
> +babel_select_route(struct babel_entry *e)
> +{
> +  struct babel_proto *p = e->proto;
> +  net *n = net_get(p->p.table, e->n.prefix, e->n.pxlen);
> +  struct babel_route *r, *cur = e->selected;
> +
> +  /* try to find the best feasible route */
> +  WALK_LIST(r, e->routes)
> +    if((!cur || r->metric < cur->metric)
> +       && is_feasible(babel_find_source(e, r->router_id),
> +		      r->seqno, r->advert_metric))
> +      cur = r;

Why feasibility is checked here? I would expect that feasibility is checked when
the route is received, but all accepted to e->routes stays feasible.

Also we should not process external routes here (see the first comment of the mail).

> +  if(cur && cur->neigh && ((!e->selected && cur->metric < BABEL_INFINITY)
> +			   || (e->selected && cur->metric < e->selected->metric))) {
> +      TRACE(D_EVENTS, "Picked new route for prefix %I/%d: router id %0lx metric %d",
> +	    e->n.prefix, e->n.pxlen, cur->router_id, cur->metric);
> +      /* Notify the nest of the update. If we change router ID, we also trigger
> +	 a global update. */
> +      if(!e->selected ||
> +         e->selected->metric == BABEL_INFINITY ||
> +         e->selected->router_id != cur->router_id)
> +
> +	ev_schedule(p->update_event);

Generally, we should not triger update event here, that should be triggered as a result
of babel_rt_notify (see the first comment of the mail).

> +static void
> +babel_send_ack(struct babel_iface *bif, ip_addr dest, u16 nonce)
> +{
> +  struct babel_proto *p = bif->proto;
> +  struct babel_tlv_ack *tlv;
> +  TRACE(D_PACKETS, "Babel: Sending ACK to %I with nonce %d\n", dest, nonce);

Starting with "Babel: " is unncesessary in TRACE(). There are more of these in
the sources.

> +static void
> +babel_flush_neighbor(struct babel_neighbor *bn)
> +{
> +  struct babel_proto *p = bn->bif->proto;
> +  struct babel_route *r;
> +  node *n;
> +  TRACE(D_EVENTS, "Flushing neighbor %I", bn->addr);
> +  rem_node(NODE bn);
> +  WALK_LIST_FIRST(n, bn->routes) {
> +    r = SKIP_BACK(struct babel_route, neigh_route, n);
> +    babel_flush_route(r);
> +  }
> +  mb_free(bn);
> +}

Perhaps this should be together with other babel_*_neighbor() functions?

> +/* update hello history according to Appendix A1 of the RFC */
> +static void
> +update_hello_history(struct babel_neighbor *bn, u16 seqno, u16 interval)
> +{
> +  u8 diff;
> +  if(seqno == bn->next_hello_seqno) {/* do nothing */}
> +  /* if the expected and seen seqnos are within 16 of each other (mod 65535),
> +     the modular difference is going to be less than 16 for one of the
> +     directions. Otherwise, the values differ too much, so just reset. */
> +  else if(diff_mod64k(seqno, bn->next_hello_seqno) > 16 &&
> +     diff_mod64k(bn->next_hello_seqno,seqno) > 16) {
> +    /* note state reset - flush entries */
> +    bn->hello_map = bn->hello_n = 0;
> +  } else if((diff = diff_mod64k(bn->next_hello_seqno,seqno)) <= 16) {
> +    /* sending node increased interval; reverse history */
> +    bn->hello_map >>= diff;
> +    bn->hello_n = (diff < bn->hello_n) ? bn->hello_n - diff : 0;
> +  } else if((diff = diff_mod64k(seqno,bn->next_hello_seqno)) <= 16) {
> +    /* sending node decreased interval; fast-forward */
> +    bn->hello_map <<= diff;
> +    bn->hello_n = MIN(bn->hello_n + diff, 16);
> +  }

This is a mess. What about just:

  u16 delta = diff_mod64k(seqno,bn->next_hello_seqno);

  if (delta == 0)
  { /* do nothing */ }
  else if (delta <= 16)
  {
    /* sending node decreased interval; fast-forward */
  }
  else if (delta >= 0xfff0)
  {
    /* sending node increased interval; reverse history */
  }
  else
  {
    /* note state reset - flush entries */
  }

> +int
> +babel_handle_ihu(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> +{
> ...
> +  struct babel_neighbor *bn = babel_get_neighbor(bif, state->saddr);
> +  bn->txcost = tlv->rxcost;
> +  bn->ihu_expiry = now + 1.5*(tlv->interval/100);

Please avoid floating point ops.

> +int
> +babel_handle_update(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> +{
> ...
> +  if(tlv->flags & BABEL_FLAG_ROUTER_ID) {
> +    u64 *buf = (u64*)&prefix;
> +    memcpy(&state->router_id, buf+1, sizeof(u64));

I don't think is correct. BIRD stores ip6_addr as four u32 integers in host
order, also router_id is u64 in host order, therefore such memcpy will swap
meaning of lower and upper half of router ID on little endian machines.

This is more clean and comprehensible:

state->router_id = (((u64) _I2(prefix)) << 32) | _I3(prefix);

I wonder what this flag is supposed to mean when AE is 1, RFC does not mention
this.

> +  }
> +  if(!state->router_id)
> +    log(L_WARN "%s: Received update on %s with no preceding router id", p->p.name, bif->ifname);

BTW, Babel RFC does not explictly specify that zero Router ID is invalid. But
that is likely an ommision in the RFC.

> +static void
> +expire_seqno_requests(struct babel_seqno_request_cache *c) {
> +  struct babel_seqno_request *n, *nx;
> +  WALK_LIST_DELSAFE(n, nx, c->entries) {
> +    if(n->updated < now-BABEL_SEQNO_REQUEST_EXPIRY) {

Perhaps better would be (n->updated + BABEL_SEQNO_REQUEST_EXPIRY < now).
now is not real time, but monotonic timer, so it can be very small. Although
bird_clock_t is likely to be signed, it is safer to stay in positive range.

Also perhaps <= instead of < ?

> +/* The RFC section 3.8.1.2 on seqno requests:
> +
> +   When a node receives a seqno request for a given router-id and sequence
> +   number, it checks whether its routing table contains a selected entry for
> +   that prefix; if no such entry exists, or the entry has infinite metric, it
> +   ignores the request.

There is no need to paste 5 paragraphs from RFC, just reference the section.

> +static void
> +babel_dump(struct proto *P)
> +{
> +  struct babel_proto *p = (struct babel_proto *) P;
> +  struct babel_entry *e;
> +  struct babel_iface *bif;
> +  debug("Babel: router id %0lx update seqno %d\n", p->router_id, p->update_seqno);
> +  WALK_LIST(bif, p->interfaces) {babel_dump_interface(bif);}
> +  FIB_WALK(&p->rtable, n) {
> +    e = (struct babel_entry *)n;
> +    babel_dump_entry(e);
> +  } FIB_WALK_END;
> +}

Although dump functions may be useful, i would suggest also to implement regular
'show babel interfaces' and 'show babel neighbors' user commands. See
rip_show_interfaces(), rip_show_neighbors() in the new RIP code.

> +static void
> +kill_iface(struct babel_iface *bif)
> +{
> +  DBG( "Babel: Interface %s disappeared\n", bif->iface->name);
> +  struct babel_neighbor *bn;
> +  WALK_LIST_FIRST(bn, bif->neigh_list)
> +    babel_flush_neighbor(bn);
> +  rfree(bif->pool);
> +}

It would make sense to integrate rem_node() from babel_if_notify() (explicit
free of lock is unnecessary, that is handled by freeing of pool), and perhaps
rename the function to babel_remove_iface().

> +static void
> +babel_iface_linkdown(struct babel_iface *bif)
> +{
> +  struct babel_neighbor *bn;
> +  struct babel_route *r;
> +  node *n;
> +  WALK_LIST(bn, bif->neigh_list) {
> +    WALK_LIST(n, bn->routes) {
> +      r = SKIP_BACK(struct babel_route, neigh_route, n);
> +      r->metric = BABEL_INFINITY;
> +      r->expires = now + r->expiry_interval;
> +      babel_select_route(r->e);
> +    }
> +  }
> +}

Is there any reason why route handing is different in this case and in
kill_iface() case?

> +static void
> +babel_if_notify(struct proto *P, unsigned c, struct iface *iface)
> +{
> +  struct babel_proto *p = (struct babel_proto *) P;
> +  struct babel_config *cf = (struct babel_config *) P->cf;
> +  DBG("Babel: if notify: %s flags %x\n", iface->name, iface->flags);
> +  if (iface->flags & IF_IGNORE)
> +    return;
> +  if (c & IF_CHANGE_UP) {
> +    struct iface_patt *k = iface_patt_find(&cf->iface_list, iface, iface->addr);
> +
> +    /* we only speak multicast */
> +    if(!(iface->flags & IF_MULTICAST)) return;
> +
> +    if (!k) return; /* We are not interested in this interface */
> +
> +    babel_new_interface(p, iface, iface->flags, k);
> +
> +  }
> +  struct babel_iface *bif = babel_find_interface(p, iface);
> +
> +  if(!bif)
> +    return;
> +
> +  if(!(iface->flags & IF_CHANGE_LINK)) {
> +    TRACE(D_EVENTS, "Interface %s lost link", iface->name);
> +    babel_iface_linkdown(bif);
> +  }

The iface->flags contain IF_LINK_UP, while the 'c' variable contain IF_CHANGE_LINK
it should be:

 if ((c & IF_CHANGE_LINK) && !(iface->flags & IF_LINK_UP))
  ...

I would suggest to see the new rip_if_notify() for an example.

> +static void
> +babel_store_tmp_attrs(struct rte *rt, struct ea_list *attrs)
> +{
> +  eattr *rid = ea_find(attrs, EA_BABEL_ROUTER_ID);
> +  rt->u.babel.router_id = rid ? *((u64*) rid->u.ptr->data) : 0;

This may be an unaligned access. I would suggest to ignore the router_id here,
and just store the metric. It is not a problem as EA_BABEL_ROUTER_ID is
unmodifiable due to being EAF_TYPE_OPAQUE.

See krt_store_tmp_attrs() in sysdep/unix/krt.c for similar case.

> +/*
> + * babel_rt_notify - core tells us about new route (possibly our
> + * own), so store it into our data structures.
> + */
> +static void
> +babel_rt_notify(struct proto *P, struct rtable *table UNUSED, struct network *net,
> +		struct rte *new, struct rte *old, struct ea_list *attrs)
> +{
> +  struct babel_proto *p = (struct babel_proto *)P;
> +  struct babel_entry *e;
> +  struct babel_route *r;
> +
> +  TRACE(D_EVENTS, "Got route from nest: %I/%d", net->n.prefix, net->n.pxlen);
> +  if(new) {
> +    e = babel_get_entry(p, net->n.prefix, net->n.pxlen);
> +    r = (e->selected) ? e->selected : babel_get_route(e, NULL);
> +
> +    if(!r->neigh) {
> +      r->seqno = p->update_seqno;
> +      r->router_id = p->router_id;
> +      r->metric = 0;
> +      e->selected = r;
> +    }
> +  } else if(old) {
> +    /* route has gone away; send retraction */
> +    e = babel_find_entry(p, net->n.prefix, net->n.pxlen);
> +    if(e && e->selected && !e->selected->neigh) {
> +      /* no neighbour, so our route */
> +      e->selected->metric = BABEL_INFINITY;
> +      e->selected->expires = now + BABEL_HOLD_TIME;
> +      babel_select_route(e);
> +    }
> +  } else {
> +    return;
> +  }
> +  ev_schedule(p->update_event);
> +}

> +static void babel_neigh_notify(neighbor *n)

You could just remove this function.

> +void
> +babel_init_config(struct babel_config *c)
> +{
> +  init_list(&c->iface_list);
> +  c->port	= BABEL_PORT;
> +}

Just move that to config.Y, see config.Y of other protocols (e.g. ospf_proto_start).

> +
> +static void
> +babel_get_route_info(rte *rte, byte *buf, ea_list *attrs)
> +{
> +  buf += bsprintf(buf, " (%d/%0lx)", rte->u.babel.metric, rte->u.babel.router_id);
> +}

First number in the output should be rte->pref. To be consistent with OSPF,
router id should be separate, see ospf_get_route_info(): "(%d/%d) [%x]",
rte->pref, rte->u.babel.metric, rte->u.babel.router_id.

OTOH, i would suggest to just ignore router_id here completely, as available
line length is limited.

> +
> +static int
> +babel_get_attr(eattr *a, byte *buf, int buflen UNUSED)
> +{
> +  switch (a->id) {
> +  case EA_BABEL_METRIC: bsprintf( buf, "metric: %d", a->u.data ); return GA_FULL;
> +  default: return GA_UNKNOWN;
> +  }
> +}

As opposed to babel_store_tmp_attrs(), it makes sense to handle
EA_BABEL_ROUTER_ID here.

> +static void
> +babel_copy_config(struct proto_config *dest, struct proto_config *src)

You could just remove this function.

> diff --git a/proto/babel/config.Y b/proto/babel/config.Y
> ...
> +babel_cfg:
> +   babel_cfg_start proto_name '{'
> + | babel_cfg proto_item ';'
> + | babel_cfg PORT expr ';'	{ BABEL_CFG->port = $3; }

Port should be per-interface option.

> + | babel_cfg INTERFACE babel_iface ';'
> + ;
> +
> +
> +babel_iface_item:
> + | RXCOST expr { BABEL_IFACE_CFG->rxcost = $2; }
> + | TX tos { BABEL_IFACE_CFG->tx_tos = $2; }
> + | TX PRIORITY expr { BABEL_IFACE_CFG->tx_priority = $3; }
> + | TYPE WIRED { BABEL_IFACE_CFG->type = BABEL_IFACE_TYPE_WIRED; }
> + | TYPE WIRELESS { BABEL_IFACE_CFG->type = BABEL_IFACE_TYPE_WIRELESS; }
> + | HELLO_INTERVAL expr { BABEL_IFACE_CFG->hello_interval = $2; }
> + | UPDATE_INTERVAL expr { BABEL_IFACE_CFG->update_interval = $2; }

Use HELLO INTERVAL instead of HELLO_INTERVAL, or perhaps just HELLO TIME.

> +babel_iface_init:
> +   /* EMPTY */ {
> +     this_ipatt = cfg_allocz(sizeof(struct babel_iface_config));
> +     add_tail(&BABEL_CFG->iface_list, NODE this_ipatt);
> +     init_list(&this_ipatt->ipn_list);
> +     BABEL_IFACE_CFG->rxcost = BABEL_INFINITY;
> +     BABEL_IFACE_CFG->type = BABEL_IFACE_TYPE_WIRED;
> +     BABEL_IFACE_CFG->tx_tos = IP_PREC_INTERNET_CONTROL;
> +     BABEL_IFACE_CFG->tx_priority = sk_priority_control;
> +     BABEL_IFACE_CFG->hello_interval = BABEL_INFINITY;
> +     BABEL_IFACE_CFG->update_interval = BABEL_INFINITY;

Just use zero for undefined (i.e. just not set them) instead of BABEL_INFINIT.

> diff --git a/proto/babel/packet.c b/proto/babel/packet.c
> new file mode 100644
> index 0000000..54f256b
> --- /dev/null
> +++ b/proto/babel/packet.c
>
> +#undef TRACE
> +#define TRACE(level, msg, args...) do { if (p->p.debug & level) { log(L_TRACE "%s: " msg, p->p.name , ## args); } } while(0)

Unnecessary. Just use TRACE() defined in nest/protocol.h

> +#define TLV_SIZE(t) (t->type == BABEL_TYPE_PAD0 ? 1 : t->length + sizeof(struct babel_tlv_header))
> +#define TLV_LENGTH(t) (tlv_data[t].struct_length-sizeof(struct babel_tlv_header))

TLV_SIZE vs TLV_LENGTH is confusing, perhaps TLV_LENGTH vs TLV_DATA_LENGTH ?

> +static inline ip_addr
> +get_ip6_ll(u32 *addr)
> +{
> +    return ip6_or(ipa_build6(0xfe800000,0,0,0),
> +		  ipa_build6(0,0,ntohl(addr[0]),ntohl(addr[1])));
> +}

You should use ipa_from_ip6(). Also why ip6_or()? Just use:
ipa_build6(0xfe800000,0,ntohl(addr[0]),ntohl(addr[1])).

> +static struct babel_tlv_data tlv_data[BABEL_TYPE_MAX] = {
> +  {1, NULL,NULL,NULL,NULL,NULL},
> +  {3, NULL,NULL,NULL,NULL,NULL},
> +  {sizeof(struct babel_tlv_ack_req),
> +   babel_handle_ack_req, babel_validate_length,
> +   babel_hton_ack_req, babel_ntoh_ack_req,
> +   NULL,NULL},

Please use:  [BABEL_*] = {xxx, yyy, ...},

> +ip_addr
> +babel_get_addr_ihu(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> +{
> +  struct babel_tlv_ihu *tlv = (struct babel_tlv_ihu *)hdr;
> +  struct babel_iface *bif = state->bif;
> +  if(tlv->ae == BABEL_AE_WILDCARD) {
> +    return bif->iface->addr->ip; /* FIXME: Correct? */

Well, you check it against bif->addr in babel_handle_ihu() so you should use that.

> +int
> +babel_validate_update(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> +{
> +  struct babel_tlv_update *tlv = (struct babel_tlv_update *)hdr;
> +  int min_length = TLV_LENGTH(BABEL_TYPE_UPDATE)-sizeof(tlv->addr);
> +  u8 len = tlv->plen/8;
> +  if(tlv->plen % 8) len++;

u8 len = (tlv->plen + 7) / 8;

> +  /* Can only omit bits if a previous update defined a prefix to take them from */
> +  if(tlv->omitted && ipa_equal(state->prefix, IPA_NONE))
> +    return 0;

According to RFC:

   The next-hop address for this update is taken from the last preceding
   Next Hop TLV with a matching address family in the same packet; if no
   such TLV exists, it is taken from the network-layer source address of
   this packet.

So this case is not necessary.

> +ip_addr
> +babel_get_addr_update(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> +{
> +  struct babel_tlv_update *tlv = (struct babel_tlv_update *)hdr;
> +  char buf[16] = {0};

Just: char buf[16] = {};

> +void
> +babel_hton_seqno_request(struct babel_tlv_header *hdr)
> +{
> +  struct babel_tlv_seqno_request *tlv = (struct babel_tlv_seqno_request *)hdr;
> +  tlv->seqno = htons(tlv->seqno);
> +  tlv->router_id = htobe64(tlv->router_id);
> +}

There are some issues with portability of names and including appropriate header
for htobe64() / be64toh(). Perhaps this should be handled somewhere in lib/.

> +void
> +babel_ntoh_seqno_request(struct babel_tlv_header *hdr)
> +{
> +  struct babel_tlv_seqno_request *tlv = (struct babel_tlv_seqno_request *)hdr;
> +  tlv->seqno = ntohs(tlv->seqno);
> +  tlv->router_id = be64toh(tlv->router_id);
> +}

Ditto.

> +struct babel_tlv_header *
> +babel_add_tlv_size(struct babel_iface *bif, u16 type, int len)
> +{
> +  struct babel_header *hdr = bif->current_buf;
> +  struct babel_tlv_header *tlv;
> +  int pktlen = sizeof(struct babel_header)+hdr->length;
> +  if(pktlen+len > bif->max_pkt_len) {
> +    babel_send_queue(bif);
> +    pktlen = sizeof(struct babel_header)+hdr->length;
> +  }

This seems broken - TLV is added to current_buf, which may be either tlv_buf
or just sock->tbuf based on multicast vs unicast. But if there si not enough
space, then babel_send_queue() is called, which assumes that data are in
tlv_buf, and copies it to sock->tbuf.

> +int
> +babel_process_packet(struct babel_header *pkt, int size,
> +                     ip_addr saddr, int port, struct babel_iface *bif)
> +{
> +  struct babel_tlv_header *tlv = FIRST_TLV(pkt);
> +  struct babel_proto *proto = bif->proto;
> +  struct babel_parse_state state = {
> +    .proto	  = proto,
> +    .bif	  = bif,
> +    .saddr	  = saddr,
> +    .prefix	  = IPA_NONE,

It is unnecessary to initialize addresses to IPA_NONE.

> +    .next_hop	  = saddr,
> +  };
> +  char *p = (char *)pkt;

p should be always struct babel_proto ptr. At least for TRACE().

> +  int res = 0;
> +
> +  pkt->length = ntohs(pkt->length);
> +  if(pkt->magic != BABEL_MAGIC
> +     || pkt->version != BABEL_VERSION
> +     || pkt->length > size - sizeof(struct babel_header)) {
> +    DBG("Invalid packet: magic %d version %d length %d size %d\n",
> +	pkt->magic, pkt->version, pkt->length, size);
> +    return 1;
> +  }
> +
> +  while((char *)tlv < p+size) {
> +    if(tlv->type > BABEL_TYPE_PADN
> +       && tlv->type < BABEL_TYPE_MAX
> +       && validate_tlv(tlv, &state)) {
> +      babel_tlv_ntoh(tlv);
> +      res &= tlv_data[tlv->type].handle(tlv, &state);
> +    } else {
> +      DBG("Unknown or invalid TLV of type %d\n",tlv->type);
> +    }
> +    NEXT_TLV(tlv);
> +  }

First, the validation IMHO does not check if the current TLV does not overlap
available packet length.

Second, we should use pkt->length instead of size for validation and processing.

Framing errors, like pkt->length larger than size (- hdr), or TLV overlapping
pkt->length should be logged as error.

IMHO, there should be first iteration doing just validation and if any TLV
failed then whole packet should be dropped (and perhaps logged). Or at least
remaining TLVs.  Otherwise, e.g. if one Router-ID or Next hop TLV is invalid,
then remaining TLVs would be processed with a different router id / next hop.

The IPv4 (and unknown) AE should not fail validation, instead they should be
ignored in handling code.

There should be some general babel_validate_address(), which would be shared
by other validate functions to check address based on ae and remaining length.

Also, the return value (res) is meaningless?

> +  if(state.needs_update)
> +    bif->update_triggered = 1;
> +  return res;
> +}
> +
> +
> +static void
> +babel_tx_err( sock *s, int err )
> +{
> +  log( L_ERR ": Unexpected error at Babel transmit: %M", err );
> +}

It would make sense to report proto and iface:

static void
rip_err_hook(sock *sk, int err)
{
  struct rip_iface *ifa = sk->data;
  struct rip_proto *p = ifa->rip;

  log(L_ERR "%s: Socket error on %s: %M", p->p.name, ifa->iface->name, err);

  rip_reset_tx_session(p, ifa);
}

> +static int
> +babel_rx(sock *s, int size)
> +{
> +  struct babel_iface *bif = s->data;
> +  struct babel_proto *p = bif->proto;
> +  if (! bif->iface || s->lifindex != bif->iface->index)
> +    return 1;
> +
> +  DBG("Babel: incoming packet: %d bytes from %I via %s\n", size, s->faddr, bif->iface->name);
> +  if (size < sizeof(struct babel_header)) BAD( "Too small packet" );
> +
> +  if (ipa_equal(bif->iface->addr->ip, s->faddr)) {
> +    DBG("My own packet\n");
> +    return 1;
> +  }
> +
> +  if (!ipa_is_link_local(s->faddr)) { BAD("Non-link local sender"); }
> +
> +  babel_process_packet((struct babel_header *) s->rbuf, size, s->faddr, s->fport, bif );
> +  return 1;
> +}

At least fport should also be checked. I would suggest to upgrade babel_rx()
based on rip_rx_hook() code from the new RIP.

> +int
> +babel_open_socket(struct babel_iface *bif)
> +{
> +  struct babel_proto *p = bif->proto;
> +  struct babel_config *cf = (struct babel_config *) p->p.cf;
> +  bif->sock = sk_new( bif->pool );
> +  bif->sock->type = SK_UDP;
> +  bif->sock->sport = cf->port;
> +  bif->sock->rx_hook = babel_rx;
> +  bif->sock->data =  bif;

> +  bif->sock->rbsize = 10240;
> +  bif->sock->iface = bif->iface;
> +  bif->sock->tbuf = mb_alloc( bif->pool, bif->iface->mtu);

Why not just use:

bif->sock->rbsize = MIN(512, bif->iface->mtu);
bif->sock->tbsize = bif->sock->rbsize;

This ensures rbuf, tbuf allocation. Also note that buffer sizes should be
updated in case of MTU is changed, handle IF_CHANGE_MTU in babel_if_notify().

> +  bif->sock->err_hook = babel_tx_err;
> +  bif->sock->dport = cf->port;
> +  bif->sock->daddr = IP6_BABEL_ROUTERS;
> ...
> +
> +  return 1;
> +
> + err:
> +  sk_log_error(bif->sock, p->p.name);
> +  log(L_ERR "%s: Cannot open socket for %s", p->p.name,  bif->iface->name);

sk_log_error() already generated the proper log message.

> +  rfree(bif->sock);

bif->sock should be set to NULL, to be sure. Or perhaps you can use a local
variable for socket and assign to bif->sock at the end, like other protocols do.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20151008/cbd67743/attachment.asc>