On Tue, Aug 18, 2015 at 11:06:03PM +0100, Toke Høiland-Jørgensen wrote:
The implementation is currently IPv6-only (since Bird does not support mixed IPv4/6 protocols in the same instance), but is otherwise complete as far as MUSTs in the RFC is concerned, with one exception: The RFC specifies that jitter must be applied to packet transmission times. This implementation currently does not buffer TLVs before sending them, but it does introduce some randomness to sending of global updates by timer and event scheduling randomness in the Bird core.
Hi, i am finally sending the first batch of comments. This time it is mostly general comments.
Some cleanup and possible reorganisation is needed in the code. Consider this patch an RFC for the general structure of the protocol addition.
The implementation started out as a copy of the RIP protocol implementation, so some patterns in how the communication with the core is setup is taken from there (but has diverted quite a bit as the different parts of the protocol implementation fell into place).
Using the old RIP protocol implementation as a basis is a problem - that code has several design problems, some of these seems to be shared by the Babel code. Not to mention its idiosyncratic coding conventions are far from ones used in the rest of BIRD. We are currently finishing rewrite of BIRD RIP code to eliminate these problems from RIP. I will send you the new RIP code for inspiration. Route propagation between protocol and core: Old RIP uses core tables to keep some of its route state and therefore has nontrivial interaction with core. This should be avoided - ideally core tables should be handled as a black box with rte_update() in one direction and rt_notify() (and some other auxiliary hooks) in the other. Another important question is how to handle and compare external routes relative to internal routes. For vector-distance (route-propagating) protocols it seems natural to handle external routes received from core like if they are from another neighbor. But this approach is problematic because internal tables and core tables will fight for the right to choose the best route. The proper approach is that the protocol keeps information about incoming routes, choose best of incoming routes, propagate that to core, then core chooses the the global best route, and propagate that back to protocols. The protocol learn that route (which may be either its or external), keep it as 'outgoing' and propagate it to neighbors. Also note that unreachable routing entries should not be propagated to core. Interface and neighbor events: You should create babel_interface structure during babel_if_notify() hook, not as a result from acquiring lock. The used approach is potentially racy (you could end with multiple locks if you get multiple up/down events before lock events). Note that the lock could be allocated from the iface resource pool. See ospf_iface_new()/ospf_iface_add() or radv_iface_new()/radv_iface_add(). I am bit worried about liveness of neighbors and ordering of hooks. Say an iface disappears. First you probably get babel_neigh_notify(), which does nothing, then you get babel_if_notify(), which will do kill_iface() and babel_flush_neighbor(), which unregisters data from the neighbor (bn->neigh->data = NULL), but such neighbor is likely to be already disposed. Some minor notes: Please add comments to each 'list' field in structures to keep which structures are members of that lists (see e.g. radv.h). Having timers for each route or entry is not necessary a good idea, perhaps this could be handled by keeping just timestamps in routes and having common 'timeout' hook, which would walk the fib table and process nodes - see rip_timer() or ospf_disp()/ospf_update_lsadb(). Also having resource pools for entries seems excessive. babel_route structures could be just allocated from a common slab. Not sure what is a point of struct neighbor_route. If you just want to keep babel_route in two linked list, just add two 'node' fields in it and use SKIP_BACK macro. See e.g. struct proto. Avoid unnecessary floating points, just use fixed point math instead. It is a good idea to have a ptr from babel_iface to babel_patt instead of duplicating all the values. Although you would have to change it during reconfiguration, that is the simplest part of reconf. And if you do not have reconfiguration implemented, then the whole protocol is restarted anyways. Do you have any kind of multihop unicast communication in Babel, or everything is single-hop to neighbors? -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."