Ondrej Zajicek <santiago@crfreenet.org> writes:
I am sorry i didn't got around to review it sooner, but doing a proper review for a whole new (and unfamiliar) protocol takes nontrivial amount of time.
Generally, the code looks good and i am determined to finally merge it, but it definitely needs more refinement before that.
Hi Ondrej Awesome! Many thanks for your detailed comments, and thank you for going easy on my (lack of) C skills :) I'll go through and do another revision and resubmit. Some answers / additional questions inline below.
First, there is an issue with route selection. Proper route selection in BIRD should work this way:
1) Babel chooses the best route from received routes, say selected_in 2) If selected_in changes, it is propagated to the core by rte_update() 3) Core compares routes from multiple protocols and one is exported to the Babel 4) This route is either Babel own route or external route 5) In babel_rt_notify(), Babel receives that route, finds appropriate internal route (or create a new one) and stores it in say selected_out. 6) Even if the received selected_out is an external route, Babel do not stop propagating selected_in to the core 7) selected_out is propagated to the network in periodic updates. If selected_out changes, it is an event for triggered updates.
Ah, thank you! I struggled a great deal with figuring out how this was supposed to work. I'll fix that, and also try to write this up in a form that is suitable for the documentation and include that in the next version of the patch. Speaking of documentation: What kind of documentation do you expect in the patch? Obviously I'll include user's documentation on the configuration options, but some of the other protocols also has API documentation of (some of) their function hooks, and it's not immediately obvious to me what the purpose of those are? Should I add that for Babel as well?
Second, can we really expect that TLVs in received packets are aligned? Although all TLVs looks like suited to 32-bit alignment, i did not found mentioned it in Babel RFC and availability of Pad1, PadN TLVs and per-byte compressible addresses suggest that they generally are not aligned.
In that case, we have to use get_uXX()/put_uXX() functions instead of directly accessing babel_tlv_* structure fields. Or we could copy each TLV to a local buffer before calling ntoh+handle callbacka, as each TLV internally is aligned, and enforce alignment for outgoing packets. I would generally prefer the first approach, but the second approach is simpler way to fix it.
No, I don't think they are necessarily 32-bit aligned. I struggled with alignment issues for the 64-bit values (and got it to work by using the __attribute__((packed)) attribute in the structure definitions). I'll try to figure out how to do this properly in general.
Third, although i am OK with accepting IPv6-only implementation, it should be ready for future expansion. TLVs with AE 1 could be skipped, but there should not be hidden hardwired IPv6 assumptions in the code.
Mainly, TLV structures should use addr[0] at the end instead of addr[2] and addr[4]. Address space requirements should be handled explictly based on AE.
Noted.
Fourth, it seems like triggered updates are not really implemented as they should. Route change triggers propagation, but the whole routing table is always propagated instead of just the changed entries. This could be easily fixed by having timestamp on each babel_entry updated everytime when a significant change of selected_out route happened. For regular updates all routes are propagated, for triggered updates only the entries with timestamps >= time when the trigger update was requested. See want_triggered and tx_changed fields in the new RIP code.
Hmm, I'm not sure just dumping everything in a triggered update is strictly a violation of the RFC. But I suppose it does have obvious impacts on efficiency. Will try to be smarter about that.
Fifth, ... (snip)
Sixth, ... (snip)
Seventh ... (snip)
Finally, the coding style ... (snip)
Will fix these as well.
Rest are comments specific to lines in the patch:
Most of these seems obvious how to fix; one or two questions below:
diff --git a/nest/route.h b/nest/route.h ... @@ -538,6 +546,7 @@ extern struct protocol *attr_class_to_protocol[EAP_MAX]; #define DEF_PREF_RIP 120 /* RIP */ #define DEF_PREF_BGP 100 /* BGP */ #define DEF_PREF_PIPE 70 /* Routes piped from other tables */ +#define DEF_PREF_BABEL 42 /* Babel */
Default could be higher, e.g. 130
OK. What do these actually mean? Is it simply that a higher priority wins if two protocols have the same route?
+enum babel_iface_type_t { + BABEL_IFACE_TYPE_WIRED, + BABEL_IFACE_TYPE_WIRELESS, + BABEL_IFACE_TYPE_MAX +};
I would prefer to have explicit values here. Perhaps also reserve 0 for undefined?
Well, undefined == wired in this case. I.e. use the simple metric computation mechanism unless explicitly told to treat the interface as wireless. I don't suppose Bird has a way of telling automatically what type of interface we are dealing with?
+static void +babel_expire_routes(struct babel_proto *p) +{ + struct babel_entry *e; + struct babel_route *r, *rx; + node *n; + FIB_WALK(&p->rtable, n) { + e = (struct babel_entry *)n; + WALK_LIST_DELSAFE(r, rx, e->routes) { + if(r->refresh_time && r->refresh_time <= now) { + refresh_route(r); + r->refresh_time = 0; + } + if(r->expires && r->expires <= now) + expire_route(r); + } + expire_sources(e); + } FIB_WALK_END; + WALK_LIST_FIRST(n, p->garbage) { + e = SKIP_BACK(struct babel_entry, garbage_node, n); + babel_flush_entry(e); + } +}
Instead of using p->garbage and garbage_node, you could simply use FIB_ITERATE instead of FIB_WALK and remove babel_entry inside the cycle:
I tried and failed to get this to work; the whole thing would just crash, which is why I came up with the garbage list approach. Probably did it wrong, though (wasn't obvious to me how to use the FIB_* macros), so will give it another shot; thanks for the pointer :)
+static u16 +babel_compute_rxcost(struct babel_neighbor *bn) +{ + struct babel_iface *bif = bn->bif; + struct babel_proto *p = bif->proto; + u8 n, missed; + u16 map=bn->hello_map; + + if(!map) return BABEL_INFINITY; + n = __builtin_popcount(map); // number of bits set
This should be defined as our inline function in lib/bitopts.h, then used here.
So just alias __builtin_popcount to something in bitopts.h? Not entirely sure what compilers you are targeting; that macro seems to be GCC-specific, but is also supported in LLVM since 1.5: http://llvm.org/releases/1.5/docs/ReleaseNotes.html
+int +babel_handle_ihu(struct babel_tlv_header *hdr, struct babel_parse_state *state) +{ ... + struct babel_neighbor *bn = babel_get_neighbor(bif, state->saddr); + bn->txcost = tlv->rxcost; + bn->ihu_expiry = now + 1.5*(tlv->interval/100);
Please avoid floating point ops.
So, substitute 3/2 for 1.5?
+int +babel_handle_update(struct babel_tlv_header *hdr, struct babel_parse_state *state) +{ ... + if(tlv->flags & BABEL_FLAG_ROUTER_ID) { + u64 *buf = (u64*)&prefix; + memcpy(&state->router_id, buf+1, sizeof(u64));
I don't think is correct. BIRD stores ip6_addr as four u32 integers in host order, also router_id is u64 in host order, therefore such memcpy will swap meaning of lower and upper half of router ID on little endian machines.
This is more clean and comprehensible:
state->router_id = (((u64) _I2(prefix)) << 32) | _I3(prefix);
I wonder what this flag is supposed to mean when AE is 1, RFC does not mention this.
Noted. You are right, that is not specified in the RFC. However, I think setting this flag with an AE of 1 should be considered an error, and will treat it as such.
+ } + if(!state->router_id) + log(L_WARN "%s: Received update on %s with no preceding router id", p->p.name, bif->ifname);
BTW, Babel RFC does not explictly specify that zero Router ID is invalid. But that is likely an ommision in the RFC.
Ah yes, that is true. Will try to get that clarified.
+static void +babel_iface_linkdown(struct babel_iface *bif) +{ + struct babel_neighbor *bn; + struct babel_route *r; + node *n; + WALK_LIST(bn, bif->neigh_list) { + WALK_LIST(n, bn->routes) { + r = SKIP_BACK(struct babel_route, neigh_route, n); + r->metric = BABEL_INFINITY; + r->expires = now + r->expiry_interval; + babel_select_route(r->e); + } + } +}
Is there any reason why route handing is different in this case and in kill_iface() case?
My assumption was that linkdown means that the interface is likely to come back up, so I wanted to keep the routes and be able to refresh them when it does (and keep the hello seqno etc so the other hosts on the link don't refresh their metrics completely if the interface goes down for a short time). This came about while playing around with a laptop that was connected by wifi and ethernet to the same network (and running babel on both), then repeatedly unplugging and re-plugging the ethernet cable.
+static void +babel_copy_config(struct proto_config *dest, struct proto_config *src)
You could just remove this function.
Yeah, well, was planning to actually make it do something useful instead (i.e. support proper reconfiguring).
+void +babel_hton_seqno_request(struct babel_tlv_header *hdr) +{ + struct babel_tlv_seqno_request *tlv = (struct babel_tlv_seqno_request *)hdr; + tlv->seqno = htons(tlv->seqno); + tlv->router_id = htobe64(tlv->router_id); +}
There are some issues with portability of names and including appropriate header for htobe64() / be64toh(). Perhaps this should be handled somewhere in lib/.
Not sure what you mean here? Are you referring to htobe64() being glibc-specific? An how do you suggest to handle it? :)
+struct babel_tlv_header * +babel_add_tlv_size(struct babel_iface *bif, u16 type, int len) +{ + struct babel_header *hdr = bif->current_buf; + struct babel_tlv_header *tlv; + int pktlen = sizeof(struct babel_header)+hdr->length; + if(pktlen+len > bif->max_pkt_len) { + babel_send_queue(bif); + pktlen = sizeof(struct babel_header)+hdr->length; + }
This seems broken - TLV is added to current_buf, which may be either tlv_buf or just sock->tbuf based on multicast vs unicast. But if there si not enough space, then babel_send_queue() is called, which assumes that data are in tlv_buf, and copies it to sock->tbuf.
Ah, was trying to be careful with this; obviously didn't work. Will give it another go.
+int +babel_open_socket(struct babel_iface *bif) +{ + struct babel_proto *p = bif->proto; + struct babel_config *cf = (struct babel_config *) p->p.cf; + bif->sock = sk_new( bif->pool ); + bif->sock->type = SK_UDP; + bif->sock->sport = cf->port; + bif->sock->rx_hook = babel_rx; + bif->sock->data = bif;
+ bif->sock->rbsize = 10240; + bif->sock->iface = bif->iface; + bif->sock->tbuf = mb_alloc( bif->pool, bif->iface->mtu);
Why not just use:
bif->sock->rbsize = MIN(512, bif->iface->mtu);
You mean MAX?
bif->sock->tbsize = bif->sock->rbsize;
This ensures rbuf, tbuf allocation. Also note that buffer sizes should be updated in case of MTU is changed, handle IF_CHANGE_MTU in babel_if_notify().
Noted. I'll get back to you with an updated patch, hopefully before too long :) -Toke