On Sat, Apr 30, 2016 at 01:20:14AM +0200, Toke Høiland-Jørgensen wrote:
On 29 April 2016 20:24:43 CEST, Ondrej Zajicek <santiago@crfreenet.org> wrote:
I would like to thank Toke Høiland-Jørgensen for the Babel protocol implementation, which was finally merged despite my tardy code reviews.
Yay, awesome! You're very welcome, and congratulations on the release! :)
Congrats, Toke! I have just tested your implementation (from a pure user perspective, I haven't looked at the code), and it looks great! Thanks to you, I will have to switch my test overlay network to IPv6-only... I have found a few issues while playing with Bird 1.6.0. I also have some general comments, notably about the documentation, but that will wait for another email :) === Configuration === When the quotes are missing around the interface name, the error message is misleading. This configuration: protocol babel { interface eth0; } gives: /etc/bird6.conf, line 240: IP address expected which is very strange, because an IP address does not make sense here (and something like "interface 2001:db8::42" obviously gives an error). === Filtering === When updating a filter that removes some routes (e.g. switching from "export all" to "export none"), the routes stay in the Babel table, even though they are no longer exported to the protocol: bird> show babel entries babel1 babel1: Prefix Router ID Metric Seqno Expires Sources 2001:db8:42::/64 <pending> 1 bird> show route export babel1 bird> === Hello interval range checking === The hello interval cannot be set below 1 second: the parser seems to expect an integer. The packet format encodes intervals in centiseconds, so it would make sense to allow any fractional Hello interval down to 0.01 seconds. I just checked, babeld's parser allows to go as low as 0.001s (which is a bit scary in itself, since it generates about 1 kpps / 1.5 Mbps of control traffic). On a related note, there is no overflow check on the hello interval. For instance, setting the hello interval to 800 seconds produces this in tcpdump: Hello seqno 1 interval 144.64s 80000 centiseconds is too large for 16 bits, and overflows. babeld does check for this, it only allows values up to 655.35 seconds in the configuration. There is a similar issue with the IHU interval and route update interval, which are set to 3 (resp. 4) times the Hello interval. When setting the hello interval so that 3*hello or 4*hello overflows, for instance hello interval = 225 seconds, both IHU and route updates packets contain an overflowed interval: Update 2001:db8:42::/64 metric 0 seqno 1 interval 244.64s IHU fe80::e8db:78ff:fe05:8a64 txcost 96 interval 19.64s This time, babeld seems to have the same issue: when setting the Hello interval too high, the IHU and route intervals do overflow. Looking at tcpdump with babeld and hello-interval set to 600s: Update 172.23.184.205/32 metric 0 seqno 6295 interval 433.92s Hello seqno 42182 interval 600.00s sub-timestamp 2159.389437s IHU fe80::78db:e7ff:fe98:4cde txcost 65535 interval 489.28s === Interaction between reconfiguration and timers === Changing the hello interval and telling bird to reconfigure itself leads to a somewhat inconsistent behaviour: - when decreasing the hello interval, the old interval is still used for the next Hello message, but the new interval is immediately taken into account for IHU and Update messages. I don't think this is a real interoperability issue, but it's just strange to see a lot of Updates and IHU without any Hello (when decreasing the hello interval substantially, for instance from one minute to 3 seconds) Increasing the hello interval works as expected (the old interval is still used for the next Hello, IHU and Update messages). This looks correct and consistent with the RFC, section 3.4.1: the interval change is actually taken into account when sending the next Hello message, because otherwise our neighbours could think we are dead. === TLV parsing error === When a neighbouring babeld starts or stop, Bird complains about an error parsing a TLV. When starting babeld: avril 30 12:23:38 lud bird6[17307]: babel1: Bad TLV from fe80::e8db:78ff:fe05:8a64 via tap-fastd type 8 pos 18 - parse error avril 30 12:23:38 lud bird6[17307]: babel1: Bad TLV from fe80::e8db:78ff:fe05:8a64 via tap-fastd type 8 pos 18 - parse error When stopping babeld: avril 30 12:23:59 lud bird6[17307]: babel1: Bad TLV from fe80::e8db:78ff:fe05:8a64 via tap-fastd type 8 pos 4 - parse error avril 30 12:23:59 lud bird6[17307]: babel2: Bad TLV from fe80::e8db:78ff:fe05:8a64 via tap-fastd type 8 pos 4 - parse error Packet captures are attached (babeld-start-parse-error.pcap and babeld-stop-parse-error.pcap). The local node running Bird is fe80::c07d:62ff:fea6:b8f4, all other speakers are babeld 1.5.0. Type 8 is the Update TLV, I think Bird complains about this particular message (here seen by tcpdump): Update/id ::/0 metric 65535 seqno 21902 interval infinity These are all minor issues, your implementation globally seems to work fine :) Thanks, Baptiste