RFT: Babel RTT extension in Bird
Toke Høiland-Jørgensen
toke at toke.dk
Sat Apr 23 23:20:57 CEST 2022
Stefan Haller <stefan.haller at stha.de> writes:
> Hi Toke,
>
> On Fri, Apr 22, 2022 at 01:48:46AM +0200, Toke Høiland-Jørgensen wrote:
>> I've implemented the Babel RTT extension specified in
>> draft-ietf-babel-rtt-extension in Bird. I've tested that it talks to
>> babeld on a single link and that the two implementations agree on each
>> other's (smoothed) RTT values. However, I'd like to subject the code to
>> some more tortured testing before submitting it to upstream Bird. So I'm
>> sending this note as a request for testing.
>
> Nice work! I replaced the bird binary and changed the interface type to
> "tunnel" on a mesh of four hosts. Works great so far!
>
> Things I noticed:
Thanks for testing! This is just the kind of feedback I was looking for!
> (1) When I forgot to change the config file (one side was type tunnel,
> one side was type wired), the Babel neighbor metric was stuck on 65535. I
> think this happens because the expected time stamp was not received and
> then the metric computation does not work. While I understand that such
> a "broken" setup is not really supported, it was not exactly clear where
> to locate the problem.
No, this should definitely be supported! The reason it breaks is simply
that I misremembered the semantics of the subtlv parsing code, so it
ignored the whole IHU TLV instead of just the timestamp. Will fix!
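
To illustrate the intended behaviour, here is a rough sketch of a sub-TLV
walk that drops only an unusable timestamp and still accepts the enclosing
IHU. This is not the code from the branch: the function and struct names are
made up, and the sub-TLV type code 3 for the timestamp and the 8-byte body
(origin plus receive timestamp) are assumptions based on my reading of the
RTT extension draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUBTLV_PAD1      0
#define SUBTLV_TIMESTAMP 3   /* assumed type code, see note above */

/* Hypothetical result struct; not a Bird data structure. */
struct ihu_timestamps {
  bool present;
  uint32_t origin_us;
  uint32_t receive_us;
};

/* Read a 32-bit value in network byte order. */
uint32_t get_u32(const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/*
 * Walk the sub-TLV area of an IHU. Returns false only when the whole IHU
 * should be dropped (broken framing, or an unknown sub-TLV with the
 * mandatory bit set). A missing or too-short timestamp just leaves
 * ts->present == false and the IHU is still accepted.
 */
bool parse_ihu_subtlvs(const uint8_t *buf, size_t len, struct ihu_timestamps *ts)
{
  assert(ts != NULL);
  ts->present = false;

  size_t pos = 0;
  while (pos < len)
  {
    uint8_t type = buf[pos];

    if (type == SUBTLV_PAD1) { pos++; continue; }   /* single-byte padding */

    if (pos + 2 > len)
      return false;                                 /* truncated header */

    size_t body_len = buf[pos + 1];
    if (pos + 2 + body_len > len)
      return false;                                 /* body runs past the TLV */

    const uint8_t *body = buf + pos + 2;

    if (type == SUBTLV_TIMESTAMP)
    {
      if (body_len >= 8)
      {
        ts->present = true;
        ts->origin_us = get_u32(body);
        ts->receive_us = get_u32(body + 4);
      }
      /* else: too short, ignore just this sub-TLV */
    }
    else if (type & 0x80)
      return false;                                 /* unknown mandatory sub-TLV */

    pos += 2 + body_len;                            /* skip to the next sub-TLV */
  }

  return true;
}
```

With this shape, a neighbour that sends no timestamps at all (the "wired"
side in your setup) still produces a valid IHU, only without an RTT sample.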
> (2) Also I think it would be neat if "birdc show babel neigh" would show
> latency info (current latency + smoothed value).
Yup, definitely, will add!
> (3) Due to route flapping I tried to increase "metric decay" to 60s.
> After running "birdc configure" the values became very large for one
> link (on one side only).
Which values become very large? And is this persistent? What kind of
route flapping were you seeing?
>> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 4294966323 us (srtt 99189.162 ms)
>> bird: babel1: Added RTT cost 96 to nbr fe80::3 on wg2 with srtt 99189.162 ms
>
> Nothing changed after >1h. The opposite side was reporting sensible RTT
> numbers. After I restarted the daemon, the smoothed value was still off
> for this one link:
>
>> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 1241 us (srtt 69656.646 ms)
>> bird: babel1: Added RTT cost 96 to nbr fe80::3 on wg2 with srtt 69656.646 ms
>
> The srtt value did not converge after >1h. For all other links the
> smoothing works, e.g. for wg1 on the very same host:
>
>> bird: babel1: RTT sample for neighbour fe80::1 on wg0: 14570 us (srtt 15.876 ms)
>> bird: babel1: Added RTT cost 5 to nbr fe80::1 on wg0 with srtt 15.876 ms
>
> After restarting bird once more (without changing anything) it works
> since then:
>
>> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 1261 us (srtt 1.313 ms)
>
> In this setup wg2 was a tunnel over the local LAN, so latency was often
> < 1000 us. Maybe there is a problem for tiny latencies and/or larger values
> of "metric decay"? I did not find a way to reliably reproduce the
> problem.
Hmm, so that initial value is '-973' as a 32-bit unsigned int. So looks
like there's an overflow in the RTT calculation. I'll add some sanity
checks to make sure this doesn't happen (as indeed babeld has as well).
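
To spell out the arithmetic: 4294966323 is 2^32 - 973, which is what you get
when the remote side's reported processing time exceeds the locally measured
elapsed time by 973 us and the subtraction is done on unsigned 32-bit
values. A minimal sketch of the wraparound and of the kind of sanity check I
have in mind follows; the function names and the 10-minute plausibility
bound are assumptions for illustration, not what is in the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plausibility bound on either interval: 10 minutes in us. */
#define RTT_MAX_ELAPSED_US 600000000u

/* Unchecked version: unsigned subtraction wraps modulo 2^32. */
uint32_t rtt_naive(uint32_t local_elapsed_us, uint32_t remote_elapsed_us)
{
  return local_elapsed_us - remote_elapsed_us;
}

/*
 * Checked version: reject the sample instead of wrapping.
 * local_elapsed_us  = time between sending our Hello and receiving the IHU
 * remote_elapsed_us = time the neighbour reports having held our timestamp
 * Returns true and stores the RTT only if the sample is plausible.
 */
bool rtt_checked(uint32_t local_elapsed_us, uint32_t remote_elapsed_us,
                 uint32_t *rtt_us)
{
  assert(rtt_us != NULL);

  if (local_elapsed_us > RTT_MAX_ELAPSED_US ||
      remote_elapsed_us > RTT_MAX_ELAPSED_US)
    return false;               /* stale or nonsensical timestamps */

  if (remote_elapsed_us > local_elapsed_us)
    return false;               /* would be negative, i.e. wrap around */

  *rtt_us = local_elapsed_us - remote_elapsed_us;
  return true;
}
```

For example, rtt_naive(1000, 1973) yields 4294966323, the value from your
log, while rtt_checked() rejects the same inputs and leaves the smoothed RTT
untouched.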
I force-pushed an updated version to the same branch which fixes the
issues you noted above apart from the route flap one. Inspired by the
first issue noted above, I also added a separate config parameter to
control the *sending* of timestamps, which defaults to on.
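
As an aside on reading the "Added RTT cost" lines in the logs quoted above:
the figures are consistent with a linear mapping from smoothed RTT to cost
that is zero below a floor and capped above a ceiling. The sketch below
assumes a floor of 10 ms, a ceiling of 120 ms and a maximum cost of 96.
Those three numbers are chosen because they reproduce the logged costs, not
read out of the configuration, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter set; values are assumptions, see note above. */
struct rtt_cost_params {
  uint32_t rtt_min_us;   /* at or below this srtt, no extra cost */
  uint32_t rtt_max_us;   /* at or above this srtt, full max_cost */
  uint32_t max_cost;     /* cap on the added cost */
};

/* Linearly interpolate the added cost between rtt_min_us and rtt_max_us. */
uint32_t rtt_cost(uint32_t srtt_us, const struct rtt_cost_params *p)
{
  assert(p != NULL && p->rtt_max_us > p->rtt_min_us);

  if (srtt_us <= p->rtt_min_us)
    return 0;
  if (srtt_us >= p->rtt_max_us)
    return p->max_cost;

  /* 64-bit intermediate so the multiplication cannot overflow. */
  uint64_t above_min = srtt_us - p->rtt_min_us;
  uint64_t range = p->rtt_max_us - p->rtt_min_us;
  return (uint32_t) (above_min * p->max_cost / range);
}
```

With { 10000, 120000, 96 } this gives 5 for an srtt of 15.876 ms and 96 for
the bogus 99189.162 ms, matching the log lines; an srtt of 1.313 ms falls
below the floor and adds no cost.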
-Toke