Re: [Babel-users] Babel RTT settings for low and high latencies
> Some of the VMs are in the same data center (latency < 1ms) and some are very far away (Germany <-> Hong Kong 300ms-400ms). Which rtt-min, rtt-max and rtt-cost/max-rtt-penalty would be suitable for such a network?
> rtt-min should be chosen so that all links below rtt-min are local -- the idea is that below rtt-min, it's not worth optimising the path. The default value (10ms) should be fine in most networks.
> rtt-max should be larger than all good links in your network -- the idea is that above rtt-max, it's no longer worth optimising the path. In your case, you might want to increase the value to 350ms or so.
> We are running re6st, a mesh network based on Babel. We have almost the same setup as you: around 400 machines connected worldwide. Our default options for Babel are:
> max-rtt-penalty 5000 rtt-max 500 rtt-decay 125
That's a little bit extreme, but since Nexedi's network is uncongested, Babel should be fairly stable even with such extreme parameters. -- Juliusz
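For readers following along, the role of rtt-min and rtt-max can be sketched in a few lines of Python. This is only an illustration, under the assumption that the RTT penalty is zero at or below rtt-min, equal to max-rtt-penalty at or above rtt-max, and linear in between. The function name `rtt_penalty` and the sample RTTs are made up for the example. The settings are the Nexedi ones quoted above, with rtt-min assumed to be left at the 10ms default.

```python
def rtt_penalty(rtt_ms, rtt_min, rtt_max, max_penalty):
    """Extra cost added to a link because of its RTT.

    Assumed shape: 0 up to rtt_min, max_penalty from rtt_max upwards,
    and a straight line in between.
    """
    if rtt_ms <= rtt_min:
        return 0
    if rtt_ms >= rtt_max:
        return max_penalty
    return int(max_penalty * (rtt_ms - rtt_min) / (rtt_max - rtt_min))


# Nexedi's settings from this thread; rtt-min = 10 is an assumption.
for rtt in (0.5, 150, 350, 600):
    penalty = rtt_penalty(rtt, rtt_min=10, rtt_max=500, max_penalty=5000)
    print(f"RTT {rtt} ms -> penalty {penalty}")
```

Under these assumptions, every link slower than rtt-max gets the same penalty, so Babel no longer distinguishes between them. That is why rtt-max should sit above the RTT of all the links you still consider good.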
On Wed, 17 Jul 2024 12:54:52 +0000 Thomas Gambier <thomas.gambier@nexedi.com> wrote:
> max-rtt-penalty 5000 rtt-max 500 rtt-decay 125
Thank you for sharing your experience with me!
On Wed, 17 Jul 2024 18:18:42 +0200 Juliusz Chroboczek <jch@irif.fr> wrote:
> Some of the VMs are in the same data center (latency < 1ms) and some are very far away (Germany <-> Hong Kong 300ms-400ms). Which rtt-min, rtt-max and rtt-cost/max-rtt-penalty would be suitable for such a network?
> rtt-min should be chosen so that all links below rtt-min are local -- the idea is that below rtt-min, it's not worth optimising the path. The default value (10ms) should be fine in most networks.
> rtt-max should be larger than all good links in your network -- the idea is that above rtt-max, it's no longer worth optimising the path. In your case, you might want to increase the value to 350ms or so.
Thanks for the quick answers! What should I set rtt-cost/max-rtt-penalty to?
> -- Juliusz
-- Marek Küthe m.k@mk16.de er/ihm he/him
> Thanks for the quick answers!
> What should I set rtt-cost/max-rtt-penalty to?
Max-rtt-penalty controls how much Babel will prefer routes with many links but low RTT to routes with few links but high RTT. The default is fairly conservative, meaning that RTT has only a very small influence on routing. You should set it to 96*n where n is the maximum number of extra links that you're willing to take in order to avoid one high-RTT link.
You should probably set it to 400 or so, and start tweaking from there. The value used by Nexedi (5000) looks quite extreme to me.
As to rtt-decay, the default value should be fine unless you have a mobile network. Increase it in order to react faster to RTT variations, at the risk of route flaps every time the network has a hiccup.
-- Juliusz
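The 96*n rule can be checked with a little arithmetic. The sketch below is a simplification, not Babel's route selection: it assumes every link has a base cost of 96 (the figure implied by the rule above), that the low-RTT links carry no RTT penalty, and that the high-RTT link is at or above rtt-max and so carries the full max-rtt-penalty. The names `HOP_COST`, `prefers_detour` and `max_extra_links` are hypothetical.

```python
HOP_COST = 96  # assumed base cost of one link, taken from the 96*n rule


def prefers_detour(extra_links, max_rtt_penalty):
    """True if a detour using `extra_links` additional low-RTT links is
    strictly cheaper than one direct link that pays the full RTT penalty."""
    direct = HOP_COST + max_rtt_penalty
    detour = HOP_COST * (1 + extra_links)
    return detour < direct


def max_extra_links(max_rtt_penalty):
    """Largest number of extra links for which the detour still wins."""
    return (max_rtt_penalty - 1) // HOP_COST


print(max_extra_links(400))   # 4
print(max_extra_links(5000))  # 52
```

Under these assumptions, 400 accepts a detour of up to 4 extra links to avoid one slow link, while 5000 accepts up to 52, which gives a feel for why the latter is called extreme.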
Thank you for the explanations. I now understand the options much better.
This is probably a slightly more stupid question, but I'm not very familiar with metrics: a year ago you said [1] that bird had not yet implemented "hysteresis on metrics" and that `rtt-min 1 rtt-max 1001 max-rtt-penalty 1000` could therefore be unfavorable or problematic. Has bird implemented this in the meantime? And what is the difference between hysteresis and `rtt-decay`?
[1] https://trubka.network.cz/pipermail/bird-users/2023-October/017216.html
On Wed, 17 Jul 2024 20:41:52 +0200 Juliusz Chroboczek <jch@irif.fr> wrote:
> Thanks for the quick answers!
> What should I set rtt-cost/max-rtt-penalty to?
> Max-rtt-penalty controls how much Babel will prefer routes with many links but low RTT to routes with few links but high RTT. The default is fairly conservative, meaning that RTT has only a very small influence on routing. You should set it to 96*n where n is the maximum number of extra links that you're willing to take in order to avoid one high-RTT link.
> You should probably set it to 400 or so, and start tweaking from there. The value used by Nexedi (5000) looks quite extreme to me.
> As to rtt-decay, the default value should be fine unless you have a mobile network. Increase it in order to react faster to RTT variations, at the risk of route flaps every time the network has a hiccup.
> -- Juliusz
-- Marek Küthe m.k@mk16.de er/ihm he/him
> A year ago you said [1] that bird had not yet implemented "hysteresis on metrics" [...] Has bird implemented this in the meantime?
I haven't checked the BIRD implementation recently (I know, I should be doing more work on Babel, but I'm currently spending all my mana on fighting the French administration), you'll need to wait until Toke comes back from his Viking raid.
> And what is the difference between hysteresis and `rtt-decay`?
decay/min/max are specific mechanisms that completely avoid route flaps in most (but not all) cases. Hysteresis is a fully general mechanism that is effective at reducing the frequency of route flaps in all cases, but cannot completely avoid them.
In practice, with well-tuned min/max, route flaps don't happen, so hysteresis is spurious. When the decay/min/max algorithms fail, hysteresis kicks in, and limits the frequency of route flaps to one every few minutes.
-- Juliusz
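To make the decay part of this concrete, here is a small sketch of RTT smoothing. It assumes that rtt-decay is the weight, in units of 1/256, given to each new sample in an exponential moving average. That is my reading of babeld's behaviour and is not stated in this thread. The function name `smooth_rtt` and the sample values are invented for illustration. 125 is the Nexedi value quoted earlier, and 42 is simply a lower value for comparison.

```python
def smooth_rtt(samples_ms, decay):
    """Exponentially smooth RTT samples.

    `decay` is the weight (out of 256) given to each new sample;
    this is the assumed meaning of rtt-decay.
    """
    if not 1 <= decay <= 256:
        raise ValueError("decay must be between 1 and 256")
    smoothed = samples_ms[0]  # seed the average with the first sample
    history = []
    for sample in samples_ms:
        smoothed = (decay * sample + (256 - decay) * smoothed) / 256
        history.append(smoothed)
    return history


# A steady 20 ms link with a single 200 ms hiccup.
samples = [20, 20, 200, 20, 20, 20]
for decay in (42, 125):
    peak = max(smooth_rtt(samples, decay))
    print(f"rtt-decay {decay}: smoothed RTT peaks at {peak:.1f} ms")
```

With the lower value the hiccup lifts the smoothed RTT to about 49.5 ms, with 125 to about 107.9 ms. The higher setting tracks real changes faster, but one bad sample also moves the metric much further, which is where the risk of route flaps comes from.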
On Thu, Jul 18, 2024 at 12:38:20PM +0200, Juliusz Chroboczek wrote:
> A year ago you said [1] that bird had not yet implemented "hysteresis on metrics" [...] Has bird implemented this in the meantime?
> I haven't checked the BIRD implementation recently (I know, I should be doing more work on Babel, but I'm currently spending all my mana on fighting the French administration), you'll need to wait until Toke comes back from his Viking raid.
We do not have oscillation prevention measures in BIRD yet.
I was doing work on proto/babel most recently, and what's complicating that right now is BIRD 3 (thread-next branch) development, which disincentivises the BIRD team from applying potentially conflicting changes, as both branches will need to be maintained going forward. Perhaps it's time for me to start releasing a babel-bird fork while that's going on, so we can carry on development in the meantime?
It seems I lost the motivation to work on it since nothing is getting merged or released, which makes testing by the community significantly less likely. Doing it in a way that the team *might* pick up patches (I discussed this with them privately) is too much work for me to overcome my activation energy requirements ;)
--Daniel
> I was doing work on proto/babel most recently and what's complicating that right now is BIRD 3 (thread-next branch) development disincentivising the BIRD team
I've answered this by private mail, lest anyone think that I'm ignoring Daniel. -- Juliusz
Participants (3): Daniel Gröber, Juliusz Chroboczek, Marek Küthe